PyTorch Distributed (9) ----- DistributedDataParallel Initialization

2024-02-09 06:16:31

How to Initialize DistributedDataParallel in PyTorch

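DistributedDataParallel (DDP) is an indispensable tool for distributed training in PyTorch. It lets us train a model on multiple GPUs, possibly spread across several machines, which can significantly improve training speed and throughput. This article takes a close look at how DDP is initialized, to help you understand its internal mechanics.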

I. Overview of DDP Initialization

1. Initialization on the Python Side

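Call torch.distributed.init_process_group() to initialize the distributed process group.
Create a DistributedDataParallel object that wraps the model.
Build the optimizer separately from the wrapped model's parameters; DDP wraps only the model, not the optimizer.
Run the usual training loop on the DistributedDataParallel object. Its train() method only switches the module to training mode; gradient synchronization happens during backward().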
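
The Python-side steps can be sketched as a single-process run. This is a minimal sketch, assuming a CPU-only setup with the gloo backend, world_size=1, and a throwaway file:// rendezvous path; the function name run_single_process_demo and the tiny nn.Linear model are made up for illustration. Real jobs launch one process per GPU and pass each process its own rank.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_single_process_demo():
    # Assumption: a one-process "cluster" on CPU (rank 0 of world_size 1).
    # The rendezvous file is a throwaway path chosen for this sketch.
    rendezvous = os.path.join(tempfile.mkdtemp(), "ddp_rendezvous")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{rendezvous}",
        rank=0,
        world_size=1,
    )
    try:
        model = nn.Linear(4, 2)
        # Only the model is wrapped; bucket_cap_mb caps each gradient bucket.
        ddp_model = DDP(model, bucket_cap_mb=25)
        # The optimizer is built from the wrapped model's parameters.
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

        ddp_model.train()  # only switches the module to training mode
        loss = ddp_model(torch.randn(8, 4)).sum()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced during backward
        optimizer.step()
        return ddp_model.module is model, ddp_model.training
    finally:
        dist.destroy_process_group()


wraps_model, in_training_mode = run_single_process_demo()
```

The original model stays reachable as ddp_model.module, which is convenient when saving checkpoints.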

2. Initialization on the C++ Side

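Create the C++ Reducer object that backs the DistributedDataParallel object.
Hand the model's parameters, grouped into buckets, to the Reducer; the optimizer is not involved on the C++ side.
Let the Reducer register autograd hooks on those parameters, so that each bucket's gradients are all-reduced as they become ready during backward().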

II. Key Code in DDP Initialization

1. Key Code on the Python Side

def __init__(self, module, device_ids=None, output_device=None, dim=0, bucket_cap_mb=25):
    """
    See :meth:`torch.nn.parallel.DistributedDataParallel` for most of the
    arguments' documentation.

    Args:
        bucket_cap_mb (float, optional): DistributedDataParallel will
            bucket parameters into multiple buckets so that each bucket would
            be no larger than `bucket_cap_mb` MegaBytes (default: 25)
    """
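    # nn.Module.__init__ takes no arguments, so the wrapped module and the
    # device settings are stored on the instance explicitly.
    super().__init__()
    self.module = module
    self.device_ids = device_ids
    self.output_device = output_device
    self.dim = dim

    self.bucket_cap_mb = bucket_cap_mb
    self._ddp_init_helper()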

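This snippet is a simplified sketch of the DistributedDataParallel constructor: it initializes DDP-related attributes such as bucket_cap_mb and then calls _ddp_init_helper() to do the rest of the initialization. The actual constructor accepts more parameters, such as process_group, broadcast_buffers, and find_unused_parameters.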
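
To make bucket_cap_mb concrete, the arithmetic below converts the cap to bytes and derives a lower bound on the number of buckets for a model. It is a rough sketch: the helper names bucket_cap_bytes and min_bucket_count are hypothetical, and the 4 bytes per parameter assumes float32. The real bucket count can be higher, because a single parameter is never split across buckets.

```python
import math


def bucket_cap_bytes(bucket_cap_mb):
    # Convert the cap from megabytes to bytes.
    return int(bucket_cap_mb * 1024 * 1024)


def min_bucket_count(num_params, bytes_per_param=4, bucket_cap_mb=25):
    # Lower bound only: assumes parameters pack into buckets perfectly.
    total_bytes = num_params * bytes_per_param
    return max(1, math.ceil(total_bytes / bucket_cap_bytes(bucket_cap_mb)))


cap = bucket_cap_bytes(25)                 # 26214400 bytes
n_buckets = min_bucket_count(25_000_000)   # 100 MB of float32 gradients -> 4
```

A smaller cap yields more, smaller all-reduce calls that can start earlier in the backward pass; a larger cap yields fewer, larger calls.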

def _ddp_init_helper(self):
    # ...other code omitted...

    # Get a list of all parameters, including the parameters of nested modules.
    param_list = get_parameter_list(self.module)

    # Determine the bucket size and the number of buckets.
    self.bucket_sizes, self.num_buckets = _determine_bucket_sizes_and_num_buckets(
        param_list, self.bucket_cap_mb
    )

    # Create a list of parameter buckets.
    self.buckets = [[] for _ in range(self.num_buckets)]

    # Assign parameters to buckets.
    _assign_parameters_to_buckets(param_list, self.buckets, self.bucket_sizes)

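    # All buckets communicate over the same process group, so no extra groups
    # are created here; each bucket only keeps a reference to that group.
    self.ddp_groups = [self.process_group for _ in range(self.num_buckets)]

    # ...other code omitted...

This snippet is a simplified sketch of _ddp_init_helper(): it collects all of the model's parameters, determines the bucket sizes and the number of buckets, creates the bucket list, and assigns the parameters to buckets. All buckets then communicate over the DDP object's single process group; a separate group per bucket is not needed. The helper names in the sketch (get_parameter_list, _determine_bucket_sizes_and_num_buckets, _assign_parameters_to_buckets) are illustrative and are not public PyTorch APIs. In PyTorch itself, this step computes the bucket assignment and passes it to a C++ Reducer, which performs the bucketed all-reduce.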
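
The idea of packing parameters into size-capped buckets can be illustrated in plain Python. This is a simplified sketch and not PyTorch's actual assignment algorithm: the function assign_to_buckets is hypothetical, sizes are given directly in bytes, and parameters are packed strictly in order.

```python
def assign_to_buckets(param_sizes, cap_bytes):
    """Greedily pack parameter indices into buckets of at most cap_bytes.

    A parameter larger than the cap gets a bucket of its own.
    """
    buckets = []
    current, current_bytes = [], 0
    for idx, size in enumerate(param_sizes):
        # Close the current bucket if this parameter would overflow it.
        if current and current_bytes + size > cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets


sizes = [10, 20, 15, 40, 5]
buckets = assign_to_buckets(sizes, cap_bytes=30)
# buckets == [[0, 1], [2], [3], [4]]
```

Parameter 3 exceeds the cap on its own, so it ends up alone in a bucket rather than being split.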

2. Key Code on the C++ Side

template <typename param_type>
static void _assign_parameters_to_buckets(
    std::vector<param_type>& parameters,
    std::vector<std::vector<param_type>>& buckets,
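    // Remaining capacity of each bucket; updated in place, so it cannot be const.
    std::vector<size_t>& bucket_sizes) {
  // ...other code omitted...

  for (size_t param_idx = 0; param_idx < parameters.size(); ++param_idx) {
    // Get the size of the current parameter.
    size_t param_size = _get_parameter_size(parameters[param_idx]);

    // Find the first bucket with enough remaining capacity for the parameter.
    size_t bucket_idx = 0;
    while (bucket_idx < buckets.size() && bucket_sizes[bucket_idx] < param_size) {
      ++bucket_idx;
    }

    // No existing bucket fits: open a new bucket sized for this parameter
    // instead of indexing past the end of `buckets`.
    if (bucket_idx == buckets.size()) {
      buckets.emplace_back();
      bucket_sizes.push_back(param_size);
    }

    // Add the current parameter to the selected bucket.
    buckets[bucket_idx].push_back(parameters[param_idx]);

    // Reduce the remaining capacity of the selected bucket.
    bucket_sizes[bucket_idx] -= param_size;

    // ...other code omitted...
  }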
  }
}
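
A first-fit placement in the spirit of the loop above can be traced in a few lines of Python. This is only a sketch under stated assumptions: the function first_fit is hypothetical, each bucket tracks its remaining capacity, and a new bucket is opened when no existing bucket can hold a parameter.

```python
def first_fit(param_sizes, capacities):
    """Place each parameter in the first bucket that still has room."""
    remaining = list(capacities)
    buckets = [[] for _ in remaining]
    for idx, size in enumerate(param_sizes):
        bucket_idx = 0
        while bucket_idx < len(buckets) and remaining[bucket_idx] < size:
            bucket_idx += 1
        if bucket_idx == len(buckets):
            # Nothing fits: open a bucket sized exactly for this parameter.
            buckets.append([])
            remaining.append(size)
        buckets[bucket_idx].append(idx)
        remaining[bucket_idx] -= size
    return buckets, remaining


buckets, remaining = first_fit([6, 5, 4, 12], capacities=[10, 10])
# buckets == [[0, 2], [1], [3]], remaining == [0, 5, 0]
```

The last parameter (size 12) fits in neither of the two initial buckets, so a third bucket is created for it.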