I want to send blocks of 2D data with MPI_GATHER. For example, I have a 2x3 array on each node, and if I have 4 nodes I want an 8x3 array on root. For 1D arrays MPI_GATHER arranges the data according to MPI rank, but for 2D data it just makes a mess!
What is a clean way to put the blocks in order?
I expected the output of this code
program testmpi
use mpi
implicit none
integer :: send (2,3)
integer :: rec (4,3)
integer :: ierror,my_rank,i,j
call MPI_Init(ierror)
! find out process rank
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
if (my_rank==0) then
send=1
do i=1,2
print*,(send(i,j),j=1,3)
enddo
endif
if (my_rank==1) then
send=5
! do 1,2
! print*,(send(i,j),j=1,3)
! enddo
endif
call MPI_GATHER(send,6,MPI_INTEGER,rec,6,MPI_INTEGER,0,MPI_COMM_WORLD,ierror)
if (my_rank==0) then
print*,'<><><><><>rec'
do i=1,4
print*,(rec(i,j),j=1,3)
enddo
endif
call MPI_Finalize(ierror)
end program testmpi
to be like this:
1 1 1
1 1 1
5 5 5
5 5 5
but instead it looks like this:
1 1 5
1 1 5
1 5 5
1 5 5
Answer by Jon*_*rsi (score 27):
What follows is a literal Fortran translation of this answer. I had thought it was unnecessary, but the multiple differences in array indexing and memory layout probably mean a Fortran version is worth doing.
Let me start by saying that you generally don't want to do this -- scatter and gather huge chunks of data from some "master" process. Normally you want each task to be chugging away at its own piece of the problem, and you should aim never to have one processor need a "global view" of the whole data; as soon as you require that, you limit scalability and the problem size. If you're doing this for I/O -- one process reads the data, then scatters it, then gathers it back for writing -- you'll eventually want to look into MPI-IO.
Getting to your question, though, MPI has very nice ways of pulling arbitrary data out of memory and scattering/gathering it to and from a set of processors. Unfortunately that requires a fair number of MPI concepts -- MPI types, extents, and collective operations. A lot of the basic ideas are discussed in the answer to this question -- MPI_Type_create_subarray and MPI_Gather.
Consider a 1d integer global array that task 0 has, which you want to distribute among a number of MPI tasks so that each gets a piece in its local array. Say you have 4 tasks and the global array is [1,2,3,4,5,6,7,8]. You could have task 0 send four messages (including one to itself) to distribute it, and when it's time to re-assemble, receive four messages to bundle it back together; but that obviously gets very time-consuming at large numbers of processes. There are optimized routines for these sorts of operations -- the scatter/gather operations. So in this 1d case you'd do something like this:
integer, dimension(8) :: global ! only root has this
integer, dimension(2) :: local ! everyone has this
integer, parameter :: root = 0
integer :: rank, comsize
integer :: i, ierr
call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
if (rank == root) then
global = [ (i, i=1,8) ]
endif
call MPI_Scatter(global, 2, MPI_INTEGER, & ! send everyone 2 ints from global
local, 2, MPI_INTEGER, & ! each proc receives 2 ints into local
root, & ! sending process is root,
MPI_COMM_WORLD, ierr) ! all procs in COMM_WORLD participate
After this, the processors' data looks like:
task 0: local:[1,2] global: [1,2,3,4,5,6,7,8]
task 1: local:[3,4] global: [garbage]
task 2: local:[5,6] global: [garbage]
task 3: local:[7,8] global: [garbage]
That is, the scatter operation takes the global array and sends contiguous 2-int chunks to all the processors.
To re-assemble the array, we use the MPI_Gather() operation, which works exactly the same way but in reverse:
local = local + rank
call MPI_Gather (local, 2, MPI_INTEGER, & ! everyone sends 2 ints from local
global, 2, MPI_INTEGER, & ! root receives 2 ints each proc into global
root, & ! receiving process is root,
MPI_COMM_WORLD, ierr) ! all procs in COMM_WORLD participate
Now the arrays look like:
task 0: local:[1,2] global: [1,2,4,5,7,8,10,11]
task 1: local:[4,5] global: [garbage]
task 2: local:[7,8] global: [garbage]
task 3: local:[10,11] global: [garbage]
The gather has brought all the data back to root.
What happens if the number of data points doesn't evenly divide the number of processes, and we need to send different numbers of items to each process? Then you need a generalized version of scatter, MPI_Scatterv, which lets you specify the counts for each processor and the displacements -- where in the global array each piece of data starts. So let's say, with the same 4 tasks, you had an array of 9 characters [a,b,c,d,e,f,g,h,i], and you were going to give every process two characters except the last one, which gets three. Then you'd need:
character, dimension(9) :: global
character, dimension(3) :: local
integer, dimension(4) :: counts
integer, dimension(4) :: displs
integer :: mycounts
if (rank == root) then
global = [ (achar(i+ichar('a')), i=0,8) ]
endif
local = ['-','-','-']
counts = [2,2,2,3]
displs = [0,2,4,6]
mycounts = counts(rank+1)
call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) chars from displs(i)
MPI_CHARACTER, &
local, mycounts, MPI_CHARACTER, & ! I get mycounts chars into
root, & ! root rank does sending
MPI_COMM_WORLD, ierr) ! all procs in COMM_WORLD participate
Now the data looks like:
task 0: local:"ab-" global: "abcdefghi"
task 1: local:"cd-" global: *garbage*
task 2: local:"ef-" global: *garbage*
task 3: local:"ghi" global: *garbage*
You've now used scatterv to distribute irregular amounts of data. The displacement in each case is two*rank (measured in characters; for a scatterv, displacements are in units of the type being sent, not in bytes) from the start of the array, and the counts are [2,2,2,3]. If it had been the first processor we wanted to have 3 characters, we would have set counts=[3,2,2,2] and the displacements would have been [0,3,5,7]. Gatherv works exactly the same way but in reverse; the counts and displs arrays would stay the same.
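For concreteness, a minimal sketch of that matching gatherv, reusing the same counts, displs, and mycounts variables as the scatterv above, might look like:
! Sketch: gather the uneven pieces back into global on root,
! reusing the counts and displs arrays from the scatterv.
call MPI_Gatherv(local, mycounts, MPI_CHARACTER, & ! I contribute mycounts chars
                 global, counts, displs, &         ! root places counts(i) chars at displs(i)
                 MPI_CHARACTER, &
                 root, MPI_COMM_WORLD, ierr)       ! all procs in COMM_WORLD participate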
Now, for 2D, this is a bit trickier. If we want to send 2d sub-blocks of a 2d array, the data we're sending is no longer contiguous. If we're sending (say) 3x3 sub-blocks of a 6x6 array to 4 processors, the data we're sending has holes in it:
2D Array
---------
|000|222|
|000|222|
|000|222|
|---+---|
|111|333|
|111|333|
|111|333|
---------
Actual layout in memory
[000111000111000111222333222333222333]
(Note that all high-performance computing comes down to understanding the layout of data in memory.)
If we want to send the data that's marked "1" to task 1, we need to skip three values, send three values, skip three values, send three values, skip three values, send three values. A second complication is where the subregions stop and start; note that region "1" doesn't start where region "0" stops; after the last element of region "0", the next location in memory is partway through region "1".
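(As an aside, that skip-three/send-three pattern is exactly what a strided vector type describes; here is a minimal sketch using MPI_Type_vector, which is not what the rest of this answer uses -- the subarray type below is more general:)
! Sketch only: one 3x3 block of the 6x6 integer array, described as
! 3 blocks of 3 contiguous integers, with a stride of 6 integers
! (one full column of the big array) between the starts of the blocks.
integer :: vectype, ierr
call MPI_Type_vector(3, 3, 6, MPI_INTEGER, vectype, ierr)
call MPI_Type_commit(vectype, ierr)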
Let's tackle the first layout problem first -- how to pull out just the data we want to send. We could always copy all the region "0" data into another, contiguous array and send that; if we planned it carefully enough, we could even do it in such a way that we could call MPI_Scatter on the result. But we'd rather not have to rearrange our entire main data structure that way.
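(A minimal sketch of that copy-out alternative, just for contrast; packed here is a hypothetical scratch buffer that isn't used anywhere else in this answer:)
! Sketch of the pack-and-send approach we'd rather avoid: copy the 3x3
! region "0" of the 6x6 array into a contiguous scratch buffer, after
! which a plain count of 3*3 MPI_INTEGERs describes the data completely.
integer, dimension(6,6) :: global
integer, dimension(3,3) :: packed
packed = global(1:3, 1:3)   ! contiguous copy of region "0"
! ...then an ordinary MPI_Send / MPI_Scatter of 9 integers would do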
All the MPI data types we've used so far are simple ones -- MPI_INTEGER specifies (say) 4 contiguous bytes. But MPI lets you create your own data types that describe arbitrarily complex data layouts in memory. And this case -- a rectangular subregion of an array -- is common enough that there's a specific call for it. For the 2-dimensional case we're describing above,
integer :: newtype, ierr
integer, dimension(2) :: sizes, subsizes, starts
sizes = [6,6] ! size of global array
subsizes = [3,3] ! size of sub-region
starts = [0,0] ! let's say we're looking at region "0"
! which begins at offset [0,0]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
call MPI_Type_commit(newtype, ierr)
This creates a type which picks out just the region "0" from the global array. Note that even in Fortran, the starts parameter is given as offsets from the beginning of the array (that is, 0-based), not as indices (1-based).
We could now send just this piece of data to another processor:
call MPI_Send(global, 1, newtype, dest, tag, MPI_COMM_WORLD, ierr) ! send region "0"
and the receiving process can receive it into a local array. Note that if the receiving process is only receiving it into a 3x3 array, it can not describe what it's receiving as newtype; that type no longer describes the memory layout, because there aren't big skips between the end of one column of the block and the start of the next. Instead, it just receives a block of 3*3 = 9 integers:
call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, ierr)
Note that we could do this for other sub-regions, too, either by creating a different type (with different start array) for the other blocks, or just by sending starting from the first location of the particular block:
if (rank == root) then
call MPI_Send(global(4,1), 1, newtype, 1, tag, MPI_COMM_WORLD, ierr)
call MPI_Send(global(1,4), 1, newtype, 2, tag, MPI_COMM_WORLD, ierr)
call MPI_Send(global(4,4), 1, newtype, 3, tag, MPI_COMM_WORLD, ierr)
local = global(1:3, 1:3)
else
call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, rstatus, ierr)
endif
Now that we understand how to specify subregions, there's only one more thing to discuss before using scatter/gather operations, and that's the "size" of these types. We couldn't just use MPI_Scatter() (or even scatterv) with these types yet, because these types have an extent of 15 integers; that is, where they end is 15 integers after they start -- and where they end doesn't line up nicely with where the next block begins, so we can't just use scatter - it would pick the wrong place to start sending data to the next processor.
Of course, we could use MPI_Scatterv() and specify the displacements ourselves, and that's what we'll do - except the displacements are in units of the send-type size, and that doesn't help us either; the blocks start at offsets of (0,3,18,21) integers from the start of the global array, and the fact that a block ends 15 integers from where it starts doesn't let us express those displacements in integer multiples at all.
To deal with this, MPI lets you set the extent of the type for the purposes of these calculations. It doesn't truncate the type; it's just used for figuring out where the next element starts given the last element. For types like these with holes in them, it's frequently handy to set the extent to be something smaller than the distance in memory to the actual end of the type.
We can set the extent to be anything that's convenient to us. We could just make the extent 1 integer, and then set the displacements in units of integers. In this case, though, I like to set the extent to be 3 integers - the size of a sub-column - that way, block "1" starts immediately after block "0", and block "3" starts immediately after block "2". Unfortunately, it doesn't quite work as nicely when jumping from block "1" to block "2", but that can't be helped.
So to scatter the subblocks in this case, we'd do the following:
integer(kind=MPI_ADDRESS_KIND) :: extent
integer :: intsize, resizedtype
starts = [0,0]
sizes = [6, 6]
subsizes = [3, 3]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_INTEGER, &
newtype, ierr)
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
extent = 3*intsize
call MPI_Type_create_resized(newtype, 0_MPI_ADDRESS_KIND, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
Here we've created the same block type as before, but we've resized it; we haven't changed where the type "starts" (the 0), but we've changed where it "ends" (3 integers). We didn't mention this before, but MPI_Type_commit is required before a type can be used in communication; you only need to commit the final type you actually use, though, not any intermediate steps. You use MPI_Type_free to free committed types when you're done.
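For instance, once communication with the resized type is finished, the cleanup might look like this (a sketch; only the committed type strictly needs freeing, but freeing the intermediate subarray type is also fine):
! Sketch: release the derived types once they're no longer needed.
call MPI_Type_free(resizedtype, ierr)
call MPI_Type_free(newtype, ierr)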
So now, finally, we can scatterv the blocks: the data manipulations above are a little complicated, but once it's done, the scatterv looks just like before:
counts = 1 ! we will send one of these new types to everyone
displs = [0,1,6,7] ! the starting point of everyone's data
! in the global array, in block extents
call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) types from displs(i)
resizedtype, &
local, 3*3, MPI_INTEGER, & ! I'm receiving 3*3 int
root, MPI_COMM_WORLD, ierr) !... from (root, MPI_COMM_WORLD)
And now we're done, after a little tour of scatter, gather, and MPI derived types.
An example code which shows both the gather and the scatter operation, with character arrays, follows. Running the program:
$ mpirun -np 4 ./scatter2d
global array is:
000222
000222
000222
111333
111333
111333
Rank 0 received:
000
000
000
Rank 1 received:
111
111
111
Rank 2 received:
222
222
222
Rank 3 received:
333
333
333
Rank 0 sending:
111
111
111
Rank 1 sending:
222
222
222
Rank 2 sending:
333
333
333
Rank 3 sending:
444
444
444
Root received:
111333
111333
111333
222444
222444
222444
and the code follows:
program scatter
use mpi
implicit none
integer, parameter :: gridsize = 6 ! size of array
integer, parameter :: procgridsize = 2 ! size of process grid
character, allocatable, dimension (:,:) :: global, local
integer, dimension(procgridsize**2) :: counts, displs
integer, parameter :: root = 0
integer :: rank, comsize
integer :: localsize
integer :: i, j, row, col, ierr, p, charsize
integer, dimension(2) :: sizes, subsizes, starts
integer :: newtype, resizedtype
integer, parameter :: tag = 1
integer, dimension(MPI_STATUS_SIZE) :: rstatus
integer(kind=MPI_ADDRESS_KIND) :: extent, begin
call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
if (comsize /= procgridsize**2) then
if (rank == root) then
print *, 'Only works with np = ', procgridsize**2, ' for now.'
endif
call MPI_Finalize(ierr)
stop
endif
localsize = gridsize/procgridsize
allocate( local(localsize, localsize) )
if (rank == root) then
allocate( global(gridsize, gridsize) )
forall( col=1:procgridsize, row=1:procgridsize )
global((row-1)*localsize+1:row*localsize, &
(col-1)*localsize+1:col*localsize) = &
achar(ichar('0')+(row-1)+(col-1)*procgridsize)
end forall
print *, 'global array is: '
do i=1,gridsize
print *, global(i,:)
enddo
endif
starts = [0,0]
sizes = [gridsize, gridsize]
subsizes = [localsize, localsize]
call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_CHARACTER, &
newtype, ierr)
call MPI_Type_size(MPI_CHARACTER, charsize, ierr)
extent = localsize*charsize
begin = 0
call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
counts = 1 ! we will send one of these new types to everyone
forall( col=1:procgridsize, row=1:procgridsize )
displs(1+(row-1)+procgridsize*(col-1)) = (row-1) + localsize*procgridsize*(col-1)
endforall
call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) types from displs(i)
resizedtype, &
local, localsize**2, MPI_CHARACTER, & ! I'm receiving localsize**2 chars
root, MPI_COMM_WORLD, ierr) !... from (root, MPI_COMM_WORLD)
do p=1, comsize
if (rank == p-1) then
print *, 'Rank ', rank, ' received: '
do i=1, localsize
print *, local(i,:)
enddo
endif
call MPI_Barrier(MPI_COMM_WORLD, ierr)
enddo
local = achar( ichar(local) + 1 )
do p=1, comsize
if (rank == p-1) then
print *, 'Rank ', rank, ' sending: '
do i=1, localsize
print *, local(i,:)
enddo
endif
call MPI_Barrier(MPI_COMM_WORLD, ierr)
enddo
call MPI_Gatherv( local, localsize**2, MPI_CHARACTER, & ! I'm sending localsize**2 chars
global, counts, displs, resizedtype,&
root, MPI_COMM_WORLD, ierr)
if (rank == root) then
print *, ' Root received: '
do i=1,gridsize
print *, global(i,:)
enddo
endif
call MPI_Type_free(newtype, ierr)
call MPI_Type_free(resizedtype, ierr)
if (rank == root) deallocate(global)
deallocate(local)
call MPI_Finalize(ierr)
end program scatter
So that's the general solution. For your particular case, where we are just appending by rows, we don't need a Gatherv, we can just use a gather, because in this case, all of the displacements are the same -- before, in the 2d block case we had one displacement going 'down', and then jumps in that displacement as you went 'across' to the next column of blocks. Here, the displacement is always one extent from the previous one, so we don't need to give displacements explicitly. So a final code looks like:
program testmpi
use mpi
implicit none
integer, dimension(:,:), allocatable :: send, recv
integer, parameter :: nsendrows = 2, nsendcols = 3
integer, parameter :: root = 0
integer :: ierror, my_rank, comsize, i, j, ierr
integer :: blocktype, resizedtype
integer, dimension(2) :: starts, sizes, subsizes
integer (kind=MPI_Address_kind) :: start, extent
integer :: intsize
call MPI_Init(ierror)
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierror)
allocate( send(nsendrows, nsendcols) )
send = my_rank
if (my_rank==root) then
! we're going to append the local arrays
! as groups of send rows
allocate( recv(nsendrows*comsize, nsendcols) )
endif
! describe what these subblocks look like inside the full concatenated array
sizes = [ nsendrows*comsize, nsendcols ]
subsizes = [ nsendrows, nsendcols ]
starts = [ 0, 0 ]
call MPI_Type_create_subarray( 2, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, MPI_INTEGER, &
blocktype, ierr)
start = 0
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
extent = intsize * nsendrows
call MPI_Type_create_resized(blocktype, start, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)
call MPI_Gather( send, nsendrows*nsendcols, MPI_INTEGER, & ! everyone send 3*2 ints
recv, 1, resizedtype, & ! root gets 1 resized type from everyone
root, MPI_COMM_WORLD, ierr)
if (my_rank==0) then
print*,'<><><><><>recv'
do i=1,nsendrows*comsize
print*,(recv(i,j),j=1,nsendcols)
enddo
endif
call MPI_Finalize(ierror)
end program testmpi
Running this with 3 processes gives:
$ mpirun -np 3 ./testmpi
<><><><><>recv
0 0 0
0 0 0
1 1 1
1 1 1
2 2 2
2 2 2