Orthogonal Recursive Bisection in Chapel (Barnes-Hut algorithm)

Rok*_*sel 5 parallel-processing chapel

I am implementing a distributed version of the Barnes-Hut n-body simulation in Chapel. I have already implemented sequential and shared-memory versions, which are available on GitHub.

I am following the algorithm outlined here (Chapter 7):

  1. Perform orthogonal recursive bisection (ORB) and distribute the bodies so that each process has an equal amount of work
  2. Construct a locally essential tree on each process
  3. Compute the forces and advance the bodies
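
In Chapel terms, the overall structure I am picturing is roughly the skeleton below (just a sketch, all names are placeholders; step 1 and the communication it requires are exactly what I don't know how to express):

coforall loc in Locales do on loc {
  // 1. ORB: repeatedly bisect the domain and exchange bodies so that
  //    every locale ends up with an equal share of the work
  // 2. build the locally essential tree from local + imported bodies
  // 3. compute forces on the local bodies and advance them
}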

I have a good idea of how to implement this in C/MPI: MPI_Allreduce for the bisection, and simple message passing for the communication between processes (for the body transfers). MPI_Comm_split is also a very handy function that lets me split the processes at each step of the ORB.

I am having some trouble performing the ORB using the parallel/distributed constructs that Chapel provides. I need some way to aggregate (reduce) work across processes (locales in Chapel), to split the processes into groups, and to do inter-process communication to transfer bodies.

I would be grateful for any advice on how to implement this in Chapel. If another approach works better in Chapel, that would be great as well.

Rok*_*sel 2

After a lot of deadlocks and crashes I did manage to implement the algorithm in Chapel. It can be found here: https://github.com/novoselrok/parallel-algorithms/tree/75312981c4514b964d5efc59a45e5eb1b8bc41a6/nbody-bh/dm-chapel

I was not able to use much of the fancy parallel machinery Chapel provides. I relied only on block-distributed arrays with sync elements, and I essentially replicated the SPMD model.
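
To make that concrete, the building block is a block-distributed array paired with a sync array whose full/empty state does the synchronization. A minimal sketch (illustrative only, not code from the repository):

use BlockDist;

const localeSpace  = {0..#numLocales};
const localeDomain = localeSpace dmapped Block(boundingBox=localeSpace);

var data: [localeDomain] real;       // one data slot per locale
var data$: [localeDomain] sync int;  // full/empty flag guarding each slot

// Writing a sync element fills it; reading it empties it and blocks until
// it is full. The pattern shown further below is built on exactly this.
data[0] = 3.14;
data$[0] = here.id;    // mark slot 0 as full
var reader = data$[0]; // blocks until full, then reads and empties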

In main.chpl I set up all of the necessary arrays that will be used to transfer data. Each array has a corresponding sync array used for synchronization. Then each worker is started with its share of bodies and the previously mentioned arrays. Worker.chpl contains the bulk of the algorithm.
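
Schematically, the setup looks something like the following (a hedged sketch with placeholder names, not the actual declarations from main.chpl):

use BlockDist;

record Body { var pos, vel: 3*real; var mass: real; }

config const totalBodies = 1024;

// bodies are block-distributed, so each locale owns an equal chunk
const bodyDom = {0..#totalBodies} dmapped Block(boundingBox={0..#totalBodies});
var bodies: [bodyDom] Body;

proc worker(rank: int, myDom) {
  writeln("worker ", rank, " owns ", myDom.size, " bodies");
  // ... ORB, locally essential tree, force computation ...
}

// SPMD-style launch: one worker per locale, each handed its own share
coforall loc in Locales do on loc {
  // each worker gets only the indices of the bodies its locale owns
  worker(here.id, bodies.localSubdomain());
}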

I replaced the MPI_Comm_split functionality with a custom function determineGroupPartners where I do the same thing manually (a rough sketch of the idea appears after the pattern overview below). As for MPI_Allreduce, I found a nice little pattern I could use everywhere:

var localeSpace = {0..#numLocales};
var localeDomain = localeSpace dmapped Block(boundingBox=localeSpace);

var arr: [localeDomain] SomeType;
var arr$: [localeDomain] sync int; // stores the rank of the intended receiver for each slot
var rank = here.id;

for i in 0..#(numLocales-1) { // one exchange round per other locale
    var intendedReceiver = (rank + i + 1) % numLocales;
    var partner = ((rank - (i + 1)) + numLocales) % numLocales;

    // Drain the -1 sentinel left over from the previous round
    // (no-op on the first round) so the slot is empty before writing
    if (arr$[rank].isFull) {
        arr$[rank]; // readFE: empties the slot
    }

    // Store my value
    arr[rank] = valueIWantToSend;
    arr$[rank] = intendedReceiver;

    // Am I the intended receiver?
    while (arr$[partner].readFF() != rank) {}

    // Read partner value
    var partnerValue = arr[partner];

    // Do something with partner value

    arr$[partner]; // empty

    // Reset: this write blocks until arr$[rank] is empty, i.e. until
    // the intended receiver has consumed the value
    arr$[rank] = -1;
}

This is a somewhat complicated way of implementing FIFO channels (see Julia's RemoteChannel, which inspired this "pattern"). Overview:

  • First, each locale calculates its intended receiver and its partner (the locale it will read a value from)

  • The locale checks whether the previous value has been read

  • The locale stores a new value and "locks" it by setting arr$[rank] to the intended receiver's rank

  • The locale waits until its partner has stored its value and set the appropriate intended receiver

  • Once the locale is the intended receiver, it reads the partner's value and does some operation on it

  • Then the locale empties/unlocks the slot by reading arr$[partner]

  • Finally, it resets arr$[rank] by writing -1. Because this write blocks until the slot is empty, it also ensures that the value the locale stored was read by the intended receiver
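
For reference, the MPI_Comm_split replacement boils down to deciding, for a given ORB level, which group a rank belongs to and which ranks share that group. A rough sketch of the idea (placeholder names, not the actual determineGroupPartners code):

// At ORB level `level`, the ranks are split into 2**level equal groups.
// Returns the group id and the ranks belonging to the same group.
proc groupPartners(rank: int, level: int, numRanks: int) {
  const groupSize = numRanks / 2**level;  // ranks per group at this level
  const groupId   = rank / groupSize;     // which group this rank falls into
  const first     = groupId * groupSize;  // lowest rank in the group
  return (groupId, first..#groupSize);
}

// e.g. with 8 locales at level 1, rank 5 is in group 1 together with ranks 4..7
writeln(groupPartners(5, 1, 8));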

I realize that this might be an overly complicated solution for this problem. There probably exists a better algorithm that would fit Chapel's global view of parallel computation; the algorithm I implemented lends itself more to the SPMD model.
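
For instance, the reductions that the ring pattern implements by hand can be expressed as single global-view operations over a distributed array (an illustration of the language feature, not something the linked code does):

use BlockDist;

const D = {0..#4096} dmapped Block(boundingBox={0..#4096});
var masses: [D] real = 1.0;

// the runtime handles all cross-locale communication behind this expression
const totalMass = + reduce masses;
writeln(totalMass);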

That being said, I think that Chapel still does a good job performance-wise. Here are the performance benchmarks against Julia and C/MPI. As the number of processes grows, the performance improves by quite a lot. I didn't have a chance to run the benchmark on a cluster with more than 4 nodes, but I think Chapel will end up with respectable numbers.