The MPI_Scatterv operation

Dan*_*Dan 0 c++ parallel-processing mpi openmpi

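I'm not sure I correctly understand what MPI_Scatterv is supposed to do. I have 79 items to scatter across a variable number of nodes. However, when I call MPI_Scatterv, I get nonsensical numbers (as if the elements of my receive buffer were uninitialized). Here is the relevant code snippet: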

MPI_Init(&argc, &argv);
int id, procs;

MPI_Comm_rank(MPI_COMM_WORLD, &id);
MPI_Comm_size(MPI_COMM_WORLD, &procs);

//Assign each file a number and figure out how many files should be
//assigned to each node
int file_numbers[files.size()];
int send_counts[nodes] = {0}; 
int displacements[nodes] = {0};

for (int i = 0; i < files.size(); i++)
{
    file_numbers[i] = i;
    send_counts[i%nodes]++;
}   

//figure out the displacements
int sum = 0;
for (int i = 0; i < nodes; i++)
{
    displacements[i] = sum;
    sum += send_counts[i];
}   

//Create a receiving buffer
int *rec_buf = new int[79];

if (id == 0)
{
    MPI_Scatterv(&file_numbers, send_counts, displacements, MPI_INT, rec_buf, 79, MPI_INT, 0, MPI_COMM_WORLD);
}   

cout << "got here " << id << " checkpoint 1" << endl;
cout << id << ": " << rec_buf[0] << endl;
cout << "got here " << id << " checkpoint 2" << endl;

MPI_Barrier(MPI_COMM_WORLD); 

free(rec_buf);

MPI_Finalize();

When I run this code, I get the following output:

got here 1 checkpoint 1
1: -1168572184
got here 1 checkpoint 2
got here 2 checkpoint 1
2: 804847848
got here 2 checkpoint 2
got here 3 checkpoint 1
3: 1364787432
got here 3 checkpoint 2
got here 4 checkpoint 1
4: 903413992
got here 4 checkpoint 2
got here 0 checkpoint 1
0: 0
got here 0 checkpoint 2

I have read the OpenMPI documentation and looked through some code examples, but I can't tell what I'm missing. Any help would be great!

Hri*_*iev 5

One of the most common MPI mistakes strikes again:

if (id == 0)    // <---- PROBLEM
{
    MPI_Scatterv(&file_numbers, send_counts, displacements, MPI_INT,
                 rec_buf, 79, MPI_INT, 0, MPI_COMM_WORLD);
}   

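MPI_SCATTERV is a collective MPI operation. Collective operations must be executed by all processes in the specified communicator in order to complete successfully. You are executing it only on rank 0, which is why only rank 0 gets the correct value.

Solution: remove the if (...) conditional so that every rank makes the call.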

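But there is another subtle error here. Since collective operations do not provide any status output, the MPI standard enforces strict matching between the number of elements sent to a rank and the number of elements that rank is willing to receive. In your case, the receivers always specify 79 elements, which may not match the corresponding number in send_counts. You should instead use: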

MPI_Scatterv(file_numbers, send_counts, displacements, MPI_INT,
             rec_buf, send_counts[id], MPI_INT,
             0, MPI_COMM_WORLD);

Also note the following discrepancy in your code, which might be just a typo introduced when posting the question here:

MPI_Comm_size(MPI_COMM_WORLD, &procs);
                               ^^^^^
int send_counts[nodes] = {0};
                ^^^^^
int displacements[nodes] = {0};
                  ^^^^^

While you obtain the number of ranks in the procs variable, nodes is used in the rest of your code. I guess nodes should be replaced with procs.
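
To make the count matching concrete, here is a minimal sketch of the bookkeeping from the question, sized by the number of ranks and kept free of MPI calls so it can be checked on its own. The names ScatterLayout and make_layout are hypothetical helpers invented for this illustration, not part of MPI, and the sketch assumes the same round-robin distribution the question uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type (not part of MPI): the per-rank send counts
// and displacements that MPI_Scatterv expects on the root.
struct ScatterLayout {
    std::vector<int> counts;  // counts[r]: number of elements sent to rank r
    std::vector<int> displs;  // displs[r]: offset of rank r's first element in the send buffer
};

// Distribute `total` items round-robin over `procs` ranks, as in the question.
// `procs` should be the value returned by MPI_Comm_size.
ScatterLayout make_layout(int total, int procs) {
    assert(total >= 0 && procs > 0);
    ScatterLayout layout;
    layout.counts.assign(procs, 0);
    layout.displs.assign(procs, 0);

    // Each item i adds one element to rank (i % procs).
    for (int i = 0; i < total; ++i) {
        layout.counts[i % procs]++;
    }

    // Each displacement is the running sum of the preceding counts.
    int sum = 0;
    for (int r = 0; r < procs; ++r) {
        layout.displs[r] = sum;
        sum += layout.counts[r];
    }
    return layout;
}
```

Assuming every rank computes the same layout, rank id would pass layout.counts[id] as its receive count, and the root would pass layout.counts.data() and layout.displs.data() as the send counts and displacements. With 79 items on 5 ranks this gives counts of 16, 16, 16, 16, 15 and displacements of 0, 16, 32, 48, 64, so no rank should ask for 79 elements.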