Ala*_*lds 3 c++ boost mpi boost-mpi
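I'm relatively new to Boost MPI. I've installed the library and my code compiles, but I'm getting a very strange error: some of the integer data received by the slave nodes is not what the master node sent. What is going on?
I'm using Boost version 1.42.0 and compiling with mpic++ (which wraps g++ on one cluster and icpc on another). A reduced example follows, including its output.
Code: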
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <vector>
#include <boost/mpi.hpp>

using namespace std;
namespace mpi = boost::mpi;

class Solution
{
public:
    Solution() :
        solution_num(num_solutions++)
    {
        // Master node's constructor
    }

    Solution(int solutionNum) :
        solution_num(solutionNum)
    {
        // Slave nodes' constructor.
    }

    int solutionNum() const
    {
        return solution_num;
    }

private:
    static int num_solutions;
    int solution_num;
};

int Solution::num_solutions = 0;

int main(int argc, char* argv[])
{
    // Initialization of MPI
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0)
    {
        // Create solutions
        int numSolutions = world.size() - 1; // One solution per slave
        vector<Solution*> solutions(numSolutions);
        for (int sol = 0; sol < numSolutions; ++sol)
        {
            solutions[sol] = new Solution;
        }

        // Send solutions
        for (int sol = 0; sol < numSolutions; ++sol)
        {
            world.isend(sol + 1, 0, false); // Tells the slave to expect work
            cout << "Sending solution no. " << solutions[sol]->solutionNum() << " to node " << sol + 1 << endl;
            world.isend(sol + 1, 1, solutions[sol]->solutionNum());
        }

        // Retrieve values (solution numbers squared)
        vector<double> values(numSolutions, 0);
        for (int i = 0; i < numSolutions; ++i)
        {
            // Get values for each solution
            double value = 0;
            mpi::status status = world.recv(mpi::any_source, 2, value);
            int source = status.source();
            int sol = source - 1;
            values[sol] = value;
        }

        for (int i = 1; i <= numSolutions; ++i)
        {
            world.isend(i, 0, true); // Tells the slave to finish
        }

        // Output the solutions numbers and their squares
        for (int i = 0; i < numSolutions; ++i)
        {
            cout << solutions[i]->solutionNum() << ", " << values[i] << endl;
            delete solutions[i];
        }
    }
    else
    {
        // Slave nodes merely square the solution number
        bool finished;
        mpi::status status = world.recv(0, 0, finished);
        while (!finished)
        {
            int solNum;
            world.recv(0, 1, solNum);
            cout << "Node " << world.rank() << " receiving solution no. " << solNum << endl;
            Solution solution(solNum);
            double value = static_cast<double>(solNum * solNum);
            world.send(0, 2, value);
            status = world.recv(0, 0, finished);
        }
        cout << "Node " << world.rank() << " finished." << endl;
    }

    return EXIT_SUCCESS;
}
Running this on 21 nodes (1 master, 20 slaves) produces:
Sending solution no. 0 to node 1
Sending solution no. 1 to node 2
Sending solution no. 2 to node 3
Sending solution no. 3 to node 4
Sending solution no. 4 to node 5
Sending solution no. 5 to node 6
Sending solution no. 6 to node 7
Sending solution no. 7 to node 8
Sending solution no. 8 to node 9
Sending solution no. 9 to node 10
Sending solution no. 10 to node 11
Sending solution no. 11 to node 12
Sending solution no. 12 to node 13
Sending solution no. 13 to node 14
Sending solution no. 14 to node 15
Sending solution no. 15 to node 16
Sending solution no. 16 to node 17
Sending solution no. 17 to node 18
Sending solution no. 18 to node 19
Sending solution no. 19 to node 20
Node 1 receiving solution no. 0
Node 2 receiving solution no. 1
Node 12 receiving solution no. 19
Node 3 receiving solution no. 19
Node 15 receiving solution no. 19
Node 13 receiving solution no. 19
Node 4 receiving solution no. 19
Node 9 receiving solution no. 19
Node 10 receiving solution no. 19
Node 14 receiving solution no. 19
Node 6 receiving solution no. 19
Node 5 receiving solution no. 19
Node 11 receiving solution no. 19
Node 8 receiving solution no. 19
Node 16 receiving solution no. 19
Node 19 receiving solution no. 19
Node 20 receiving solution no. 19
Node 1 finished.
Node 2 finished.
Node 7 receiving solution no. 19
0, 0
1, 1
2, 361
3, 361
4, 361
5, 361
6, 361
7, 361
8, 361
9, 361
10, 361
11, 361
12, 361
13, 361
14, 361
15, 361
16, 361
17, 361
18, 361
19, 361
Node 6 finished.
Node 3 finished.
Node 17 receiving solution no. 19
Node 17 finished.
Node 10 finished.
Node 12 finished.
Node 8 finished.
Node 4 finished.
Node 15 finished.
Node 18 receiving solution no. 19
Node 18 finished.
Node 11 finished.
Node 13 finished.
Node 20 finished.
Node 16 finished.
Node 9 finished.
Node 19 finished.
Node 7 finished.
Node 5 finished.
Node 14 finished.
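So while the master sends 0 to node 1, 1 to node 2, 2 to node 3 and so on, most of the slave nodes (for some reason) receive the number 19. Instead of the squares of the numbers 0 to 19, we get 0 squared, 1 squared, and 19 squared 18 times!
Thanks in advance to anyone who can explain this.
Alan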
Ala*_*lds 11
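OK, I think I have the answer, and it requires some knowledge of the underlying C-style MPI call. Boost's 'isend' function is essentially a wrapper around 'MPI_Isend', and it does not shield the user from some of the details of how 'MPI_Isend' works.
One of the arguments to 'MPI_Isend' is a pointer to a buffer containing the data you want to send. Crucially, this buffer must not be reused until you know the message has been received (more precisely, until the non-blocking send has completed). Consider the following code: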
// Get solution numbers from the solutions and store in a vector
vector<int> solutionNums(numSolutions);
for (int sol = 0; sol < numSolutions; ++sol)
{
    solutionNums[sol] = solutions[sol]->solutionNum();
}

// Send solution numbers
for (int sol = 0; sol < numSolutions; ++sol)
{
    world.isend(sol + 1, 0, false); // Indicates that we have not finished, and to expect a solution representation
    cout << "Sending solution no. " << solutionNums[sol] << " to node " << sol + 1 << endl;
    world.isend(sol + 1, 1, solutionNums[sol]);
}
This works perfectly, because each solution number sits in its own memory location. Now consider the following minor tweak:
// Create solutionNum array
vector<int> solutionNums(numSolutions);
for (int sol = 0; sol < numSolutions; ++sol)
{
    solutionNums[sol] = solutions[sol]->solutionNum();
}

// Send solutions
for (int sol = 0; sol < numSolutions; ++sol)
{
    int solNum = solutionNums[sol];
    world.isend(sol + 1, 0, false); // Indicates that we have not finished, and to expect a solution representation
    cout << "Sending solution no. " << solNum << " to node " << sol + 1 << endl;
    world.isend(sol + 1, 1, solNum);
}
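Now the underlying 'MPI_Isend' call is given a pointer to solNum. Unfortunately, this bit of memory is overwritten on every pass through the loop, so although it looks as if the number 4 is being sent to node 5, by the time the send actually takes place the memory location holds something else (say 19), and that is what gets sent instead.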
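The same effect can be reproduced without MPI at all. What follows is only a sketch under stated assumptions: 'DeferredSender', 'post', 'flush', 'sendFromSeparateSlots' and 'sendFromReusedSlot' are hypothetical names invented for illustration, not part of Boost.MPI or MPI. The class mimics a single property of a non-blocking send, namely that it remembers where the payload lives and reads it later. To keep the behaviour well defined, the reused variable is declared outside the loop; in the MPI code above the local is declared inside the loop, which is worse, because it no longer exists when the send finally reads it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a non-blocking send queue (NOT Boost.MPI).
// It stores only the address of each payload and reads the value later,
// when flush() is called, much as a deferred send reads its buffer.
class DeferredSender
{
public:
    // Remember where the payload lives; deliberately do not copy it.
    void post(const int& payload) { pending_.push_back(&payload); }

    // "Transmit" everything by reading each remembered address now.
    std::vector<int> flush()
    {
        std::vector<int> delivered;
        for (std::size_t i = 0; i < pending_.size(); ++i)
        {
            delivered.push_back(*pending_[i]);
        }
        pending_.clear();
        return delivered;
    }

private:
    std::vector<const int*> pending_;
};

// Each message has its own slot, and every slot is still alive and
// unchanged when the deferred read happens.
std::vector<int> sendFromSeparateSlots(int count)
{
    std::vector<int> slots(count);
    for (int i = 0; i < count; ++i)
    {
        slots[i] = i;
    }
    DeferredSender sender;
    for (int i = 0; i < count; ++i)
    {
        sender.post(slots[i]);
    }
    return sender.flush();
}

// One variable is reused for every message, so every remembered address
// is the same and only the last value written survives until flush().
std::vector<int> sendFromReusedSlot(int count)
{
    int slot = 0;
    DeferredSender sender;
    for (int i = 0; i < count; ++i)
    {
        slot = i;
        sender.post(slot);
    }
    return sender.flush();
}
```

With a count of 4, 'sendFromSeparateSlots' delivers 0, 1, 2, 3, while 'sendFromReusedSlot' delivers 3, 3, 3, 3. That matches the pattern in the question's output, where most nodes received the last solution number.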
Now consider the original code:
// Send solutions
for (int sol = 0; sol < numSolutions; ++sol)
{
    world.isend(sol + 1, 0, false); // Tells the slave to expect work
    cout << "Sending solution no. " << solutions[sol]->solutionNum() << " to node " << sol + 1 << endl;
    world.isend(sol + 1, 1, solutions[sol]->solutionNum());
}
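Here we are passing a temporary. Again, the memory location of this temporary is overwritten on each pass through the loop, and again the wrong data is sent to the slave nodes.
As it happens, I have been able to restructure my "real" code to use 'send' instead of 'isend'. But if I need to use 'isend' in the future, I will be much more careful!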