hal*_*ole 5 c mpi matrix-multiplication
I have the following code:
//Start MPI...
MPI_Init(&argc, &argv);
int size = atoi(argv[1]);
int delta = 10;
int rnk;
int p;
int root = 0;
MPI_Status mystatus;
MPI_Comm_rank(MPI_COMM_WORLD, &rnk);
MPI_Comm_size(MPI_COMM_WORLD, &p);
//Checking compatibility of size and number of processors
assert(size % p == 0);
//Initialize vector...
double *vector = NULL;
vector = malloc(size*sizeof(double));
double *matrix = NULL;
//Rank 0 -----------------------------------
if (rnk == 0) {
    //Initialize vector...
    srand(1);
    for (int i = 0; i < size; i++) {
        vector[i] = rand() % delta + 1;
    }
    printf("Initial vector:");
    print_vector(vector, size);
    //Initialize matrix...
    matrix = malloc(size*size*sizeof(double));
    srand(2);
    for (int i = 0; i < (size*size); i++) {
        matrix[i] = rand() % delta + 1;
    }
    //Print matrix...
    printf("Initial matrix:");
    print_flat_matrix(matrix, size);
}
//Calculating chunk_size...
int chunk_size = size/p;
//Initialize submatrix..
double *submatrix = malloc(size*chunk_size*sizeof(double));
//Initialize result vector...
double *result = malloc(chunk_size*sizeof(double));
//Broadcasting vector...
MPI_Bcast(vector, size, MPI_DOUBLE, root, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
//Scattering matrix...
MPI_Scatter(matrix, (size*chunk_size), MPI_DOUBLE, submatrix, (size*chunk_size), MPI_DOUBLE, root, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
printf("I am rank %d and first element of my vector is: %f and of my matrix1: %f/matrix2: %f/matrix3: %f/matrix4: %f\n", rnk, vector[0], submatrix[0], submatrix[1], submatrix[2], submatrix[3]);
//Calculating...
for (int i = 0; i < chunk_size; i++) {
    for (int j = 0; j < size; j++) {
        result[i] += (submatrix[(i*size)+j] * vector[j]);
        printf("Rank %d; current result: %f, ", rnk, result[i]);
    }
    printf("\n");
    printf("Rank %d; result: %f...\n", rnk, result[i]);
}
printf("Rank: %d; first result: %f\n", rnk, result[0]);
double *final_result = NULL;
//Rank 0 -----------------------------------
if (rnk == 0) {
    final_result = malloc(size*sizeof(double));
}
//Gather...
MPI_Gather(result, chunk_size, MPI_DOUBLE, final_result, chunk_size, MPI_DOUBLE, root, MPI_COMM_WORLD);
//Rank 0 -----------------------------------
if (rnk == 0) {
    printf("Final result:\n");
    print_vector(final_result, size);
    free(matrix);
    free(final_result);
}
free(submatrix);
free(result);
free(vector);
MPI_Finalize();
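When I run the program, it runs without errors, but the values I print at the end are not always correct. Sometimes I get a vector with the correct output, sometimes it is partially correct, and sometimes it is completely wrong. The wrong values are either off by exactly 2 or are some very long, meaningless sequence of digits (which looks to me like a bad memory access, but I could not find one, and it is strange because sometimes it works).
I also always choose my size so that it fits the number of processes MPI creates. MPI creates 4 processes on my machine (tested and checked), so to test my algorithm I always choose 4 as the value of size. The same problem occurs with larger sizes.
Looking forward to your help and input, thanks in advance!
PS: I am working in C.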
Are you familiar with valgrind? It will draw your attention to the problem line immediately.
Your trouble seems to be this line:
result[i] += (submatrix[(i*size)+j] * vector[j]);
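What does result hold initially? It comes straight off the heap, and malloc does not initialize the memory it returns. Sometimes, if you are lucky, it will be zero. Don't count on luck with C.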
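To make the dependence on the starting contents concrete, here is a minimal serial sketch (no MPI) of the same accumulation. The function name `accumulate_chunk` and the sample numbers are assumptions for illustration and do not appear in the original program.

```c
/* Hypothetical helper mirroring the loop in the question: it adds each
   row-times-vector product to whatever result[i] already holds.
   submatrix is stored row-major with `rows` rows of `size` columns. */
void accumulate_chunk(const double *submatrix, const double *vector,
                      double *result, int rows, int size) {
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < size; j++) {
            /* += reads the previous value of result[i] */
            result[i] += submatrix[i * size + j] * vector[j];
        }
    }
}
```

With the row {1, 2, 3, 4}, a vector of ones, and result[0] starting at 0.0, this produces 10.0. If result[0] starts at 1000.0 instead, the same call produces 1010.0. A buffer returned by malloc can hold any leftover value, which would explain output that is right on some runs and wrong on others.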
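There are a number of ways to initialize the array. Here are a few, listed in order of how likely they are to be optimized:
Allocate result[] with calloc: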
double *result = calloc(chunk_size, sizeof(double));
Or, initialize the array with memset:
double *result = malloc(chunk_size * sizeof(double));
memset(result, 0, chunk_size * sizeof(double));
Or, you can loop over the array:
for (int i = 0; i < chunk_size; i++)
    result[i] = 0.0;
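The three options can be sketched side by side as small helpers with NULL checks. The function names below are hypothetical, chosen only for this illustration. One caveat: calloc and memset set every byte to zero, which represents 0.0 on IEEE 754 platforms (the common case), while the explicit loop assigns 0.0 directly and does not rely on the floating-point representation.

```c
#include <stdlib.h>
#include <string.h>

/* Option 1: calloc hands back memory with all bytes set to zero. */
double *zeroed_with_calloc(size_t n) {
    return calloc(n, sizeof(double));
}

/* Option 2: malloc, then clear every byte with memset. */
double *zeroed_with_memset(size_t n) {
    double *buf = malloc(n * sizeof(double));
    if (buf != NULL) {
        memset(buf, 0, n * sizeof(double));
    }
    return buf;
}

/* Option 3: malloc, then assign 0.0 to each element in a loop. */
double *zeroed_with_loop(size_t n) {
    double *buf = malloc(n * sizeof(double));
    if (buf != NULL) {
        for (size_t i = 0; i < n; i++) {
            buf[i] = 0.0;
        }
    }
    return buf;
}
```

Each helper returns NULL on allocation failure, so a caller would check the pointer before accumulating into the buffer and free it afterwards.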