MPI segmentation fault (signal 11)

Ruz*_*qat 2 c mpi segmentation-fault

I have been trying for more than two days to find the mistake I made, but I could not find anything. I keep getting the following error:

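=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

=   EXIT CODE: 139

=   CLEANING UP REMAINING PROCESSES

=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES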

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

This typically refers to a problem with your application.

Please see the FAQ page for debugging suggestions

make: *** [run] Error 139

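So the problem is apparently in MPI_BCAST and, in another function that I have, in MPI_GATHER. Can you help me figure out what is wrong? When I compile the code, I type the following: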

/usr/bin/mpicc  -I/usr/include   -L/usr/lib  z.main.c  z.mainMR.c  z.mainWR.c  -o  1dcode -g  -lm

To run it:

/usr/bin/mpirun -np 2 ./1dcode dat.txt o.out.txt

For example, my code includes the following function:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <math.h>
#include <string.h>
#include "functions.h"
#include <mpi.h>
/*...................z.mainMR master function............. */
void MASTER(int argc, char *argv[], int nPROC, int nWRs, int mster)
{

/*... Define all the variables we going to use in z.mainMR function..*/
double tend, dtfactor, dtout, D, b, dx, dtexpl, dt, time;
int MM, M, maxsteps, nsteps;
FILE *datp, *outp;
/*.....Reading the data file "dat" then saving the data in o.out.....*/
datp = fopen(argv[1],"r"); // Open the file in read mode
outp = fopen(argv[argc-1],"w"); // Open output file in write mode
if(datp != NULL) // If data file is not empty continue
{
fscanf(datp,"%d %lf %lf %lf %lf %lf",&MM,&tend,&dtfactor,&dtout,&D,&b);    // read the data
fprintf(outp,"data>>>\nMM=%d\ntend=%lf\ndtfactor=%lf\ndtout=%lf\nD=%lf\nb=%lf\n",MM,tend,dtfactor,dtout,D,b);
fclose(datp); // Close the data file
fclose(outp); // Close the output file
}
else // If the file is empty then print an error message
{
    printf("There is something wrong. Maybe file is empty.\n");
}

/*.... Find dx, M, dtexpl, dt and the maxsteps........*/
dx = 1.0/ (double) MM;
M = b * MM;
dtexpl = (dx * dx) / (2.0 * D);
dt = dtfactor * dtexpl;
maxsteps = (int)( tend / dt ) + 1;

/*...Pack integers in iparms array, reals in parms array...*/
int iparms[2] = {MM,M};
double parms[4] = {dx, dt, D, b}; 
MPI_BCAST(iparms,2, MPI_INT,0,MPI_COMM_WORLD);
MPI_BCAST(parms, 4, MPI_DOUBLE,0, MPI_COMM_WORLD);
}

Hri*_*iev 6

The runtime error is caused by an unfortunate combination of a specific feature of MPICH and a feature of the C language.

MPICH provides both the C and the Fortran interface code in a single library file:

000000000007c7a0 W MPI_BCAST
00000000000cd180 W MPI_Bcast
000000000007c7a0 W PMPI_BCAST
00000000000cd180 T PMPI_Bcast
000000000007c7a0 W mpi_bcast
000000000007c7a0 W mpi_bcast_
000000000007c7a0 W mpi_bcast__
000000000007c7a0 W pmpi_bcast
000000000007c7a0 T pmpi_bcast_
000000000007c7a0 W pmpi_bcast__

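The Fortran calls are exported under a variety of aliases, including the all-uppercase MPI_BCAST, in order to support many different Fortran compilers at the same time. MPI_BCAST itself is not declared in mpi.h, but ANSI C allows calling a function without a preceding prototype declaration. Enabling C99 by passing -std=c99 to the compiler would have produced a warning about the implicit declaration of the MPI_BCAST function. -Wall would also have produced a warning. The code would fail to link with Open MPI, which provides the Fortran interface in a separate library that mpicc does not link against.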

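Even though the code compiles and links correctly, Fortran functions expect all of their arguments to be passed by reference. In addition, Fortran MPI calls take an extra output argument in which the error code is returned. Hence the segmentation fault.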

To prevent such errors in the future, compile with -Wall -Werror, which should catch similar problems as early as possible.