Why does MPI_Iprobe return false when a message has definitely been sent?

Pet*_*e W 11 mpi

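I want to use MPI_Iprobe to test whether a message with a given tag is already pending.

However, MPI_Iprobe does not behave as I expected. In the example below, I send messages from multiple tasks to a single task (rank 0). On rank 0 I then wait several seconds to allow plenty of time for the MPI_Isend calls to complete. When I then call MPI_Iprobe, it returns with the flag set to false. If I call it again after a (blocking) MPI_Probe, it returns true.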

#include "mpi.h"
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
  int rank;
  int numprocs;
  int tag;
  int receive_tag;
  int flag=0;
  int number;
  int recv_number=0;

  MPI_Request request;
  MPI_Status status;

  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&numprocs);

  // rank 0 receives messages, all others send messages
  if (rank > 0 ) {
    number = rank;
    tag = rank;
    MPI_Isend(&number, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,&request); // send to rank 0
    printf("Sending tag : %d \n",tag);
   } 
   else if (rank == 0) {

   sleep(5); // [seconds] allow plenty of time for all sends from other tasks to complete

   receive_tag = 3; // probe for a single message only: the one with tag 3 (sent by rank 3)

   MPI_Iprobe(MPI_ANY_SOURCE,receive_tag,MPI_COMM_WORLD,&flag,&status);
   printf("After MPI_Iprobe, flag = %d \n",flag);

   MPI_Probe(MPI_ANY_SOURCE,receive_tag,MPI_COMM_WORLD,&status);
   printf("After MPI_Probe, found message with tag : %d \n",receive_tag);

   MPI_Iprobe(MPI_ANY_SOURCE,receive_tag,MPI_COMM_WORLD,&flag,&status);
   printf("After second MPI_Iprobe, flag = %d \n",flag);

   // receive all the messages
   for (int i=1;i<numprocs;i++){    
     MPI_Recv(&recv_number, 1, MPI_INT, MPI_ANY_SOURCE, i, MPI_COMM_WORLD,&status);
     printf("Received : %d \n",recv_number);
   }

 }
 MPI_Finalize();
}

This gives the following output:

Sending tag : 4 
Sending tag : 3 
Sending tag : 2 
Sending tag : 5 
Sending tag : 1 
After MPI_Iprobe, flag = 0 
After MPI_Probe, found message with tag : 3 
After second MPI_Iprobe, flag = 1 
Received : 1 
Received : 2 
Received : 3 
Received : 4 
Received : 5 

Why does MPI_Iprobe return 'false' the first time?

Any help would be much appreciated!


EDIT: After Hristo Iliev's answer, I now have the following code:

#include "mpi.h"
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
  int rank;
  int numprocs;
  int tag;
  int receive_tag;
  int flag=0;
  int number;
  int recv_number=0;

  MPI_Request request;
  MPI_Status status;

  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&numprocs);

  // rank 0 receives messages, all others send messages
  if (rank > 0 ) {
    number = rank;
    tag = rank;

    MPI_Isend(&number, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,&request); // send to rank 0
    printf("Sending tag : %d \n",tag);

    // do stuff

    MPI_Wait(&request,&status);
    printf("Sent tag : %d \n",tag);

   }
    else if (rank == 0) {

    sleep(5); // [seconds] allow plenty of time for all sends from other tasks to complete

    receive_tag = 3; // probe for a single message only: the one with tag 3 (sent by rank 3)

    MPI_Iprobe(MPI_ANY_SOURCE,receive_tag,MPI_COMM_WORLD,&flag,&status);
    printf("After MPI_Iprobe, flag = %d \n",flag);

    MPI_Probe(MPI_ANY_SOURCE,receive_tag,MPI_COMM_WORLD,&status);
    printf("After MPI_Probe, found message with tag : %d \n",receive_tag);

    MPI_Iprobe(MPI_ANY_SOURCE,receive_tag,MPI_COMM_WORLD,&flag,&status);
    printf("After second MPI_Iprobe, flag = %d \n",flag);

    // receive all the other messages
    for (int i=1;i<numprocs;i++){   
       MPI_Recv(&recv_number, 1, MPI_INT, MPI_ANY_SOURCE, i, MPI_COMM_WORLD,&status);
    }

 }
 MPI_Finalize();
}

Which gives the following output:

Sending tag : 5 
Sending tag : 2 
Sending tag : 1 
Sending tag : 4 
Sending tag : 3 
Sent tag : 2 
Sent tag : 1 
Sent tag : 5 
Sent tag : 4 
Sent tag : 3 
After MPI_Iprobe, flag = 0 
After MPI_Probe, found message with tag : 3 
After second MPI_Iprobe, flag = 1 

Hri*_*iev 13

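You are using MPI_Isend to send the messages. MPI_Isend initiates an asynchronous (background) data transfer. The actual data transfer might not happen unless one of the MPI_Wait* or MPI_Test* calls has been made on the request. Some MPI implementations have (or can be configured to have) background progression threads that will progress the send operation even if no wait/test call has been made on the request, but one should not rely on such behaviour.

Simply replace MPI_Isend with MPI_Send, or add MPI_Wait(&request, &status); after the MPI_Isend (mind, though, that MPI_Isend immediately followed by MPI_Wait is equivalent to MPI_Send).

MPI_Iprobe is intended to be used in busy waits, i.e.: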

while (condition)
{
   MPI_Iprobe(...,&flag,...);
   if (flag)
   {
      MPI_Recv(...);
      ...
   }
   // Do something, e.g. background tasks
}

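Actual message transfer in real MPI implementations is quite a complicated thing. Operations are usually broken into multiple parts, which are then queued. Executing those parts is called progression, and it is done at various points in the MPI library, for example when a communication call is made, or in the background if the library implements a background progression thread. Calling MPI_Iprobe certainly does progress things, but there is no guarantee that a single call will suffice. The MPI standard states:

"The MPI implementation of MPI_PROBE and MPI_IPROBE needs to guarantee progress: if a call to MPI_PROBE has been issued by a process, and a send that matches the probe has been initiated by some process, then the call to MPI_PROBE will return, unless the message is received by another concurrent receive operation (that is executed by another thread at the probing process). Similarly, if a process busy waits with MPI_IPROBE and a matching message has been issued, then the call to MPI_IPROBE will eventually return flag = true unless the message is received by another concurrent receive operation."

Note the use of "eventually". How progression is done is very implementation-specific. Compare the following output from 5 consecutive calls to MPI_Iprobe (your original code + a tight loop):

Open MPI 1.6.5 w/o progression thread: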

# Run 1
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1

# Run 2
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1

# Run 3
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0

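Observe that there is no consistency between multiple executions of the same MPI program, and that in the 3rd run the flag is still false after 5 invocations of MPI_Iprobe.

Intel MPI 4.1.2: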

# Run 1
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1

# Run 2
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1

# Run 3
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 0
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1
After MPI_Iprobe, flag = 1

Obviously, Intel MPI progresses things differently from Open MPI.

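The difference between the two implementations could be explained by the fact that MPI_Iprobe is supposed to be a tiny probe, and therefore it should take as little time as possible. Progression, on the other hand, takes time, and in single-threaded MPI implementations the only point in time at which progression is possible is the call to MPI_Iprobe (in that particular case). Therefore the MPI implementer has to decide how much is actually progressed by each call to MPI_Iprobe and strike a balance between the amount of work done by the call and the time it takes.

With MPI_Probe things are different. It is a blocking call, and therefore it is able to progress things constantly until a matching message (more specifically, its envelope) appears.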

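  • `MPI_Iprobe` is intended to be used in busy-wait loops. It may take multiple calls to `MPI_Iprobe` before the operation has progressed far enough. `MPI_Probe` is blocking, so it does not return until the operation has progressed to the point where the message envelope has been received and matched. (2 upvotes)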