Measuring the context switch time of threads

Question

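I want to calculate the context switch time, and I am thinking of using a mutex and condition variables to signal between two threads so that only one thread runs at a time. I can use CLOCK_MONOTONIC to measure the entire execution time and CLOCK_THREAD_CPUTIME_ID to measure how long each thread runs.
The context switch time is then (total_time - thread_1_time - thread_2_time). To get a more accurate result, I can loop over it and take the average.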

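Is this a correct way to estimate the context switch time? I can't think of anything that could go wrong, but the answers I get are under 1 nanosecond.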

I forgot to mention that the more times I loop and average, the smaller the result gets.
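For reference, the arithmetic described above could be sketched roughly as follows. The helper names `timespec_diff_ns` and `estimate_switch_ns` are made up for this illustration and are not part of the original program; dividing by the number of switches is an assumption about how the average is taken.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from *start to *end (end is assumed not to be earlier). */
static int64_t timespec_diff_ns(const struct timespec *start,
                                const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Hypothetical helper: the estimate from the question,
 * (total - thread_1 - thread_2) / number_of_switches, in nanoseconds. */
static double estimate_switch_ns(int64_t total_ns, int64_t thread_1_ns,
                                 int64_t thread_2_ns, int64_t switches)
{
    assert(switches > 0);
    return (double)(total_ns - thread_1_ns - thread_2_ns) / (double)switches;
}
```

Each `thread_time` pair and the `CLOCK_MONOTONIC` pair would be passed through `timespec_diff_ns` first, and the three results fed to `estimate_switch_ns`.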

Edit

Here is a snippet of my code

    #include <pthread.h>
    #include <time.h>

    typedef struct
    {
      struct timespec start;
      struct timespec end;
    } thread_time;

    ...


    // each thread function looks similar to this
    void* thread_1_func(void* arg)
    {
       // the variable must not reuse the type name thread_time
       thread_time* times = (thread_time*) arg;

       clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(times->start));
       for (int x = 0; x < loop; ++x)
       {
         // where it switches to another thread
       }
       clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(times->end));

       return NULL;
    }

    void* thread_2_func(void* arg)
    {
       // similar to the above
    }

    int main()
    {
       ...
       pthread_t thread_1;
       pthread_t thread_2;

       thread_time thread_1_time;
       thread_time thread_2_time;

       struct timespec start, end;

       // stamps the start time
       clock_gettime(CLOCK_MONOTONIC, &start);

       // create two threads with the time structs as the arguments
       pthread_create(&thread_1, NULL, &thread_1_func, (void*) &thread_1_time);
       pthread_create(&thread_2, NULL, &thread_2_func, (void*) &thread_2_time);
       // waits for the two threads to terminate
       pthread_join(thread_1, NULL);
       pthread_join(thread_2, NULL);

       // stamps the end time
       clock_gettime(CLOCK_MONOTONIC, &end);

       // then I calculate the difference between the total execution time and the summed execution times of the two threads
    }


Answer 1

Ant*_*ala 2

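First of all, using CLOCK_THREAD_CPUTIME_ID is probably very wrong: that clock reports the CPU time charged to that one thread. A context switch, however, does not happen in user mode; it happens inside the kernel, and time a thread spends switched out is not counted at all, so you need another clock. Also, on multiprocessor systems the CPU-time clocks can give different values from processor to processor! I therefore suggest you use CLOCK_REALTIME or CLOCK_MONOTONIC instead. Be warned, though, that even if you read either of these twice in rapid succession, the timestamps will usually be tens of nanoseconds apart.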
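To get a feel for how much the clock reads themselves cost, something like the following minimal sketch can be used. The function name `clock_read_cost_ns` is made up for this illustration, and the result is only a rough average that depends heavily on the machine and kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Rough average cost, in nanoseconds, of one clock_gettime(CLOCK_MONOTONIC)
 * call, estimated from `iters` back-to-back reads. */
static double clock_read_cost_ns(long iters)
{
    struct timespec first, last, scratch;
    assert(iters > 0);

    clock_gettime(CLOCK_MONOTONIC, &first);
    for (long i = 0; i < iters; ++i)
        clock_gettime(CLOCK_MONOTONIC, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &last);

    int64_t elapsed = (int64_t)(last.tv_sec - first.tv_sec) * 1000000000LL
                    + (int64_t)(last.tv_nsec - first.tv_nsec);
    /* The interval spans the `iters` loop reads plus the final read. */
    return (double)elapsed / (double)(iters + 1);
}
```

If the per-read cost is of the same order as the quantity being measured, the timestamps should be taken only once around a large number of switches, not around each one.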

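As for context switching: there are many kinds of context switch. The fastest is a switch from one thread to another done entirely in software. This just means that you push the old registers onto the stack, set the task-switched flag so that the SSE/FP registers are saved lazily, save the stack pointer, load the new stack pointer and return from that function. Because the other thread did the same thing earlier, the return from that function happens in the other thread.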

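This thread-to-thread switch is quite fast; its overhead is about the same as that of any system call. Switching from one process to another is much slower, because the user-space page tables must be switched by loading the CR3 register, which flushes the TLB entries that map virtual addresses to physical addresses and so causes TLB misses afterwards.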

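However, a context switch or system call overhead of <1 ns does not seem plausible. It is very likely that hyperthreading or 2 CPU cores are involved here, so I suggest that you set the CPU affinity of the process so that Linux only ever runs it on, say, the first CPU core: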

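    #define _GNU_SOURCE   /* must come before any #include; exposes cpu_set_t and the CPU_* macros */
    #include <sched.h>

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);    /* allow CPU 0 only */
    int result = sched_setaffinity(0, sizeof(mask), &mask);   /* pid 0 means the caller; returns -1 on error */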


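Then you can be fairly sure that the time you are measuring comes from real context switches. Also, to measure the time needed to switch the floating-point/SSE state (which happens lazily), you should have some floating-point variables and do calculations on them before the context switch, then add, say, .1 to some volatile floating-point variable after the context switch, to see whether it has an effect on the switching time.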
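Putting these suggestions together, a minimal sketch might look like the following. It assumes Linux with glibc. The names `ping_pong` and `measure_handoff_ns` are made up for this illustration, and the calling thread takes the role of the first thread so that only one extra thread has to be created. The number it returns includes the mutex, condition-variable and system call overhead, so treat it as an upper bound on the switch cost, not as the switch cost itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Shared ping-pong state: `turn` names the thread allowed to run next. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int  turn;
static long rounds;

static void *ping_pong(void *arg)
{
    int me = (int)(intptr_t)arg;
    volatile float f = 0.0f;

    for (long i = 0; i < rounds; ++i) {
        pthread_mutex_lock(&lock);
        while (turn != me)                 /* sleep until it is our turn */
            pthread_cond_wait(&cond, &lock);
        f += 0.1f;                         /* touch floating-point state */
        turn = 1 - me;                     /* hand over to the other thread */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Average nanoseconds per hand-off with both threads pinned to CPU 0,
 * or -1.0 if pinning or thread creation fails.
 * Note: the caller stays pinned to CPU 0 afterwards. */
static double measure_handoff_ns(long n)
{
    cpu_set_t mask;
    pthread_t peer;
    struct timespec start, end;

    assert(n > 0);
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        return -1.0;

    rounds = n;
    turn = 0;
    /* The new thread inherits the affinity mask of its creator. */
    if (pthread_create(&peer, NULL, ping_pong, (void *)(intptr_t)1) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    ping_pong((void *)(intptr_t)0);        /* this thread plays thread 0 */
    pthread_join(peer, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    int64_t elapsed = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
                    + (int64_t)(end.tv_nsec - start.tv_nsec);
    return (double)elapsed / (2.0 * (double)n);   /* 2n hand-offs in total */
}
```

Calling `measure_handoff_ns(100000)` from `main` and printing the result gives one data point; building with `-pthread` may be needed on older toolchains. Running it with and without the `f += 0.1f` line is one way to check whether the lazily saved FP/SSE state makes a difference.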
