C++ thread overhead

and*_*dge 5 c++ performance multithreading c++11

I'm experimenting with threads in C++, specifically using them to parallelize a map operation.

Here is the code:

#include <thread>
#include <iostream>
#include <cstdlib>
#include <vector>
#include <math.h>
#include <stdio.h>
#include <time.h>   // clock(), clock_gettime(), CLOCK_MONOTONIC

double multByTwo(double x){
  return x*2;
}

double doJunk(double x){
  return cos(pow(sin(x*2),3));
}

template <typename T>
void map(T* data, int n, T (*ptr)(T)){
  for (int i=0; i<n; i++)
    data[i] = (*ptr)(data[i]);
}

template <typename T>
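void parallelMap(T* data, int n, T (*ptr)(T)){
  const int NUMCORES = 3;
  std::vector<std::thread> threads;
  for (int i=0; i<NUMCORES; i++){
    // Compute chunk bounds in 64-bit so (i+1)*n cannot overflow int,
    // and so the trailing n % NUMCORES elements are not skipped.
    long long begin = (long long)i*n/NUMCORES;
    long long end = (long long)(i+1)*n/NUMCORES;
    threads.push_back(std::thread(&map<T>, data + begin, (int)(end - begin), ptr));
  }
  for (std::thread& t : threads)
    t.join();
}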

int main()
{
  int n = 1000000000;
  double* nums = new double[n];
  for (int i=0; i<n; i++)
    nums[i] = i;

  std::cout<<"go"<<std::endl;

  clock_t c1 = clock();

  struct timespec start, finish;
  double elapsed;

  clock_gettime(CLOCK_MONOTONIC, &start);

  // also try with &doJunk
  //parallelMap(nums, n, &multByTwo);
  map(nums, n, &doJunk);

  std::cout << nums[342] << std::endl;

  clock_gettime(CLOCK_MONOTONIC, &finish);

  printf("CPU elapsed time is %f seconds\n", double(clock()-c1)/CLOCKS_PER_SEC);

  elapsed = (finish.tv_sec - start.tv_sec);
  elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;

  printf("Actual elapsed time is %f seconds\n", elapsed);
}

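With multByTwo, the parallel version is actually slightly slower (1.01 seconds versus 0.95 seconds of real time), while with doJunk it is faster (51 versus 136 seconds of real time). This suggests to me that

  1. the parallelization is working
  2. the overhead of creating new threads is really large. Any thoughts on why the overhead is so large, and how I can avoid it?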

Mar*_*som 7

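Just a guess: what you are probably seeing is that the multByTwo code is so fast that you have reached memory saturation. No matter how much processor power you throw at it, the code will not run any faster, because it is already going as fast as it can move the data to and from RAM.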