C++: Why is std::async slower than sequential execution?

Lin*_*Lin 9 c++ multithreading asynchronous c++11

#include <future>
#include <iostream>
#include <vector>
#include <cstdint>
#include <algorithm>
#include <random>
#include <chrono>
#include <utility>
#include <type_traits>

template <class Clock = std::chrono::high_resolution_clock, class Task>
double timing(Task&& t, typename std::result_of<Task()>::type* r = nullptr)
{
  using namespace std::chrono;
  auto begin = Clock::now();
  if (r != nullptr) *r = std::forward<Task>(t)();
  auto end = Clock::now();
  return duration_cast<duration<double>>(end - begin).count();
}

template <typename Num>
double sum(const std::vector<Num>& v, const std::size_t l, const std::size_t h)
{
  double s = 0;  // initialize the accumulator; reading it uninitialized is undefined behavior
  for (auto i = l; i <= h; i++) s += v[i];
  return s;
}

template <typename Num>
double asum(const std::vector<Num>& v, const std::size_t l, const std::size_t h)
{
  auto m = (l + h) / 2;
  auto s1 = std::async(std::launch::async, sum<Num>, v, l, m);
  auto s2 = std::async(std::launch::async, sum<Num>, v, m+1, h);
  return s1.get() + s2.get();
}

int main()
{
  std::vector<unsigned int> v(1000);  // `uint` is a non-standard typedef; spell the type out
  auto s = std::chrono::system_clock::now().time_since_epoch().count();
  std::generate(v.begin(), v.end(), std::minstd_rand0(s));

  double r;
  std::cout << 1000 * timing([&]() -> double { return asum(v, 0, v.size() - 1); }, &r) << " msec | rst " << r << std::endl;
  std::cout << 1000 * timing([&]() -> double { return sum(v, 0, v.size() - 1); }, &r) << " msec | rst " << r << std::endl;
}

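Hi,

Above are two functions for summing a vector of random numbers.

I ran the program several times, but I don't seem to get any benefit from std::async. Here are some of the results I got.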

0.130582 msec | rst 1.09015e+12
0.001402 msec | rst 1.09015e+12

0.23185 msec | rst 1.07046e+12
0.002308 msec | rst 1.07046e+12

0.18052 msec | rst 1.07449e+12
0.00244 msec | rst 1.07449e+12

0.190455 msec | rst 1.08319e+12
0.002315 msec | rst 1.08319e+12

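In all four runs the asynchronous version took more time. But ideally it should be about twice as fast, right?

Am I missing something in my code?

By the way, I'm running OS X 10.10.4 on a MacBook Air with a 1.4 GHz Intel Core i5.

Thanks,

Edit:

  1. Compiler flags: g++ -o asum asum.cpp -std=c++11
  2. I changed the flags to include -O3 and increased the vector size to 10000000, but the results are still odd.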

72.1743 msec | rst 1.07349e+16
14.3739 msec | rst 1.07349e+16

58.3542 msec | rst 1.07372e+16
12.1143 msec | rst 1.07372e+16

57.1576 msec | rst 1.07371e+16
11.9332 msec | rst 1.07371e+16

59.9104 msec | rst 1.07395e+16
11.9923 msec | rst 1.07395e+16

64.032 msec | rst 1.07371e+16
12.0929 msec | rst 1.07371e+16

Mas*_*nes 6

Here

auto s1 = std::async(std::launch::async, sum<Num>, v, l, m);
auto s2 = std::async(std::launch::async, sum<Num>, v, m+1, h);

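each std::async call stores its own copy of the vector, so the vector is copied twice. You should use std::cref instead, and make sure that the futures are retrieved before the vector goes out of scope (as your current code already does) and that access to it is properly synchronized (as it is in your current code).

As mentioned in the comments, thread creation overhead may slow the code down further.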

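  • @lingxiao nope, those universal references are forwarded by std::thread to the decay_t'ed constructor, which results in a copy (or a move where applicable). That is why we have std::reference_wrapper (3 upvotes)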