Non-linear scaling of .NET operations on a multi-core machine

Question

LBu*_*kin 8 c# linq parallel-processing performance plinq

I've run into a strange behavior in a .NET application that performs some highly parallel processing on a set of in-memory data.

When run on a multi-core processor (Intel Core2 Quad Q6600, 2.4 GHz), it exhibits non-linear scaling as multiple threads are started to process the data.

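When run as a non-multithreaded loop on a single core, the process completes approximately 2.4 million computations per second. When run as four threads you would expect four times the throughput, somewhere in the neighborhood of 9 million computations per second, but alas, no. In practice it completes only about 4.1 million per second, well short of the expected throughput.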
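To put numbers on that gap, here is a minimal sketch of the usual speedup and efficiency arithmetic. The `ScalingMath` class and its method names are invented for illustration and are not part of the application.

```csharp
using System;

public static class ScalingMath
{
    // Speedup: parallel throughput relative to single-threaded throughput.
    public static double Speedup(double serialRate, double parallelRate)
        => parallelRate / serialRate;

    // Efficiency: fraction of ideal linear scaling achieved on `cores` cores.
    public static double Efficiency(double serialRate, double parallelRate, int cores)
        => Speedup(serialRate, parallelRate) / cores;
}
```

With the figures above, `ScalingMath.Speedup(2.4, 4.1)` is roughly 1.7 and `ScalingMath.Efficiency(2.4, 4.1, 4)` is roughly 0.43, so the four cores deliver well under half of ideal linear scaling.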

Furthermore, the behavior occurs no matter whether I use PLINQ, a thread pool, or four explicitly created threads. Pretty strange...

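Nothing else on the machine is using CPU time, and the computation involves no locks or other synchronization objects; it should just tear ahead through the data. I've confirmed this (to the extent possible) by looking at perfmon data while the process runs, and there is no reported thread contention or garbage collection activity.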

My theories at the moment:

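1. The overhead of all of the techniques (thread context switches, etc.) is overwhelming the computations.
2. The threads are not being assigned to each of the four cores and spend some time waiting on the same processor core. I'm not sure how to test this theory...
3. .NET CLR threads are not running at the expected priority or have some hidden internal overhead.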
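One possible way to probe the second theory is to have each worker thread periodically record which processor it is running on. The sketch below assumes a runtime that provides `Thread.GetCurrentProcessorId()` (.NET Core 3.0 or later; it is not available on .NET Framework). The `CoreSpreadProbe` class and the `Math.Sqrt` busy loop are hypothetical stand-ins for the real workload.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CoreSpreadProbe
{
    // Runs `threadCount` busy threads and returns, per thread, the set of
    // processor IDs that thread was observed running on.
    public static List<SortedSet<int>> Sample(int threadCount, int iterations)
    {
        var seen = new List<SortedSet<int>>();
        var workers = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var ids = new SortedSet<int>();   // touched only by its own thread
            seen.Add(ids);
            workers.Add(new Thread(() =>
            {
                double sink = 0;
                for (int n = 0; n < iterations; n++)
                {
                    sink += Math.Sqrt(n);     // stand-in for the real computation
                    if ((n & 0xFFFF) == 0)    // sample every 65536 iterations
                        ids.Add(Thread.GetCurrentProcessorId());
                }
                GC.KeepAlive(sink);           // keep the loop from being optimized away
            }));
        }
        foreach (var w in workers) w.Start();
        foreach (var w in workers) w.Join();  // sets are read only after Join
        return seen;
    }
}
```

Calling `CoreSpreadProbe.Sample(Environment.ProcessorCount, 50000000)` and printing each set with `string.Join(",", set)` shows where the threads ran. The scheduler is free to migrate threads, so several IDs per thread is not a problem in itself. If all threads reported the same single ID for the whole run, that would lend some support to the theory.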

Below is a representative excerpt from the code that should exhibit the same behavior:

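    // namespaces used: System, System.Collections.Generic, System.Diagnostics,
    // System.Linq, System.Threading, System.Threading.Tasks
    var evaluator = new LookupBasedEvaluator();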

    // find all ten-vertex polygons that are a subset of the set of points
    var ssg = new SubsetGenerator<PolygonData>(Points.All, 10);

    const int TEST_SIZE = 10000000;  // evaluate the first 10 million records

    // materialize the data into memory...
    var polygons = ssg.AsParallel()
                      .Take(TEST_SIZE)
                      .Cast<PolygonData>()
                      .ToArray();

    var sw1 = Stopwatch.StartNew();
    // for loop completes in about 4.02 seconds... ~ 2.483 million/sec
    foreach( var polygon in polygons )
        evaluator.Evaluate(polygon);
    sw1.Stop();
    Console.WriteLine( "Linear, single core loop: {0}", sw1.ElapsedMilliseconds );

    // now attempt the same thing in parallel using Parallel.ForEach...
    // MS documentation indicates this internally uses a worker thread pool
    // completes in 2.61 seconds ... or ~ 3.831 million/sec
    var sw2 = Stopwatch.StartNew();
    Parallel.ForEach(polygons, p => evaluator.Evaluate(p));
    sw2.Stop();
    Console.WriteLine( "Parallel.ForEach() loop: {0}", sw2.ElapsedMilliseconds );

    // now using PLINQ, we get slightly better results, but not by much
    // completes in 2.21 seconds ... or ~ 4.524 million/second
    var sw3 = Stopwatch.StartNew();
    polygons.AsParallel()
            .WithDegreeOfParallelism(Environment.ProcessorCount)
            .AsUnordered() // not sure this is necessary...
            .ForAll( h => evaluator.Evaluate(h) );
    sw3.Stop();
    Console.WriteLine( "PLINQ.AsParallel.ForAll: {0}", sw3.ElapsedMilliseconds );

    // now using four explicit threads:
    // best, still short of expectations at 1.99 seconds = ~ 5 million/sec
    // note: as excerpted, each thread is passed the same full array
    ParameterizedThreadStart tsd = delegate(object pset)
    {
        foreach (var p in (IEnumerable<PolygonData>) pset)
            evaluator.Evaluate(p);
    };
    var t1 = new Thread(tsd);
    var t2 = new Thread(tsd);
    var t3 = new Thread(tsd);
    var t4 = new Thread(tsd);

    var sw4 = Stopwatch.StartNew();
    t1.Start(polygons);
    t2.Start(polygons);
    t3.Start(polygons);
    t4.Start(polygons);
    t1.Join();
    t2.Join();
    t3.Join();
    t4.Join();
    sw4.Stop();
    Console.WriteLine( "Four Explicit Threads: {0}", sw4.ElapsedMilliseconds );
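
One way to probe the first theory (per-item overhead swamping the computation) might be to hand each worker a contiguous index range, so the parallel machinery is invoked once per range rather than once per element. The sketch below uses `Partitioner.Create` with the thread-local overload of `Parallel.ForEach`. The `ChunkedLoop` class and its `Evaluate` method are hypothetical stand-ins for `evaluator.Evaluate`, working on plain integers so the example is self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedLoop
{
    // Hypothetical stand-in for evaluator.Evaluate(polygon).
    public static int Evaluate(int x) => (x * 31) ^ (x >> 3);

    // Baseline: plain single-threaded loop.
    public static long SumSequential(int[] items)
    {
        long total = 0;
        foreach (var x in items) total += Evaluate(x);
        return total;
    }

    // Range-partitioned parallel loop: one delegate call per index range,
    // with a per-task running total merged once when the task finishes.
    public static long SumChunked(int[] items)
    {
        long total = 0;
        if (items.Length == 0) return total;   // Partitioner.Create rejects empty ranges
        Parallel.ForEach(
            Partitioner.Create(0, items.Length),
            () => 0L,                          // per-task initial value
            (range, state, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    local += Evaluate(items[i]);
                return local;
            },
            local => Interlocked.Add(ref total, local));
        return total;
    }
}
```

If timing `SumChunked` against `SumSequential` on the real workload shows scaling much closer to linear than the per-item versions above, that would point toward scheduling overhead. If the gap stays the same, the cause is more likely elsewhere.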
