Profiling two functions that sum a large list

Kev*_*ith 4 parallel-processing haskell

I just started reading Parallel and Concurrent Programming in Haskell.

I wrote two programs that, I believe, both sum a list:

  • run rpar (force (sum list))
  • split the list in two, run the above on each half, then add the two results

Here is the code:

import Control.Parallel.Strategies
import Control.DeepSeq
import System.Environment

main :: IO ()
main = do
  [n] <- getArgs
  [single, faster] !! (read n - 1)

single :: IO ()
single = print . runEval $ rpar (sum list)

faster :: IO ()
faster = print . runEval $ do
    let (as, bs) = splitAt ((length list) `div` 2) list
    res1 <- rpar (sum as)
    res2 <- rpar (sum bs)
    return (res1 + res2)


list :: [Integer]
list = [1..10000000]

Compiled with parallelism enabled (-threaded):

C:\Users\k\Workspace\parallel_concurrent_haskell>ghc Sum.hs -O2 -threaded -rtsopts
[1 of 1] Compiling Main             ( Sum.hs, Sum.o )
Linking Sum.exe ...

Results of the single program:

C:\Users\k\Workspace\parallel_concurrent_haskell>Sum 1 +RTS -s -N2
50000005000000
     960,065,896 bytes allocated in the heap
         363,696 bytes copied during GC
          43,832 bytes maximum residency (2 sample(s))
          57,016 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1837 colls,  1837 par    0.00s    0.01s     0.0000s    0.0007s
  Gen  1         2 colls,     1 par    0.00s    0.00s     0.0002s    0.0003s

  Parallel GC work balance: 0.18% (serial 0%, perfect 100%)

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)

  SPARKS: 1 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.27s  (  0.27s elapsed)
  GC      time    0.00s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.27s  (  0.28s elapsed)

  Alloc rate    3,614,365,726 bytes per MUT second

  Productivity 100.0% of total user, 95.1% of total elapsed

gc_alloc_block_sync: 573
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

Running faster:

C:\Users\k\Workspace\parallel_concurrent_haskell>Sum 2 +RTS -s -N2
50000005000000
   1,600,100,336 bytes allocated in the heap
   1,477,564,464 bytes copied during GC
     400,027,984 bytes maximum residency (14 sample(s))
      70,377,336 bytes maximum slop
             911 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3067 colls,  3067 par    1.05s    0.68s     0.0002s    0.0021s
  Gen  1        14 colls,    13 par    1.98s    1.53s     0.1093s    0.5271s

  Parallel GC work balance: 0.00% (serial 0%, perfect 100%)

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N2)

  SPARKS: 2 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.38s  (  1.74s elapsed)
  GC      time    3.03s  (  2.21s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    3.42s  (  3.95s elapsed)

  Alloc rate    4,266,934,229 bytes per MUT second

  Productivity  11.4% of total user, 9.9% of total elapsed

gc_alloc_block_sync: 335
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

Why does single finish in 0.28 seconds, while faster (poorly named, obviously) takes 3.95 seconds?

ama*_*loy 5

I'm no expert in Haskell-specific profiling, but I can see several possible problems in faster. You walk the input list at least three times: once to get its length, once in splitAt (maybe twice there, I'm not entirely sure how it's implemented), and once more to read and sum its elements. In single, the list is walked only once.
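To illustrate that cost, the length-then-splitAt traversal can be collapsed into a single pass by dealing elements alternately into the two halves. This halves function is my own sketch, not code from the question; it puts different elements in each half than splitAt would, which is harmless when you only sum them:

```haskell
-- Sketch (not from the question): split a list in one pass by
-- alternating elements, avoiding the separate length traversal.
halves :: [a] -> ([a], [a])
halves []         = ([], [])
halves [x]        = ([x], [])
halves (x:y:rest) = let (as, bs) = halves rest
                    in (x : as, y : bs)
```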

In faster you also hold the entire list in memory at once, whereas in single Haskell can process it lazily and the GC can collect it as you go. If you look at the profiling output, you can see that faster copies over 3,000 times as many bytes during GC! And faster needed 400 MB of memory at once, where single needed only about 40 KB. So the garbage collector had a much larger working set to keep sweeping.
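Since the input here is just the range [1..10000000], one way to avoid the residency problem is to give each half its own independently generated range, so each sum can stream and be collected as it goes without either half retaining the shared list. parRangeSum below is a hypothetical variant of mine, using par and pseq from base rather than the question's Strategies API:

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical variant: each half is its own lazily generated range,
-- so neither sum retains a shared input list in memory.
parRangeSum :: Integer -> Integer
parRangeSum n =
  let half = n `div` 2
      s1   = sum [1 .. half]       -- sparked for another capability
      s2   = sum [half + 1 .. n]   -- evaluated on the current thread
  in s1 `par` (s2 `pseq` s1 + s2)
```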

Another big problem: faster allocates a huge number of new cons cells to hold the two intermediate sub-lists. Even if they can all be GC'd immediately, that is a lot of time spent allocating. It's more expensive than the addition itself! So even before you start adding, you are already "over budget" compared with single.
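The spark statistics above (1 GC'd, 1 fizzled) also suggest the main thread evaluated the sums itself before the sparks ran. A common pattern is to spark only one half and sum the other on the current thread, then synchronize before combining. This parSumList is my own sketch using par/pseq from base; note the comments: it still pays the splitAt allocation cost described above:

```haskell
import GHC.Conc (par, pseq)

-- Sketch: spark one half, sum the other on the current thread, and
-- synchronize (pseq) before combining so the spark has a chance to run.
-- splitAt still allocates a fresh spine for the first half, so the
-- allocation cost described above is still paid here.
parSumList :: [Integer] -> Integer
parSumList xs =
  let (as, bs) = splitAt (length xs `div` 2) xs
      s1 = sum as
      s2 = sum bs
  in s1 `par` (s2 `pseq` s1 + s2)
```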