Why doesn't this program seem to fuse correctly?

Question

I suspected that a given program wasn't fusing the way it should, so I wrote this test to confirm:

module Main where

import qualified Data.Vector.Unboxed as V

main :: IO ()
main = do

  let size = 100000000 :: Int
  let array = V.replicate size 0 :: V.Vector Int
  let incAll = V.map (+ 1)

  print 
    . V.sum 

    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 

    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 

    $ array

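The more `incAll` calls you add, the slower the program becomes, which I believe means stream fusion isn't kicking in. I'm using GHC 8.0.1 and building with Stack, and I have included -O2 in the ghc-options of the .cabal file. Am I missing something?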

Answer 1

Zet*_*eta 5

Note: I'm using GHC 7.10.3 and Stack 1.1.2 on Windows (x64), so your timings may differ.

TL;DR

If you want to benefit from stream fusion, make sure your functions get inlined.

How streams get fused

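Stream fusion relies heavily on the optimizer and on rewrite rules, at least in the vector package. So let's check which versions of your program actually get optimized.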
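
To make the role of rewrite rules more concrete, here is a minimal toy model of stream fusion over lists. It is only a sketch and not the vector package's actual implementation: the names `Step`, `Stream`, `stream`, `unstream`, `mapS`, `sumS`, `mapL` and `sumL` are made up for illustration. The idea is that every list operation is written as "convert to a stream, transform, convert back", and a single rule removes each adjacent `stream (unstream s)` pair. That rule can only match after wrappers such as `mapL` have been inlined at the call site, which is why inlining matters so much.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Toy model only: all names below are hypothetical, not the vector API.

-- One step of a stream: either finished, or an element plus the next state.
data Step s a = Done | Yield a s

-- A stream is a step function together with a hidden state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list into a stream.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs
{-# INLINE [1] stream #-}

-- Materialize a stream back into a list.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Yield x s' -> x : go s'
{-# INLINE [1] unstream #-}

-- Mapping over a stream only wraps the step function; nothing is allocated.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Yield x s' -> Yield (f x) s'
{-# INLINE mapS #-}

-- Strict left fold that sums a stream.
sumS :: Num a => Stream a -> a
sumS (Stream next s0) = go 0 s0
  where
    go acc s = case next s of
      Done       -> acc
      Yield x s' -> let acc' = acc + x in acc' `seq` go acc' s'
{-# INLINE sumS #-}

-- List-level operations expressed through streams.
mapL :: (a -> b) -> [a] -> [b]
mapL f = unstream . mapS f . stream
{-# INLINE mapL #-}

sumL :: Num a => [a] -> a
sumL = sumS . stream
{-# INLINE sumL #-}

-- The fusion rule: converting to a list and straight back is a no-op.
{-# RULES "toy stream/unstream" forall s. stream (unstream s) = s #-}
```

With this in scope, `sumL (mapL (+ 1) (mapL (+ 1) xs))` expands to `sumS (stream (unstream (mapS (+ 1) (stream (unstream (mapS (+ 1) (stream xs)))))))` once `mapL` and `sumL` are inlined, and the rule can then remove both intermediate lists. If a wrapper is hidden behind a binding that is not inlined, the `stream`/`unstream` pairs never become adjacent and the rule has nothing to match.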

Minimal version (1 `incAll`)

Let's get started. First we reduce the program to the bare minimum:

-- SOBase.hs
module Main where

import qualified Data.Vector.Unboxed as V

main :: IO ()
main = do

  let size = 100000000 :: Int
  let array = V.replicate size 0 :: V.Vector Int
  let incAll = V.map (+ 1)

  print 
    . V.sum     
    . incAll    
    $ array

Let's compile it and dump the Core that GHC generates:

$ stack ghc --package vector -- -O2 SOBase.hs -ddump-simpl -dsuppress-all

main2
main2 =
  case (runSTRep main3) `cast` ...
  of _ { Vector ipv_s6b2 ipv1_s6b3 ipv2_s6b4 ->
  letrec {
    $s$wfoldlM'_loop_s9wM
    $s$wfoldlM'_loop_s9wM =
      \ sc_s9wK sc1_s9wL ->
        case tagToEnum# (>=# sc1_s9wL ipv1_s6b3) of _ {
          False ->
            case indexIntArray# ipv2_s6b4 (+# ipv_s6b2 sc1_s9wL)
            of wild_a5ju { __DEFAULT ->
            $s$wfoldlM'_loop_s9wM (+# sc_s9wK (+# wild_a5ju 1)) (+# sc1_s9wL 1)
            };
          True -> sc_s9wK
        }; } in
  case $s$wfoldlM'_loop_s9wM 0 0 of ww_s94k { __DEFAULT ->
  case $wshowSignedInt 0 ww_s94k ([])
  of _ { (# ww5_a5fH, ww6_a5fI #) ->
  : ww5_a5fH ww6_a5fI
  }
  }
  }

Let's make that a bit prettier:

main2 = let foldLoop s n 
              | n < size  = foldLoop (s + (vec ! n + 1)) (n + 1)
              | otherwise = s
        in print (foldLoop 0 0)
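
The prettified loop above is pseudocode, since `vec` and `size` are not defined there. A runnable version of the same hand-fused loop might look like the sketch below. The function name `sumPlusOne` is made up for illustration, and the code assumes the vector package is available.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.Vector.Unboxed as V

-- Hypothetical name: a hand-written equivalent of V.sum . V.map (+ 1).
-- It reads each element once, adds 1, and accumulates, without ever
-- building an intermediate vector.
sumPlusOne :: V.Vector Int -> Int
sumPlusOne vec = foldLoop 0 0
  where
    size = V.length vec
    -- s is the running sum, n the current index; both are kept strict.
    foldLoop !s !n
      | n < size  = foldLoop (s + (vec V.! n + 1)) (n + 1)
      | otherwise = s
```

For example, `sumPlusOne (V.replicate 5 0)` evaluates to 5. This is the shape of loop the optimizer produces by itself when fusion succeeds.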

`incAll` has been inlined into the loop:

case indexIntArray# ipv2_s6b4 (+# ipv_s6b2 sc1_s9wL)
                of wild_a5ju { __DEFAULT ->
                $s$wfoldlM'_loop_s9wM (+# sc_s9wK (+# wild_a5ju 1)) (+# sc1_s9wL 1)
                                                  ^^^^^^^^^^^^^^^^

Inlined function (3 `incAll`s)

Let's add an INLINE pragma:

-- SO3I.hs
module Main where

import qualified Data.Vector.Unboxed as V

main :: IO ()
main = do

  let size = 100000000 :: Int
  let array = V.replicate size 0 :: V.Vector Int
  let {-# INLINE incAll #-}
      incAll = V.map (+1)
  print 
    . V.sum 

    . incAll 
    . incAll 
    . incAll 

    $ array

stack ghc --package vector -- -O2 -ddump-simpl SO3I.hs

What does `main` look like now?

main2                                                                         
main2 =                                                                       
  case (runSTRep main3) `cast` ...                                            
  of _ { Vector ipv_s6bG ipv1_s6bH ipv2_s6bI ->                               
  letrec {                                                                    
    $s$wfoldlM'_loop_s9z7                                                     
    $s$wfoldlM'_loop_s9z7 =                                                   
      \ sc_s9z5 sc1_s9z6 ->                                                   
        case tagToEnum# (>=# sc1_s9z6 ipv1_s6bH) of _ {                       
          False ->                                                            
            case indexIntArray# ipv2_s6bI (+# ipv_s6bG sc1_s9z6)              
            of wild_a5jC { __DEFAULT ->                                       
            $s$wfoldlM'_loop_s9z7                                             
              (+# sc_s9z5 (+# (+# (+# wild_a5jC 1) 1) 1)) (+# sc1_s9z6 1)     
            };                                                                
          True -> sc_s9z5                                                     
        }; } in                                                               
  case $s$wfoldlM'_loop_s9z7 0 0 of ww_s96F { __DEFAULT ->                    
  case $wshowSignedInt 0 ww_s96F ([])                                         
  of _ { (# ww5_a5fP, ww6_a5fQ #) ->                                          
  : ww5_a5fP ww6_a5fQ                                                         
  }                                                                           
  }                                                                           
  }

Great. `incAll` has been inlined, as you can see here:

(+# sc_s9z5 (+# (+# (+# wild_a5jC 1) 1) 1)) (+# sc1_s9z6 1)     
                                  ^  ^  ^

So the problem was that `incAll` wasn't inlined, and therefore you never ended up with

V.sum . V.map (+1) . V.map (+1) . V.map (+1)
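
As a sketch of the same fix applied to a top-level definition rather than a `let` binding: the pragma goes next to the definition, so each use site expands to `V.map (+ 1)` and the fusion rules see the whole pipeline. The name `sumAfterThree` is made up for illustration, and the code assumes the vector package is available. Whether fusion really happens in a given build is best confirmed with -ddump-simpl, as done above.

```haskell
import qualified Data.Vector.Unboxed as V

-- Same helper as in the question, but at top level with an INLINE pragma.
incAll :: V.Vector Int -> V.Vector Int
incAll = V.map (+ 1)
{-# INLINE incAll #-}

-- Hypothetical name: after inlining, this is
-- V.sum . V.map (+ 1) . V.map (+ 1) . V.map (+ 1).
sumAfterThree :: V.Vector Int -> Int
sumAfterThree = V.sum . incAll . incAll . incAll
```

For example, `sumAfterThree (V.replicate 10 0)` evaluates to 30.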

Your original program (now with inlining, 32 `incAll`s)

Last but not least, let's try your original program again, this time with inlining. Is everything fixed? Let's look at the Core:

main2
main2 =
  case (runSTRep main3) `cast` ...
  of _ { Vector ipv_s6xF ipv1_s6xG ipv2_s6xH ->
  letrec {
    $s$wfoldlM'_loop_sajT
    $s$wfoldlM'_loop_sajT =
      \ sc_sajR sc1_sajS ->
        case tagToEnum# (>=# sc1_sajS ipv1_s6xG) of _ {
          False ->
            case indexIntArray# ipv2_s6xH (+# ipv_s6xF sc1_sajS)
            of wild_a5mq { __DEFAULT ->
            $s$wfoldlM'_loop_sajT
              (+#
                 sc_sajR
                 (+#
                    (+#
                       (+#
                          (+#
                             (+#
                                (+#
                                   (+#
                                      (+#
                                         (+#
                                            (+#
                                               (+#
                                                  (+#
                                                     (+#
                                                        (+#
                                                           (+#
                                                              (+#
                                                                 (+#
                                                                    (+#
                                                                       (+#
                                                                          (+#
                                                                             (+#
                                                                                (+#
                                                                                   (+#
                                                                                      (+#
                                                                                         (+#
                                                                                            (+#
                                                                                               (+#
                                                                                                  (+#
                                                                                                     (+#
                                                                                                        (+#
                                                                                                           (+#
                                                                                                              (+#
                                                                                                                 wild_a5mq
                                                                                                                 1)
                                                                                                              1)
                                                                                                           1)
                                                                                                        1)
                                                                                                     1)
                                                                                                  1)
                                                                                               1)
                                                                                            1)
                                                                                         1)
                                                                                      1)
                                                                                   1)
                                                                                1)
                                                                             1)
                                                                          1)
                                                                       1)
                                                                    1)
                                                                 1)
                                                              1)
                                                           1)
                                                        1)
                                                     1)
                                                  1)
                                               1)
                                            1)
                                         1)
                                      1)
                                   1)
                                1)
                             1)
                          1)
                       1)
                    1))
              (+# sc1_sajS 1)
            };
          True -> sc_sajR
        }; } in
  case $s$wfoldlM'_loop_sajT 0 0 of ww_s9Rr { __DEFAULT ->
  case $wshowSignedInt 0 ww_s9Rr ([])
  of _ { (# ww5_a5iD, ww6_a5iE #) ->
  : ww5_a5iD ww6_a5iE
  }
  }
  }

Well, yes. But at the Core level GHC isn't smart enough to turn (+1) . (+1) into (+2), and so on. Is it really faster?

$ stack ghc --package vector -- -O2 SO.hs && SO.exe +RTS -s
  26,400,052,464 bytes allocated in the heap                                             
           9,736 bytes copied during GC                                                  
     800,026,736 bytes maximum residency (2 sample(s))                                   
          61,328 bytes maximum slop                                                      
            1527 MB total memory in use (0 MB lost due to fragmentation)                 

                                     Tot time (elapsed)  Avg pause  Max pause            
  Gen  0        32 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s            
  Gen  1         2 colls,     0 par    0.000s   0.089s     0.0446s    0.0890s            

  INIT    time    0.000s  (  0.000s elapsed)                                             
  MUT     time    4.453s  (  4.616s elapsed)                                             
  GC      time    0.000s  (  0.090s elapsed)                                             
  EXIT    time    0.000s  (  0.089s elapsed)                                             
  Total   time    4.453s  (  4.795s elapsed)                                             

  %GC     time       0.0%  (1.9% elapsed)                                                

  Alloc rate    5,928,432,834 bytes per MUT second                                       

  Productivity 100.0% of total user, 92.9% of total elapsed

About 4.5 seconds for the original program. And for the inlined one?

$ stack ghc --package vector -- -O2 SOFixed.hs && SOFixed.exe +RTS -s
3200000000
     800,048,112 bytes allocated in the heap
           4,352 bytes copied during GC
          42,664 bytes maximum residency (1 sample(s))
          18,776 bytes maximum slop
             764 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.045s     0.0452s    0.0452s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.188s  (  0.224s elapsed)
  GC      time    0.000s  (  0.045s elapsed)
  EXIT    time    0.000s  (  0.045s elapsed)
  Total   time    0.188s  (  0.315s elapsed)

  %GC     time       0.0%  (14.4% elapsed)

  Alloc rate    4,266,923,264 bytes per MUT second

  Productivity 100.0% of total user, 59.6% of total elapsed

Under 0.2 seconds. Great! By the way, all the (+1) applications end up optimized into a single addq $32, ... instruction.
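
The allocation figures in the two +RTS -s reports are consistent with this picture. The arithmetic below is only a back-of-the-envelope check, and it assumes an 8-byte `Int` (as on the 64-bit GHC used here); all names are made up for illustration. One unboxed array of 100,000,000 `Int`s takes 800,000,000 bytes, which roughly matches the 800,048,112 bytes allocated by the inlined program. The initial array plus one intermediate array for each of the 32 `incAll` calls gives 26,400,000,000 bytes, which roughly matches the 26,400,052,464 bytes allocated by the original program.

```haskell
-- Assumption: Int is 8 bytes, as on a 64-bit GHC.
bytesPerInt :: Integer
bytesPerInt = 8

-- Number of elements in the test vector.
elements :: Integer
elements = 100000000

-- Size of one unboxed array: 800,000,000 bytes.
arrayBytes :: Integer
arrayBytes = elements * bytesPerInt

-- Unfused: the initial array plus one intermediate array per incAll,
-- that is 33 arrays or 26,400,000,000 bytes.
unfusedBytes :: Integer
unfusedBytes = (1 + 32) * arrayBytes

-- Every element ends up as 32, so the printed sum is 3,200,000,000.
expectedSum :: Integer
expectedSum = 32 * elements
```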
