Why does a small change in function design radically change the results of a criterion benchmark?

sup*_*ate 3 benchmarking haskell haskell-criterion

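I have two source files that do roughly the same thing. The only difference is that in the first case the element function is passed as the argument, and in the second case the value (the vector size) is.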


First case:

    module Main where

    import Data.Vector.Unboxed as UB
    import qualified Data.Vector as V

    import Criterion.Main

    regularVectorGenerator :: (Int -> t) -> V.Vector t
    regularVectorGenerator = V.generate 99999

    unboxedVectorGenerator :: Unbox t => (Int -> t) -> UB.Vector t
    unboxedVectorGenerator = UB.generate 99999

    main :: IO ()
    main = defaultMain
        [
            bench "boxed"   $ whnf regularVectorGenerator (+2137)
          , bench "unboxed" $ whnf unboxedVectorGenerator (+2137)
        ]

Second case:

    module Main where

    import Data.Vector.Unboxed as UB
    import qualified Data.Vector as V

    import Criterion.Main

    regularVectorGenerator :: Int -> V.Vector Int
    regularVectorGenerator = flip V.generate (+2137)

    unboxedVectorGenerator :: Int -> UB.Vector Int
    unboxedVectorGenerator = flip UB.generate (+2137)

    main :: IO ()
    main = defaultMain
        [
            bench "boxed"   $ whnf regularVectorGenerator 99999
          , bench "unboxed" $ whnf unboxedVectorGenerator 99999
        ]

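I noticed that, as expected, the unboxed vector is always smaller in the benchmark, but the allocated size of both vectors (the `iters` coefficient, marked with `**`) varies drastically between the two cases. Here is the output: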


First case:

    benchmarking boxed
    time                 7.626 ms   (7.515 ms .. 7.738 ms)
                         0.999 R²   (0.998 R² .. 0.999 R²)
    mean                 7.532 ms   (7.472 ms .. 7.583 ms)
    std dev              164.3 μs   (133.8 μs .. 201.3 μs)
    allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
      iters              **1.680e7**    (1.680e7 .. 1.680e7)
      y                  2357.390   (1556.690 .. 3422.724)

    benchmarking unboxed
    time                 889.1 μs   (878.9 μs .. 901.8 μs)
                         0.998 R²   (0.995 R² .. 0.999 R²)
    mean                 868.6 μs   (858.6 μs .. 882.6 μs)
    std dev              39.05 μs   (28.30 μs .. 57.02 μs)
    allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
      iters              **4000009.003** (4000003.843 .. 4000014.143)
      y                  2507.089   (2025.196 .. 3035.962)
    variance introduced by outliers: 36% (moderately inflated)

Second case:

    benchmarking boxed
    time                 1.366 ms   (1.357 ms .. 1.379 ms)
                         0.999 R²   (0.998 R² .. 1.000 R²)
    mean                 1.350 ms   (1.343 ms .. 1.361 ms)
    std dev              29.96 μs   (21.74 μs .. 43.56 μs)
    allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
      iters              **2400818.350** (2400810.284 .. 2400826.685)
      y                  2423.216   (1910.901 .. 3008.024)
    variance introduced by outliers: 12% (moderately inflated)

    benchmarking unboxed
    time                 61.30 μs   (61.24 μs .. 61.37 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 61.29 μs   (61.25 μs .. 61.33 μs)
    std dev              122.1 ns   (91.64 ns .. 173.9 ns)
    allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
      iters              **800040.029** (800039.745 .. 800040.354)
      y                  2553.830   (2264.684 .. 2865.637)

Just by deparametrizing the functions, the benchmarked size of the vectors drops by roughly an order of magnitude. Can someone explain why?

I compiled both examples with these flags:

    -O2 -rtsopts

and ran them with:

    --regress allocated:iters +RTS -T

And*_*ács 5

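The difference is that if the generating function is known inside the benchmarked function, the generator gets inlined and the `Int`s involved get unboxed as well. If the generating function is the benchmark argument, it cannot be inlined.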
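As a rough sanity check of that explanation, the arithmetic below compares the reported `iters` coefficients with what one would expect per vector. It is only a sketch under assumptions that are not stated in the question: a 64-bit GHC (8-byte machine word), a heap-allocated boxed `Int` costing two words, and, for the non-inlined unboxed case, one boxed index plus one boxed result per element. All names in it are made up for illustration.

```haskell
module Main where

-- Assumption: 64-bit GHC, so one machine word is 8 bytes.
wordBytes :: Int
wordBytes = 8

-- Assumption: a heap-allocated boxed Int takes two words (header + payload).
boxedIntBytes :: Int
boxedIntBytes = 2 * wordBytes

-- Vector length used in the question.
elements :: Int
elements = 99999

-- Unboxed vector, generator inlined: one raw 8-byte slot per element.
unboxedInlined :: Int
unboxedInlined = elements * wordBytes

-- Boxed vector, generator inlined: one pointer slot plus one boxed Int per element.
boxedInlined :: Int
boxedInlined = elements * (wordBytes + boxedIntBytes)

-- Unboxed vector, generator not inlined (one plausible accounting):
-- the slot, plus a boxed index passed to the unknown function,
-- plus the boxed result it returns.
unboxedNotInlined :: Int
unboxedNotInlined = elements * (wordBytes + 2 * boxedIntBytes)

main :: IO ()
main = do
  putStrLn ("unboxed, inlined:     " ++ show unboxedInlined)     -- 799992
  putStrLn ("boxed, inlined:       " ++ show boxedInlined)       -- 2399976
  putStrLn ("unboxed, not inlined: " ++ show unboxedNotInlined)  -- 3999960
```

These estimates land close to the measured 800040, 2400818 and 4000009 bytes per iteration. The boxed figure from the first case (1.680e7) is not covered by this simple accounting.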

From a benchmarking perspective the second version is the right one, since in normal usage we want the generating function to be inlined.
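To compare the two shapes in a single run, something like the following sketch should work. It only rearranges the code from the question; the names `fnAsArgument` and `sizeAsArgument` and the `bgroup` layout are made up here, and the element type is fixed to `Int` for brevity.

```haskell
module Main where

import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as UB

import Criterion.Main

-- Hypothetical name: the element function is the benchmark argument,
-- so it is unknown inside the generator and cannot be inlined there.
fnAsArgument :: (Int -> Int) -> UB.Vector Int
fnAsArgument = UB.generate 99999

-- Hypothetical name: the element function is fixed here, so GHC can
-- inline it into the generation loop; only the size is the argument.
sizeAsArgument :: Int -> UB.Vector Int
sizeAsArgument n = UB.generate n (+2137)

-- The same two shapes for boxed vectors.
boxedFnAsArgument :: (Int -> Int) -> V.Vector Int
boxedFnAsArgument = V.generate 99999

boxedSizeAsArgument :: Int -> V.Vector Int
boxedSizeAsArgument n = V.generate n (+2137)

main :: IO ()
main = defaultMain
    [ bgroup "unboxed"
        [ bench "function as argument" $ whnf fnAsArgument (+2137)
        , bench "size as argument"     $ whnf sizeAsArgument 99999
        ]
    , bgroup "boxed"
        [ bench "function as argument" $ whnf boxedFnAsArgument (+2137)
        , bench "size as argument"     $ whnf boxedSizeAsArgument 99999
        ]
    ]
```

Built with `-O2 -rtsopts` and run with `--regress allocated:iters +RTS -T` as in the question, the "size as argument" entries would be expected to show the lower allocation, although the exact numbers depend on the GHC and library versions.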