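I'm just running some simple benchmarks on ByteString and String. The code loads a file of 10,000,000 lines, one integer per line, and then converts each line to an Int. It turns out that Prelude.read is much slower than ByteString.readInt.
I would like to know the reason for this inefficiency. I'm also not sure which part of the profiling report corresponds to the time cost of loading the file (the data file is about 75 MB).
Here is the test code: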
import System.Environment
import System.IO
import qualified Data.ByteString.Lazy.Char8 as LC

main :: IO ()
main = do
    xs <- getArgs
    let file = xs !! 0
    inputIo <- readFile file
    let iIo = map readInt . linesStr $ inputIo
    let sIo = sum iIo
    inputIoBs <- LC.readFile file
    let iIoBs = map readIntBs . linesBs $ inputIoBs
    let sIoBs = sum iIoBs
    print [sIo, sIoBs]

linesStr = lines
linesBs = LC.lines

readInt :: String -> Int
readInt x = read x :: Int

readIntBs :: LC.ByteString -> Int
readIntBs bs = case LC.readInt bs of
    Nothing -> error "Not an integer"
    Just (x, _) -> x
The code is compiled and run as follows:
> ghc -o strO2 -O2 --make Str.hs -prof -auto-all -caf-all -rtsopts
> ./strO2 a.dat +RTS -K500M -p
Note that "a.dat" is in the format described above and is about 75 MB. The profiling result is:
strO2 +RTS -K500M -p -RTS a.dat
total time = 116.41 secs (116411 ticks @ 1000 us, 1 processor)
total alloc = 117,350,372,624 bytes (excludes profiling overheads)

COST CENTRE   MODULE  %time %alloc

readInt       Main      86.9   74.6
main.iIo      Main       8.7    9.5
main          Main       2.9   13.5
main.iIoBs    Main       0.6    1.9

                                                    individual      inherited
COST CENTRE   MODULE                 no.   entries  %time %alloc   %time %alloc

MAIN          MAIN                    54         0    0.0    0.0   100.0  100.0
main          Main                   109         0    2.9   13.5   100.0  100.0
main.iIoBs    Main                   116         1    0.6    1.9     1.3    2.4
readIntBs     Main                   118  10000000    0.7    0.5     0.7    0.5
main.sIoBs    Main                   115         1    0.0    0.0     0.0    0.0
main.sIo      Main                   113         1    0.2    0.0     0.2    0.0
main.iIo      Main                   111         1    8.7    9.5    95.6   84.1
readInt       Main                   114  10000000   86.9   74.6    86.9   74.6
main.file     Main                   110         1    0.0    0.0     0.0    0.0
CAF:main1     Main                   106         0    0.0    0.0     0.0    0.0
main          Main                   108         1    0.0    0.0     0.0    0.0
CAF:linesBs   Main                   105         0    0.0    0.0     0.0    0.0
linesBs       Main                   117         1    0.0    0.0     0.0    0.0
CAF:linesStr  Main                   104         0    0.0    0.0     0.0    0.0
linesStr      Main                   112         1    0.0    0.0     0.0    0.0
CAF           GHC.Conc.Signal        100         0    0.0    0.0     0.0    0.0
CAF           GHC.IO.Encoding         93         0    0.0    0.0     0.0    0.0
CAF           GHC.IO.Encoding.Iconv   91         0    0.0    0.0     0.0    0.0
CAF           GHC.IO.FD               86         0    0.0    0.0     0.0    0.0
CAF           GHC.IO.Handle.FD        84         0    0.0    0.0     0.0    0.0
CAF           Text.Read.Lex           70         0    0.0    0.0     0.0    0.0
Edit:
The input file "a.dat" contains 10,000,000 lines of numbers:
1
2
3
...
10000000
After some discussion, I replaced "a.dat" with 10,000,000 lines each containing 1. This does not change the performance observations above:
1
1
...
1
read is doing a much harder job than readInt. For example, compare:
> map read ["(100)", " 100", "- 100"] :: [Int]
[100,100,-100]
> map readInt ["(100)", " 100", "- 100"]
[Nothing,Nothing,Nothing]
read is essentially parsing Haskell. Combine that with the fact that it consumes a linked list, and it is really no surprise that it is very slow.
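To make the comparison above runnable outside GHCi, here is a minimal sketch. It packs the same inputs into a lazy ByteString before calling LC.readInt, and it adds a deliberately narrow String parser to show how little work is needed when the input is known to be a plain decimal integer. The names parseDecimal and parseDigits are hypothetical helpers, not part of the question's code or of any library, and the sketch does no overflow checking.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.Char (isDigit, ord)
import Data.List (foldl')

-- Hypothetical helper (not from the question): accepts an optional leading '-'
-- followed by one or more ASCII digits, and nothing else. No overflow checking.
parseDecimal :: String -> Maybe Int
parseDecimal ('-' : ds) = negate <$> parseDigits ds
parseDecimal ds         = parseDigits ds

-- Hypothetical helper: succeeds only on a non-empty, all-digit String.
parseDigits :: String -> Maybe Int
parseDigits ds
  | not (null ds) && all isDigit ds = Just (foldl' step 0 ds)
  | otherwise                       = Nothing
  where
    step acc d = acc * 10 + (ord d - ord '0')

main :: IO ()
main = do
  let inputs = ["(100)", " 100", "- 100", "100"]
  -- read accepts Haskell expression syntax: parentheses, leading
  -- whitespace, and a minus sign separated from the digits.
  print (map read (take 3 inputs) :: [Int])
  -- LC.readInt only looks for an optionally signed run of digits at the
  -- very start of the ByteString and returns the unparsed remainder.
  print (map (LC.readInt . LC.pack) inputs)
  -- The narrow String parser rejects the first three inputs as well.
  print (map parseDecimal inputs)
```

Swapping parseDecimal in for read in the question's readInt would separate the two costs mentioned above: the generality of read's parser, and the overhead of walking a String one list cell at a time. How much each contributes on the 10,000,000-line file is something to measure rather than assume.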