The following two programs differ only in the strictness flag on the variable st.
$ cat testStrictL.hs
module Main (main) where
import qualified Data.Vector as V
import qualified Data.Vector.Generic as GV
import qualified Data.Vector.Mutable as MV
len = 5000000
testL = do
  mv <- MV.new len
  let go i = do
          if i >= len then return () else
             do  let st = show (i+10000000)  -- no strictness flag
                 MV.write mv i st
                 go (i+1)
  go 0
  v <- GV.unsafeFreeze mv :: IO (V.Vector String)
  return v
main =
  do
     v <- testL
     print (V.length v)
     mapM_ print $ V.toList $ V.slice 4000000 5 v
$ cat testStrictS.hs
{-# LANGUAGE BangPatterns #-}
module Main (main) where
import qualified Data.Vector as V
import qualified Data.Vector.Generic as GV
import qualified Data.Vector.Mutable as MV
len = 5000000
testS = do
  mv <- MV.new len
  let go i = do
          if i >= len then return () else
             do  let !st = show (i+10000000)  -- this has the strictness flag
                 MV.write mv i st
                 go (i+1)
  go 0
  v <- GV.unsafeFreeze mv :: IO (V.Vector String)
  return v
main =
  do
     v <- testS
     print (V.length v)
     mapM_ print $ V.toList $ V.slice 4000000 5 v
Compiling and running both programs with ghc 7.03 on Ubuntu 10.10, I get the following results:
$ ghc --make testStrictL.hs -O3 -rtsopts  
[2 of 2] Compiling Main             ( testStrictL.hs, testStrictL.o )  
Linking testStrictL ...  
$ ghc --make testStrictS.hs -O3 -rtsopts  
[2 of 2] Compiling Main             ( testStrictS.hs, testStrictS.o )  
Linking testStrictS ...  
$ ./testStrictS +RTS -sstderr  
./testStrictS +RTS -sstderr  
5000000  
"14000000"  
"14000001"  
"14000002"  
"14000003"  
"14000004"  
     824,145,164 bytes allocated in the heap  
   1,531,590,312 bytes copied during GC  
     349,989,148 bytes maximum residency (6 sample(s))  
       1,464,492 bytes maximum slop  
             656 MB total memory in use (0 MB lost due to fragmentation)  
  Generation 0:  1526 collections,     0 parallel,  5.96s,  6.04s elapsed  
  Generation 1:     6 collections,     0 parallel,  2.79s,  4.36s elapsed  
  INIT  time    0.00s  (  0.00s elapsed)  
  MUT   time    1.77s  (  2.64s elapsed)  
  GC    time    8.76s  ( 10.40s elapsed)  
  EXIT  time    0.00s  (  0.13s elapsed)  
  Total time   10.52s  ( 13.04s elapsed)  
  %GC time      83.2%  (79.8% elapsed)  
  Alloc rate    466,113,027 bytes per MUT second  
  Productivity  16.8% of total user, 13.6% of total elapsed  
$ ./testStrictL +RTS -sstderr  
./testStrictL +RTS -sstderr  
5000000  
"14000000"  
"14000001"  
"14000002"  
"14000003"  
"14000004"  
      81,091,372 bytes allocated in the heap  
     143,799,376 bytes copied during GC  
      44,653,636 bytes maximum residency (3 sample(s))  
       1,005,516 bytes maximum slop  
              79 MB total memory in use (0 MB lost due to fragmentation)  
  Generation 0:   112 collections,     0 parallel,  0.54s,  0.59s elapsed  
  Generation 1:     3 collections,     0 parallel,  0.41s,  0.45s elapsed  
  INIT  time    0.00s  (  0.03s elapsed)  
  MUT   time    0.12s  (  0.18s elapsed)  
  GC    time    0.95s  (  1.04s elapsed)  
  EXIT  time    0.00s  (  0.06s elapsed)  
  Total time    1.06s  (  1.24s elapsed)  
  %GC time      89.1%  (83.3% elapsed)  
  Alloc rate    699,015,343 bytes per MUT second  
  Productivity  10.9% of total user, 9.3% of total elapsed  
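Can someone explain why the strictness flag seems to make the program use so much more memory? This simple example came out of my attempt to understand why my programs use so much memory when reading large files of 5 million lines and building vectors of records.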
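The problem here is mainly that you are using the String type (which is an alias for [Char]). Because it is represented as a non-strict list of single Chars, it requires 5 words per character on the heap (see also this blog post for some memory footprint comparisons).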
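In the lazy case, you basically store an unevaluated thunk that points to the (shared) evaluation function show . (+10000000) and a varying integer. In the strict case, the complete 8-character strings seem to get materialized, and that takes more heap space the longer the strings become. (Usually a bang pattern would only force the outermost list constructor (:), i.e. the first character of a lazy String, to be evaluated.)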
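To see how little a bang pattern forces on its own, here is a minimal sketch. The value `partial` is a hypothetical test string invented for illustration: its first cons cell is defined and its tail is `undefined`. It only demonstrates weak head normal form; it does not explain why the full strings appear to be materialized in your program.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Exception (SomeException, evaluate, try)

-- Hypothetical test value: the first cons cell is defined, the tail is not.
partial :: String
partial = '1' : undefined

main :: IO ()
main = do
  -- The bang forces `s` to weak head normal form only, i.e. the first (:).
  let !s = partial
  putStrLn "bang pattern returned without touching the tail"
  -- Walking the whole list is what finally evaluates the undefined tail.
  r <- try (evaluate (length s)) :: IO (Either SomeException Int)
  putStrLn (either (const "full traversal hit the undefined tail") show r)
```

If the bang pattern forced the whole list, the program would fail at the `let` rather than at `length`.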
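Storing 5000000 Strings of length 8 therefore requires 5000000*8*5 = 200000000 words, which on 32-bit corresponds to about 763 MiB. If the Char digits are shared, you only need 3/5 of that, i.e. about 458 MiB, which seems to match the memory overhead you observed.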
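The estimate can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the names `wordBytes`, `stringWords` and `toMiB` are made up for illustration, and the 4-byte word and the 5 (or 3) words per character are the assumptions stated above, not measured values.

```haskell
module Main (main) where

-- Assumption: 32-bit platform, so one heap word is 4 bytes.
wordBytes :: Int
wordBytes = 4

-- Heap words for `n` Strings of `len` characters at `wordsPerChar` words each.
stringWords :: Int -> Int -> Int -> Int
stringWords wordsPerChar n len = n * len * wordsPerChar

-- Convert a word count to MiB.
toMiB :: Int -> Double
toMiB ws = fromIntegral (ws * wordBytes) / (1024 * 1024)

main :: IO ()
main = do
  let unshared = stringWords 5 5000000 8  -- every Char allocated separately
      shared   = stringWords 3 5000000 8  -- Char values shared, 3/5 of the above
  print unshared          -- 200000000 words
  print (toMiB unshared)  -- roughly 763 MiB
  print (toMiB shared)    -- roughly 458 MiB
```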
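If you replace your String with something more compact, such as Data.ByteString.ByteString, you'll notice that the memory overhead is about an order of magnitude lower than with a plain String.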
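A minimal sketch of that change, assuming the bytestring and vector packages are available. The function `testB` is a hypothetical ByteString counterpart of your `testS`, and `Data.ByteString.Char8.pack` is used here only because the strings are plain ASCII digits. I have not measured this version, so compare the `+RTS -sstderr` output yourself.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

len :: Int
len = 5000000

-- Same loop as in the question, but each element is packed into a strict
-- ByteString and forced with ($!) before it is written into the vector.
testB :: Int -> IO (V.Vector BC.ByteString)
testB n = do
  mv <- MV.new n
  let go i
        | i >= n = return ()
        | otherwise = do
            MV.write mv i $! BC.pack (show (i + 10000000))
            go (i + 1)
  go 0
  V.unsafeFreeze mv

main :: IO ()
main = do
  v <- testB len
  print (V.length v)
  mapM_ print (V.toList (V.slice 4000000 5 v))
```

Forcing with `($!)` is enough here because a strict ByteString in weak head normal form already holds its complete contents, unlike a String.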