jto*_*bin 5 random simulation garbage-collection haskell ghc
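I'm having some trouble working out how to reduce memory usage and GC time in a simulation that runs in the State monad. At present I have to run the compiled code with +RTS -K100M to avoid a stack space overflow, and the GC statistics are pretty hideous (see below).
Here are the relevant snippets of the code. Complete, working (GHC 7.4.1) code can be found at http://hpaste.org/68527.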
-- Lone algebraic data type holding the simulation configuration.
data SimConfig = SimConfig {
        numDimensions :: !Int            -- strict
      , numWalkers    :: !Int            -- strict
      , simArray      :: IntMap [Double] -- strict spine
      , logP          :: Seq Double      -- strict spine
      , logL          :: Seq Double      -- strict spine
      , pairStream    :: [(Int, Int)]    -- lazy (infinite) list of random vals
      , doubleStream  :: [Double]        -- lazy (infinite) list of random vals
      } deriving Show

-- The transition kernel for the simulation.
simKernel :: State SimConfig ()
simKernel = do
    config <- get
    let arr   = simArray      config
    let n     = numWalkers    config
    let d     = numDimensions config
    let rstm0 = pairStream    config
    let rstm1 = doubleStream  config
    let lp    = logP          config
    let ll    = logL          config

    let (a, b) = head rstm0                            -- uses random stream
    let z0 = head . map affineTransform $ take 1 rstm1 -- uses random stream
            where affineTransform a = 0.5 * (a + 1) ^ 2

    let proposal = zipWith (+) r1 r2
            where r1 = map (*z0)     $ fromJust (IntMap.lookup a arr)
                  r2 = map (*(1-z0)) $ fromJust (IntMap.lookup b arr)

    let logA = if val > 0 then 0 else val
            where val = logP_proposal + logL_proposal - (lp `index` (a - 1)) - (ll `index` (a - 1)) + ((fromIntegral n - 1) * log z0)
                  logP_proposal = logPrior proposal
                  logL_proposal = logLikelihood proposal

    let cVal = (rstm1 !! 1) <= exp logA                -- uses random stream

    let newConfig = SimConfig { simArray = if cVal
                                             then IntMap.update (\_ -> Just proposal) a arr
                                             else arr
                              , numWalkers = n
                              , numDimensions = d
                              , pairStream   = drop 1 rstm0
                              , doubleStream = drop 2 rstm1
                              , logP = if cVal
                                         then Seq.update (a - 1) (logPrior proposal) lp
                                         else lp
                              , logL = if cVal
                                         then Seq.update (a - 1) (logLikelihood proposal) ll
                                         else ll
                              }

    put newConfig

main = do
    -- (some stuff omitted)
    let sim = logL $ (`execState` initConfig) . replicateM 100000 $ simKernel
    print sim
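In terms of the heap, a profile seems to indicate that the System.Random functions, in addition to (,), are the memory culprits. I can't include an image directly, but you can see a heap profile here: http://i.imgur.com/5LKxX.png.
I have no idea how to reduce the presence of those things any further. The random variates are generated outside the State monad (to avoid splitting the generator on every iteration), and I believe the only instance of (,) inside simKernel arises when plucking a pair from the lazy list (pairStream) that is included in the simulation configuration.
The statistics, including GC, are as follows: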
1,220,911,360 bytes allocated in the heap
787,192,920 bytes copied during GC
186,821,752 bytes maximum residency (10 sample(s))
1,030,400 bytes maximum slop
449 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2159 colls, 0 par 0.80s 0.81s 0.0004s 0.0283s
Gen 1 10 colls, 0 par 0.96s 1.09s 0.1094s 0.4354s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.95s ( 0.97s elapsed)
GC time 1.76s ( 1.91s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 2.72s ( 2.88s elapsed)
%GC time 64.9% (66.2% elapsed)
Alloc rate 1,278,074,521 bytes per MUT second
Productivity 35.1% of total user, 33.1% of total elapsed
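On top of that, I have to bump up the maximum stack size just to run the simulation. I know there must be a big thunk building up somewhere... but I can't figure out where.
How can I improve the heap/stack allocation and GC in a problem like this? How can I identify where a thunk may be building up? Is the use of the State monad here misguided?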
-
UPDATE:
I neglected to look over the output of the profiler when compiling with -fprof-auto. Here is the head of that output:
                                                  individual     inherited
COST CENTRE            MODULE   no.  entries   %time %alloc   %time %alloc
MAIN                   MAIN      58        0     0.0    0.0   100.0  100.0
main                   Main     117        0     0.0    0.0   100.0  100.0
main.randomList        Main     147        1    62.0   55.5    62.0   55.5
main.arr               Main     142        1     0.0    0.0     0.0    0.0
streamToAssocList      Main     143        1     0.0    0.0     0.0    0.0
streamToAssocList.go   Main     146        5     0.0    0.0     0.0    0.0
main.pairList          Main     137        1     0.0    0.0     9.5   16.5
consPairStream         Main     138        1     0.7    0.9     9.5   16.5
consPairStream.ys      Main     140        1     4.3    7.8     4.3    7.8
consPairStream.xs      Main     139        1     4.5    7.8     4.5    7.8
main.initConfig        Main     122        1     0.0    0.0     0.0    0.0
logLikelihood          Main     163        0     0.0    0.0     0.0    0.0
logPrior               Main     161        5     0.0    0.0     0.0    0.0
main.sim               Main     118        1     1.0    2.2    28.6   28.1
simKernel              Main     120        0     4.8    5.1    27.6   25.8
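I'm not sure how to interpret this exactly, but the lazy stream of random doubles, randomList, makes me wince. I have no idea how that could be improved.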
I've updated the hpaste with a working example. It looks like the culprits are:
Missing strictness annotations in three SimConfig fields: simArray, logP and logL.

data SimConfig = SimConfig {
        numDimensions :: !Int                -- strict
      , numWalkers    :: !Int                -- strict
      , simArray      :: !(IntMap [Double])  -- strict spine
      , logP          :: !(Seq Double)       -- strict spine
      , logL          :: !(Seq Double)       -- strict spine
      , pairStream    :: [(Int, Int)]        -- lazy
      , doubleStream  :: [Double]            -- lazy
      } deriving Show
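To illustrate what a strictness annotation on a record field buys, here is a minimal sketch. The Counter type and the step function are hypothetical stand-ins, not part of the simulation code. A bang on a field only evaluates it to weak head normal form when the constructor is evaluated, so for a container it forces the container itself and not necessarily the elements stored in it; the sketch uses Data.IntMap.Strict so that the stored values are forced as well.

```haskell
import Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IntMap
import Data.List (foldl')

-- Hypothetical record used only for illustration (not the real SimConfig).
data Counter = Counter
  { hits  :: !Int           -- evaluated to WHNF whenever a Counter is evaluated
  , table :: !(IntMap Int)  -- the bang forces the map itself, not its contents
  }

-- One update step: bump the hit count and a per-bucket tally.
step :: Counter -> Int -> Counter
step (Counter h t) k = Counter (h + 1) (IntMap.insertWith (+) (k `mod` 10) 1 t)

-- foldl' forces the accumulator at every step, and the strict fields
-- make that forcing reach the Int and the map as well.
runCounter :: Int -> Counter
runCounter n = foldl' step (Counter 0 IntMap.empty) [1 .. n]

main :: IO ()
main = do
  let final = runCounter 1000000
  print (hits final)
  print (IntMap.toList (table final))
```

Without the bangs, foldl' would still force the Counter constructor at each step, but the fields inside it could remain unevaluated chains of (+ 1) applications.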
newConfig was never evaluated in the simKernel loop because State is lazy. Another alternative would be to use the strict State monad instead.
put $! newConfig
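As a minimal sketch of this point, the toy kernel below combines both remedies: it runs in the strict State monad and stores the new state with ($!). The tick kernel and runTicks are hypothetical, and the import comes from the transformers package (mtl's Control.Monad.State.Strict exports the same names), which is an assumption about the build setup rather than something taken from the hpaste.

```haskell
import Control.Monad (replicateM_)
import Control.Monad.Trans.State.Strict (State, execState, get, put)

-- Hypothetical kernel: the whole state is a single counter.
tick :: State Int ()
tick = do
  n <- get
  put $! n + 1  -- ($!) evaluates n + 1 to WHNF before it is stored

-- Run the kernel k times starting from 0 and return the final state.
runTicks :: Int -> Int
runTicks k = execState (replicateM_ k tick) 0

main :: IO ()
main = print (runTicks 1000000)
```

With a plain put, the state after k iterations can be a chain of k suspended additions that is only collapsed when the final result is demanded. For a record state such as SimConfig, ($!) only forces the outermost constructor, which is why it works together with strict fields.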
execState ... replicateM also built up thunks. I originally replaced this with a foldl' and moved the execState into the fold, but I think swapping in replicateM_ is equivalent and easier to read:
let sim = logL $ execState (replicateM_ epochs simKernel) initConfig
-- sim = logL $ foldl' (const . execState simKernel) initConfig [1..epochs]
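The equivalence of the two forms can be checked on a toy kernel. The sketch below is an illustration under assumptions: kernel, viaReplicate and viaFold are hypothetical names, and the State type is taken from transformers' strict variant rather than from the hpaste.

```haskell
import Control.Monad (replicateM_)
import Control.Monad.Trans.State.Strict (State, execState, modify')
import Data.List (foldl')

-- Hypothetical kernel: add 3 to the state, strictly.
kernel :: State Int ()
kernel = modify' (+ 3)

-- Sequence the kernel inside the monad, discarding the unit results.
viaReplicate :: Int -> Int -> Int
viaReplicate epochs s0 = execState (replicateM_ epochs kernel) s0

-- Run the kernel once per list element, threading the state by hand.
viaFold :: Int -> Int -> Int
viaFold epochs s0 = foldl' (const . execState kernel) s0 [1 .. epochs]

main :: IO ()
main = do
  print (viaReplicate 100000 0)
  print (viaFold 100000 0)
```

replicateM collects every result into a list of 100000 unit values before returning, whereas replicateM_ throws each result away as it goes, so nothing has to be retained besides the state.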
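A few calls to mapM .. replicate have been replaced with replicateM. Particularly noticeable in consPairList, where it reduces memory usage quite a bit. There is still room for improvement, but the lowest hanging fruit involves unsafeInterleaveST... so I stopped.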
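For reference, this is the shape of the mapM .. replicate to replicateM rewrite. The sketch only shows that the two forms produce the same values; it does not reproduce the memory measurements. nextInt is a hypothetical linear congruential step standing in for the random generator, and drawOld and drawNew are made-up names.

```haskell
import Control.Monad (replicateM)
import Control.Monad.Trans.State.Strict (State, evalState, state)

-- Hypothetical stand-in for a generator step (a small LCG), threading
-- the seed through State. Not the System.Random code from the hpaste.
nextInt :: State Int Int
nextInt = state $ \s ->
  let s' = (s * 1103515245 + 12345) `mod` 2147483648
  in (s', s')

-- Old shape: build a throwaway list, then map an action over it.
drawOld :: Int -> State Int [Int]
drawOld n = mapM (const nextInt) (replicate n ())

-- New shape: repeat the action directly.
drawNew :: Int -> State Int [Int]
drawNew n = replicateM n nextInt

main :: IO ()
main = print (evalState (drawOld 5) 42 == evalState (drawNew 5) 42)
```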
I don't know whether the output is what you want:
fromList [-4.287033457733427,-1.8000404912760795,-5.581988678626085,-0.9362372340483293,-5.267791907985331]
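But here are the stats:
268,004,448 bytes allocated in the heap
70,753,952 bytes copied during GC
16,014,224 bytes maximum residency (7 sample(s))
1,372,456 bytes maximum slop
40 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 490 colls, 0 par 0.05s 0.05s 0.0001s 0.0012s
Gen 1 7 colls, 0 par 0.04s 0.05s 0.0076s 0.0209s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.12s ( 0.12s elapsed)
GC time 0.09s ( 0.10s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.21s ( 0.22s elapsed)
%GC time 42.2% (45.1% elapsed)
Alloc rate 2,241,514,569 bytes per MUT second
Productivity 57.8% of total user, 53.7% of total elapsed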