Dynamic programming in Haskell? [Warning: Project Euler 31 solution inside]

gnu*_*nce 6 ocaml haskell dynamic-programming

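While solving problem #31 on projecteuler.net [SPOILERS AHEAD] (counting the number of ways to make £2 from British coins), I wanted to use dynamic programming. I started with OCaml and wrote this short and very efficient program: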

open Num

let make_dyn_table amount coins =
  let t = Array.make_matrix (Array.length coins) (amount+1) (Int 1) in
  for i = 1 to (Array.length t) - 1 do
    for j = 0 to amount do
      if j < coins.(i) then
        t.(i).(j) <- t.(i-1).(j)
      else
        t.(i).(j) <- t.(i-1).(j) +/ t.(i).(j - coins.(i))
    done
  done;
  t

let _ =
  let t = make_dyn_table 200 [|1;2;5;10;20;50;100;200|] in
  let last_row = Array.length t - 1 in
  let last_col = Array.length t.(last_row) - 1 in
  Printf.printf "%s\n" (string_of_num (t.(last_row).(last_col)))

This executes in about 8 ms on my laptop. If I increase the amount from 200 pence to one million, the program still finds the answer in under two seconds.

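I translated the program to Haskell (which was definitely not fun in itself), and although it terminates with the correct answer for 200 pence, if I increase that number to 10000 my laptop comes to a screeching halt (lots of thrashing). Here is the code: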

import Data.Array

createDynTable :: Int -> Array Int Int -> Array (Int, Int) Int
createDynTable amount coins =
    let numCoins = (snd . bounds) coins
        t = array ((0, 0), (numCoins, amount))
            [((i, j), 1) | i <- [0 .. numCoins], j <- [0 .. amount]]
    in t

populateDynTable :: Array (Int, Int) Int -> Array Int Int -> Array (Int, Int) Int
populateDynTable t coins =
    go t 1 0
        where go t i j
                 | i > maxX = t
                 | j > maxY = go t (i+1) 0
                 | j < coins ! i = go (t // [((i, j), t ! (i-1, j))]) i (j+1)
                 | otherwise = go (t // [((i, j), t!(i-1,j) + t!(i, j - coins!i))]) i (j+1)
              ((_, _), (maxX, maxY)) = bounds t

changeCombinations amount coins =
    let coinsArray = listArray (0, length coins - 1) coins
        dynTable = createDynTable amount coinsArray
        dynTable' = populateDynTable dynTable coinsArray
        ((_, _), (i, j)) = bounds dynTable
    in
      dynTable' ! (i, j)

main =
    print $ changeCombinations 200 [1,2,5,10,20,50,100,200]

I would be very interested to hear from anyone who knows Haskell well why the performance of this solution is so bad.

Dan*_*her 11

Haskell is pure. Purity means that values are immutable, and thus in the step

j < coins ! i = go (t // [((i, j), t ! (i-1, j))]) i (j+1)

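you create an entire new array for every single entry that is updated. That is already very expensive for a small amount like £2, but it becomes completely obscene for an amount of £100.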
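As a small illustration of the point above (a sketch, not code from the question; the names original and updated are made up for the example), the // operator on an immutable Data.Array returns a fresh array and leaves its argument untouched, so every call has to copy all the elements:

```haskell
import Data.Array (Array, elems, listArray, (//))

-- Hypothetical five-element array, filled with 1 like the question's table.
original :: Array Int Int
original = listArray (0, 4) (replicate 5 1)

-- (//) cannot modify 'original' in place; it allocates a new array
-- and copies every element, changing only index 2.
updated :: Array Int Int
updated = original // [(2, 42)]

main :: IO ()
main = do
    print (elems original)  -- [1,1,1,1,1]
    print (elems updated)   -- [1,1,42,1,1]
```

With a table of 8 × (amount + 1) entries and one such copy per entry, the total work grows roughly quadratically in the amount.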

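Further, the arrays are boxed, which means they contain pointers to the entries. That worsens locality, uses more storage, and allows thunks to be built up, which are also slower to evaluate when they are finally forced.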
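To make "boxed" concrete, here is a minimal sketch (the array boxed is invented for the illustration): a boxed Array stores pointers to possibly unevaluated elements, so it can even hold an undefined entry as long as nobody forces it.

```haskell
import Data.Array (Array, listArray, (!))

-- A boxed array: each slot is a pointer to a (possibly unevaluated) value.
-- The middle slot holds a thunk that would crash if it were ever forced.
boxed :: Array Int Int
boxed = listArray (0, 2) [1, undefined, 3]

main :: IO ()
main = print (boxed ! 0 + boxed ! 2)  -- prints 4; index 1 is never evaluated
```

An unboxed array stores the raw values themselves, so it has no room for such thunks and every element is evaluated when it is written.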

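The algorithm used depends on a mutable data structure for its efficiency, but the mutability is confined to the computation itself. So we can use what is intended for exactly that purpose, safely encapsulated computations with temporarily mutable data: the ST state transformer monad family and the associated arrays (unboxed, for efficiency).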
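A minimal sketch of that idea, under some assumptions: it uses runSTUArray from Data.Array.ST and, to keep the example short, a single-row variant of the table rather than the two-dimensional one from the question. The function name ways is hypothetical. The array is mutated freely inside the ST block, and callers only ever see an immutable UArray.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, (!))

-- ways amount coins ! j is the number of ways to make j from the coins.
-- Note: Int overflows silently for very large amounts.
ways :: Int -> [Int] -> UArray Int Int
ways amount coins = runSTUArray $ do
    table <- newArray (0, amount) 0
    writeArray table 0 1                  -- one way to make 0: use no coins
    forM_ coins $ \c ->
        forM_ [c .. amount] $ \j -> do
            v <- readArray table j        -- ways without another coin c
            w <- readArray table (j - c)  -- ways using at least one more c
            writeArray table j (v + w)
    return table

main :: IO ()
main = print (ways 200 [1,2,5,10,20,50,100,200] ! 200)  -- prints 73682
```

From the outside ways is an ordinary pure function; the in-place updates cannot leak out of the ST block.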

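Give me half an hour or so to translate the algorithm to code using STUArrays, and you will get a Haskell version that is not too ugly and should perform comparably to the O'Caml version (expect a more or less constant factor between the two; whether it is larger or smaller than 1, I don't know).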

Here it is:

module Main (main) where

import System.Environment (getArgs)

import Data.Array.ST
import Control.Monad.ST
import Data.Array.Unboxed

standardCoins :: [Int]
standardCoins = [1,2,5,10,20,50,100,200]

changeCombinations :: Int -> [Int] -> Int
changeCombinations amount coins = runST $ do
    let coinBound = length coins - 1
        coinsArray :: UArray Int Int
        coinsArray = listArray (0, coinBound) coins
    table <- newArray((0,0),(coinBound, amount)) 1 :: ST s (STUArray s (Int,Int) Int)
    let go i j
            | i > coinBound = readArray table (coinBound,amount)
            | j > amount   = go (i+1) 0
            | j < coinsArray ! i = do
                v <- readArray table (i-1,j)
                writeArray table (i,j) v
                go i (j+1)
            | otherwise = do
                v <- readArray table (i-1,j)
                w <- readArray table (i, j - coinsArray!i)
                writeArray table (i,j) (v+w)
                go i (j+1)
    go 1 0

main :: IO ()
main = do
    args <- getArgs
    let amount = case args of
                   a:_ -> read a
                   _   -> 200
    print $ changeCombinations amount standardCoins

It runs in a not too shabby time,

$ time ./mutArr
73682

real    0m0.002s
user    0m0.000s
sys     0m0.001s
$ time ./mutArr 1000000
986687212143813985

real    0m0.439s
user    0m0.128s
sys     0m0.310s

and that is with bounds-checked array accesses; with unchecked accesses the time could probably be reduced somewhat.
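For reference, one way the unchecked variant might look. This is a sketch that has not been benchmarked, and it rests on several assumptions: it uses unsafeRead and unsafeWrite from Data.Array.Base in the array package, which take a flat zero-based offset instead of an (i, j) index, so the row-major offset is computed by hand. The name changeCombinationsUnchecked and the helper flat are made up for the example. Like the code above, it assumes a non-empty coin list whose first coin is 1.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.Base (unsafeRead, unsafeWrite)
import Data.Array.ST (STUArray, newArray)

changeCombinationsUnchecked :: Int -> [Int] -> Int
changeCombinationsUnchecked amount coins = runST $ do
    let coinBound = length coins - 1
        width     = amount + 1
        -- Row-major offset of (i, j) in an array with bounds
        -- ((0,0),(coinBound,amount)); no bounds check is performed.
        flat i j  = i * width + j
    table <- newArray ((0,0),(coinBound, amount)) 1 :: ST s (STUArray s (Int,Int) Int)
    forM_ (zip [1 ..] (drop 1 coins)) $ \(i, c) ->
        forM_ [0 .. amount] $ \j -> do
            v <- unsafeRead table (flat (i-1) j)
            if j < c
              then unsafeWrite table (flat i j) v
              else do
                w <- unsafeRead table (flat i (j - c))
                unsafeWrite table (flat i j) (v + w)
    unsafeRead table (flat coinBound amount)

main :: IO ()
main = print (changeCombinationsUnchecked 200 [1,2,5,10,20,50,100,200])  -- prints 73682
```

An out-of-range offset here reads or corrupts arbitrary memory instead of raising an error, so this is only worth it once the checked version is known to be correct.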


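Ah, I have just learned that your O'Caml code uses arbitrary-precision integers, so using Int in Haskell puts O'Caml at an unfair disadvantage. The changes needed to calculate the results with arbitrary-precision Integers are minimal,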

$ diff mutArr.hs mutArrIgr.hs
12c12
< changeCombinations :: Int -> [Int] -> Int
---
> changeCombinations :: Int -> [Int] -> Integer
17c17
<     table <- newArray((0,0),(coinBound, amount)) 1 :: ST s (STUArray s (Int,Int) Int)
---
>     table <- newArray((0,0),(coinBound, amount)) 1 :: ST s (STArray s (Int,Int) Integer)
28c28
<                 writeArray table (i,j) (v+w)
---
>                 writeArray table (i,j) $! (v+w)

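only two type signatures needed to be adapted - the array necessarily becomes boxed, so we need to make sure that we are not writing thunks to the array in line 28, and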

$ time ./mutArrIgr 
73682

real    0m0.002s
user    0m0.000s
sys     0m0.002s
$ time ./mutArrIgr 1000000
99341140660285639188927260001

real    0m1.314s
user    0m1.157s
sys     0m0.156s

the computation of the large result, for which Int overflows, takes noticeably longer, but as expected it is comparable to the O'Caml.
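The $! in the diff above is what keeps thunks out of the boxed array. A minimal sketch of the same pattern, using a hypothetical one-cell STArray as an accumulator (sumInto is a made-up name):

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, newArray, readArray, writeArray)

-- Sums 1..n into a single boxed cell.
sumInto :: Int -> Integer
sumInto n = runST $ do
    cell <- newArray (0, 0) 0 :: ST s (STArray s Int Integer)
    forM_ [1 .. n] $ \k -> do
        v <- readArray cell 0
        -- ($!) evaluates the sum before it is stored. With plain ($) the
        -- cell would hold an ever-growing chain of unevaluated additions.
        writeArray cell 0 $! v + fromIntegral k
    readArray cell 0

main :: IO ()
main = print (sumInto 100)  -- prints 5050
```

With an STUArray this is unnecessary, because unboxed elements are always stored evaluated.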


Having spent some time understanding the O'Caml, I can offer a closer, shorter, and arguably nicer translation:

module Main (main) where

import System.Environment (getArgs)

import Data.Array.ST
import Control.Monad.ST
import Data.Array.Unboxed
import Control.Monad (forM_)

standardCoins :: [Int]
standardCoins = [1,2,5,10,20,50,100,200]

changeCombinations :: Int -> [Int] -> Integer
changeCombinations amount coins = runST $ do
    let coinBound = length coins - 1
        coinsArray :: UArray Int Int
        coinsArray = listArray (0, coinBound) coins
    table <- newArray((0,0),(coinBound, amount)) 1 :: ST s (STArray s (Int,Int) Integer)
    forM_ [1 .. coinBound] $ \i ->
        forM_ [0 .. amount] $ \j ->
            if j < coinsArray!i
              then do
                  v <- readArray table (i-1,j)
                  writeArray table (i,j) v
              else do
                v <- readArray table (i-1,j)
                w <- readArray table (i, j - coinsArray!i)
                writeArray table (i,j) $! (v+w)
    readArray table (coinBound,amount)

main :: IO ()
main = do
    args <- getArgs
    let amount = case args of
                   a:_ -> read a
                   _   -> 200
    print $ changeCombinations amount standardCoins

It runs about equally fast:

$ time ./mutArrIgrM 1000000
99341140660285639188927260001

real    0m1.440s
user    0m1.273s
sys     0m0.164s