为什么haskell表现比java差

eps*_*lbe 5 java performance haskell

我想用随机数来愚弄,如果haskell中的随机生成器是否均匀分布,那么我在几次尝试之后写了下面的程序(生成列表导致堆栈溢出).

module Main where

import System.Environment (getArgs)
import Control.Applicative ((<$>))
import System.Random (randomRIO)

main :: IO ()
main = do nn <- map read <$> getArgs :: IO [Int]
          let n = if null nn then 10000 else head nn
          m <- loop n 0 (randomRIO (0,1))
          putStrLn $ "True: "++show (m//n::Double) ++", False "++show ((n-m)//n :: Double)
          return ()

loop :: Int -> Int -> IO Double -> IO Int
loop n acc x | n<0       = return acc
             | otherwise = do x' <- round <$> x
                              let x'' = (x' + acc) in x'' `seq` loop (n-1) x'' x

(//) :: (Integral a, Fractional b) => a -> a -> b
x // y = fromIntegral x / fromIntegral y
Run Code Online (Sandbox Code Playgroud)

当我得到它工作ok-ish我决定写另一个版本 - 在java(我不是很擅长),并期望haskell击败它,但java程序运行大约一半的时间相比haskell版本

import java.util.Random;

public class MonteCarlo {
    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);
        Random r = new Random();
        int acc = 0;
        for (int i=0; i<=n; i++) {
            acc += Math.round(r.nextDouble());
        }
        System.out.println("True: "+(double) acc/n+", False: "+(double)(n-acc)/n);
    }
}
Run Code Online (Sandbox Code Playgroud)

我试着看看haskell版本的配置文件 - 它告诉我大部分工作都是在循环中完成的 - 难怪!我试着看看核心,但我真的不明白这一点.我认为java版本可能使用多个核心 - 因为系统使用时间超过100%,当我计时时.

我想可以使用未装箱的双打/ Ints来改进代码,但是我对hakell的了解还不止于此.

Mih*_*eac 10

我已经尝试了依赖于懒惰的代码的原始版本:

module Main where

import System.Environment
import Control.Applicative
import System.Random

main :: IO ()
main = do
  args <- getArgs
  let n = if null args then 10000 else read $ head args
  g <- getStdGen
  let vals = randomRs (0, 1) g :: [Int]
  let s = sum $ take n vals
  putStrLn $ "True: " ++ f s n ++ ", False" ++ f (n - s) n

f x y = show $ ((fromIntegral x / fromIntegral y) :: Double)
Run Code Online (Sandbox Code Playgroud)

现在,忽略我错过了一些类型声明的事实,我从模块中导入了所有内容.我只是想自由地测试.

回到城堡后,您的版本被保存为original.hs上述内容保存为1.hs.测试时间:

[mihai@esgaroth so]$ ghc --make -O2 original.hs
[1 of 1] Compiling Main             ( original.hs, original.o )
Linking original ...
[mihai@esgaroth so]$ ghc --make -O2 1.hs 
[1 of 1] Compiling Main             ( 1.hs, 1.o )
Linking 1 ...
[mihai@esgaroth so]$ time ./original 
True: 0.4981, False 0.5019

real    0m0.022s
user    0m0.021s
sys     0m0.000s
[mihai@esgaroth so]$ time ./1 
True: 0.4934, False0.5066

real    0m0.005s
user    0m0.003s
sys     0m0.001s
[mihai@esgaroth so]$ time ./original 
True: 0.5063, False 0.4937

real    0m0.018s
user    0m0.017s
sys     0m0.001s
[mihai@esgaroth so]$ time ./1 
True: 0.5024, False0.4976

real    0m0.005s
user    0m0.003s
sys     0m0.002s
Run Code Online (Sandbox Code Playgroud)

每次,新代码都快了4倍.而这仍然是使用惰性结构和已有代码的第一个版本.

下一步是测试性能堆,并在生成随机列表时查看是否值得嵌入和计算.

PS:在我的机器上:

[mihai@esgaroth so]$ time java MonteCarlo 10000
True: 0.5011, False: 0.4989

real    0m0.063s
user    0m0.066s
sys     0m0.010s
Run Code Online (Sandbox Code Playgroud)

PPS:运行编译的代码,不需要-O2:

[mihai@esgaroth so]$ time ./original 
True: 0.5035, False 0.4965

real    0m0.032s
user    0m0.031s
sys     0m0.001s
[mihai@esgaroth so]$ time ./1 
True: 0.4975, False0.5025

real    0m0.014s
user    0m0.010s
sys     0m0.003s
Run Code Online (Sandbox Code Playgroud)

只减少了2倍,但仍然比java快.

  • 以下是各种随机数生成器的基准测试:http://www.serpentine.com/blog/2009/09/19/a-new-pseudo-random-number-generator-for-haskell/请注意使用`System.随机生成随机`Int64`的速度可以是随机`Double`的两倍 - 但其他库的效率要高得多. (3认同)

eps*_*lbe 0

正如评论所暗示的那样,随机数生成应该是罪魁祸首。我做了一些实验,发现确实是这样。

@Mihai_Maruseac 的示例比较了不同的解决方案,生成Ints 而不是Double值,这恰好更快一些。但在我的计算机(Debian7、6Gib Ram、core i5)上,java 版本仍然更快。

这是我的 haskell 文件,比较了不同的解决方案,顺便说一句,前四个循环被注释掉,因为生成的样本数量正在产生堆栈溢出。

module Main where


import Control.Applicative ((<$>))
import Control.Monad (replicateM)
import Data.List (foldl')
import Control.Monad.Primitive (PrimMonad, PrimState)
import Data.Vector.Generic (Vector)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Generic as G

import System.Random.MWC
import System.Random (randomRIO
                     ,getStdGen
                     ,randomRs
                     ,randomR
                     ,RandomGen)

import Criterion.Main

main :: IO ()
main = do let n = 1000000 ::Int -- 10^6
          defaultMain [
                    -- bench "loop1" $ whnfIO (loop1 n),
                    -- bench "loop2" $ whnfIO (loop2 n),
                    -- bench "loop3" $ whnfIO (loop3 n),
                    -- bench "loop4" $ whnfIO (loop4 n),
                       bench "loop5" $ whnfIO (loop5 n),
                       bench "loop6" $ whnfIO (loop6 n),
                       bench "loop7" $ whnfIO (loop7 n),
                       bench "moop5" $ whnfIO (moop5 n),
                       bench "moop6" $ whnfIO (moop6 n)]

loop1 ::  Int -> IO Int
loop1 n = do rs <- replicateM n (randomRIO (0,1)) :: IO [Double]
             return . length . filter (==True) $ map (<=0.5) rs

loop2 ::  Int -> IO Int
loop2 n = do rs <- replicateM n (randomRIO (0,1)) :: IO [Double]
             return $ foldl' (\x y -> x + round y) 0 rs


loop3 ::  Int -> IO Int
loop3 n = loop n 0 (randomRIO (0,1))
        where loop :: Int -> Int -> IO Double -> IO Int
              loop n' acc x | n' <=0    = round <$> x
                            | otherwise = do x' <- x
                                             loop (n'-1) (round x' + acc) x

loop4 ::  Int -> IO Int
loop4 n = loop n 0 (randomRIO (0,1))
        where loop :: Int -> Int -> IO Double -> IO Int
              loop n' acc x | n'<0     = return acc
                           | otherwise = do x' <- round <$> x
                                            let x'' = (x' + acc) in x'' `seq` loop (n'-1) x'' x

loop5 ::  Int -> IO Int
loop5 n = do g <- getStdGen
             return . sum . take n $ randomRs (0,1::Int) g

loop6 ::  Int -> IO Int
loop6 n = do g <- getStdGen
             return $ loop 0 n g
        where loop ::  (RandomGen t) => Int -> Int -> t -> Int
              loop acc n' g | n'<0       = acc
                            | otherwise = let (m,g') = randomR (0,1::Int) g
                                              acc' = acc + m
                                          in acc' `seq` loop acc' (n'-1) g'

loop7 ::  Int -> IO Int
loop7 n = do g <- getStdGen
             return $ loop 0 n g
        where loop ::  (RandomGen t) => Int -> Int -> t -> Int
              loop acc n' g | n'<0       = acc
                            | otherwise = let (m,g') = randomR (0,1::Double) g
                                              acc' = acc + round m
                                          in acc' `seq` loop acc' (n'-1) g'

moop5 :: Int -> IO Int
moop5 n = do vs <- withSystemRandom . asGenST $ \gen -> uniformVectorR (0,1) gen n
             return . V.sum $ V.map round (vs :: V.Vector Double)


moop6 :: Int -> IO Int
moop6 n = do vs <- withSystemRandom . asGenST $ \gen -> uniformVectorR (0,1) gen n
             return $ V.sum  (vs :: V.Vector Int)

-- Helper functions ------------------------------------------------------------

report :: Int -> Int -> String -> IO ()
report n m msg = putStrLn $ msg ++ "\n" ++
                           "True: "  ++ show (    m//n :: Double) ++ ", "++
                           "False: " ++ show ((n-m)//n :: Double)

(//) :: (Integral a, Fractional b) => a -> a -> b
x // y = fromIntegral x / fromIntegral y

uniformVectorR :: (PrimMonad m, Variate a, Vector v a) =>
                  (a, a) -> Gen (PrimState m) -> Int -> m (v a)
uniformVectorR (lo,hi) gen n = G.replicateM n (uniformR (lo,hi) gen)
{-# INLINE uniformVectorR #-}
Run Code Online (Sandbox Code Playgroud)