类似的代码(对于指数加权偏差)在Haskell中比在Python中慢

Ily*_*kin 9 python performance haskell weighted-average

我在python3和Haskell(编译)中实现了指数加权移动平均值(ewma).这需要大约相同的时间.但是当这个函数被应用两次时,haskell版本会无法预测地减慢速度(超过1000次,而python版本仅慢2倍).

Python3版本:

import numpy as np
def ewma_f(y, tau):
    a = 1/tau
    avg = np.zeros_like(y)
    for i in range(1, len(y)):
        avg[i] = a*y[i-1]+(1-a)*avg[i-1]
    return avg
Run Code Online (Sandbox Code Playgroud)

Haskell列表:

ewmaL :: [Double] -> Double -> [Double]
ewmaL ys tau = reverse $ e (reverse ys) (1.0/tau)
    where e [x] a    = [a*x]
          e (x:xs) a = (a*x + (1-a)*(head $ e xs a) : e xs a)
Run Code Online (Sandbox Code Playgroud)

Haskell与数组:

import qualified Data.Vector as V
ewmaV :: V.Vector Double -> Double -> V.Vector Double
ewmaV x tau = V.map f $ V.enumFromN 0 (V.length x)
    where
        f (-1) = 0
        f n    = (x V.! n)*a + (1-a)*(f (n-1))
        a = 1/tau
Run Code Online (Sandbox Code Playgroud)

在所有情况下,计算大约需要相同的时间(针对具有10000个元素的数组进行测试).Haskell代码编译时没有任何标志,但"ghc -O2"没有任何区别.

我用计算的ewma来计算这个ewma的绝对偏差; 然后我将ewma函数应用于此偏差.

Python3:

def ewmd_f(y, tau):
    ewma = ewma_f(y, tau)
    return ewma_f(np.abs(y-ewma), tau)
Run Code Online (Sandbox Code Playgroud)

与ewma相比,它的运行时间延长了两倍.

Haskell列表:

ewmdL :: [Double] -> Double -> [Double]
ewmdL xs tau = ewmaL devs tau
    where devs = zipWith (\ x y -> abs $ x-y) xs avg
          avg  = (ewmaL xs tau)
Run Code Online (Sandbox Code Playgroud)

Haskell与向量:

ewmdV :: V.Vector Double -> Double -> V.Vector Double
ewmdV xs tau = ewmaV devs tau
    where devs = V.zipWith (\ x y -> abs $ x-y) xs avg
          avg  = ewmaV xs tau
Run Code Online (Sandbox Code Playgroud)

两个ewmd比他们的ewma对手慢了1000.

我评估了python3代码:

from time import time
x = np.sin(np.arange(10000))
tau = 100.0

t1 = time()
ewma = ewma_f(x, tau)
t2 = time()
ewmd = ewmd_f(x, tau)
t3 = time()

print("EWMA took {} s".format(t2-t1))
print("EWMD took {} s".format(t3-t2))
Run Code Online (Sandbox Code Playgroud)

我用以下方法评估了Haskell代码:

import System.CPUTime

timeIt f = do
    start <- getCPUTime
    end   <- seq f getCPUTime
    let d = (fromIntegral (end - start)) / (10^12) in
        return (show d)

main = do
    let n = 10000 :: Int
    let tau = 100.0
    let l = map sin [0.0..(fromIntegral $ n-1)]
    let x = V.map sin $ V.enumFromN 0 n

    putStrLn "Vectors"
    aV <- timeIt $ V.last $ ewmaV x tau
    putStrLn $ "EWMA (vector) took "++aV
    dV <- timeIt $ V.last $ ewmdV x tau
    putStrLn $ "EWMD (vector) took "++dV
    putStrLn ""
    putStrLn "Lists"
    lV <- timeIt $ last $ ewmaL l tau
    putStrLn $ "EWMA (list) took "++lV
    lD <- timeIt $ last $ ewmdL l tau
    putStrLn $ "EWMD (list) took "++lD
Run Code Online (Sandbox Code Playgroud)

lef*_*out 12

您的Python和Haskell算法可能看起来相同,但它们实际上具有不同的渐近复杂度:

ewmaV x tau = V.map f $ V.enumFromN 0 (V.length x)
    where
        f (-1) = 0
        f n    = (x V.! n)*a + (1-a)
                     *(f (n-1)) -- Recursion!
        a = 1/tau
Run Code Online (Sandbox Code Playgroud)

这使得Haskell的实现Ø(ñ ²),这是不可接受的.你只是在评估时没有注意到这一点的原因V.last . ewmaV是懒惰:只评估最后一个元素,你真的不需要处理整个向量,而只需要一个递归循环x.OTOH,ewmdV实际上强制所有元素,因此额外的成本.

一个简单的(但不是最优的,我敢说)解决这个问题的方法是记住结果:

ewmaV :: V.Vector Double -> Double -> V.Vector Double
ewmaV x tau = result
  where result = V.map f $ V.enumFromN 0 (V.length x)
        f 0 = V.head x * a
        f n    = (x V.! n)*a + (1-a)*(result V.! (n-1))
        a = 1/tau
Run Code Online (Sandbox Code Playgroud)

现在ewmdV需要≈twice,只要ewmaV:

sagemuej@sagemuej-X302LA:/tmp$ ghc wtmpf-file6122.hs -O2 && ./wtmpf-file6122
[1 of 1] Compiling Main             ( wtmpf-file6122.hs, wtmpf-file6122.o )
Linking wtmpf-file6122 ...
Vectors
EWMA (vector) took 4.932e-3
EWMD (vector) took 7.758e-3
Run Code Online (Sandbox Code Playgroud)

(这些时间不是很可靠;对于准确的性能测试,使用标准.)


更好的解决方案IMO将完全避免这种索引业务 - 我们不是在编写Fortran,是吗?像EWMA这样的IIR更好地以纯粹的"本地"方式实施; 这可以在带有状态monad的Haskell中很好地表达,因此您可以独立于数据所在的容器.

import Data.Traversable
import Control.Monad (forM)
import Control.Monad.State

ewma :: Traversable t => t Double -> Double -> t Double
ewma x tau = (`evalState`0) . forM x $
         \xi -> state $ \carry
            -> let yi = a*xi + (1-a)*carry
               in (yi, yi)
 where a = 1/tau
Run Code Online (Sandbox Code Playgroud)

虽然我们正在进行推广:没有理由将此限制仅用于处理Double数据; 您可以过滤任何可以缩放和插值的变量.

{-# LANGUAGE FlexibleContexts #-}
import Data.VectorSpace

ewma :: (Traversable t, VectorSpace v, Fractional (Scalar v))
             => t v -> Scalar v -> t v
ewma x tau = (`evalState`zeroV) . forM x $
         \xi -> state $ \carry
            -> let yi = a*^xi ^+^ (1-a)*^carry
               in (yi, yi)
 where a = 1/tau
Run Code Online (Sandbox Code Playgroud)

这样,原则上可以使用相同的滤波器来对存储在延迟流式无限图像帧列表中的运动模糊视频数据进行低通滤波,以对存储在未装箱中的无线电信号脉冲进行低通滤波Vector.(VU.Vector实际上没有Traversable实例;你需要替换它 oforM.)