Benchmarking simple hydrology modules in R, Python, and Julia

Ari*_*hie 8 python benchmarking r julia

I recently started working with a team whose members favor Python, R, and Julia to varying degrees (in that order of popularity). We are translating a set of hydrology modules, built by different team members, into a common library, and I want to do some basic benchmarking before we reach consensus on which language to use. The problem is that my Python is weak and I am new to Julia, so I suspect poor coding practices on my part may be skewing the results.

The timings for these modules (code below) are: Python (93 ms), R (55 ms), and Julia (0.5 ms). As far as I can tell the code cannot be vectorized, so I expected Julia to be fastest, but not by 100x, and I also expected Python to be faster than R.

Can anyone point out some basic efficiency mistakes?

Python:

import numpy as np
def SM_routine_f(infiltration=None, PET=None, rcScl=0.1, rcExp=1.3, PETexp=2, Zr=1000, n=0.5, smhp=0.00, smwp=0.10, smfc=0.25, s0=0.5):
    nt = len(infiltration)
    SM_store = np.empty(nt+1)
    SM_store[0] = s0
    smhp_stor = Zr * smhp
    smwp_stor = Zr * smwp
    smfc_stor = Zr * smfc
    max_stor = Zr * n
    for i in range(nt):
        thisSMstor = SM_store[i]
        AET = np.where(thisSMstor > smhp_stor, (thisSMstor - smhp_stor) * PET[i] * (thisSMstor / max_stor) ** PETexp / max_stor, 0)
        thisSMstor -= AET
        deepDrainage = np.where(thisSMstor > smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor) ** rcExp - (thisSMstor - smfc_stor), 0)
        SM_store[i+1] = min(max_stor, thisSMstor + infiltration[i] - deepDrainage)
    return SM_store / n
exInfil = np.random.normal(loc=1, scale=1, size=365*20)
exPET = np.random.normal(loc=2, scale=1, size=365*20)
import timeit
timeit.timeit(lambda: SM_routine_f(infiltration=exInfil, PET=exPET), number=1)

R:

SM_routine_f = function(infiltration = NA, PET = NA, 
    rcScl = 0.1,    # water retention curve scalar
    rcExp = 1.3,    # water retention curve exponent
    PETexp = 2,     # exponent of PET decay
    Zr = 1000,      # root depth
    n = 0.5,        # soil porosity
    smhp = 0.00,    # soil moisture at hydroscopic point
    smwp = 0.10,    # soil moisture at wilting point
    smfc = 0.25,    # soil moisture field capacity
    s0 = .5)    # initial soil moisture 
    {
    nt = length(infiltration)
    SM_store = c(s0, rep(NA, nt))
    smhp_stor = Zr * smhp
    smwp_stor = Zr * smwp
    smfc_stor = Zr * smfc
    max_stor = Zr * n
    
    for(i in 1:length(infiltration))    {
        thisSMstor = SM_store[i]
        AET = ifelse(thisSMstor > smhp_stor, 
            min(thisSMstor - smhp_stor, PET[i] * (thisSMstor / max_stor) ^ PETexp),
            0)
        thisSMstor = thisSMstor - AET
        
        deepDrainage = ifelse(thisSMstor > smfc_stor,
            min(thisSMstor - smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor)^rcExp),
            0)
        
        SM_store[i+1] = min(max_stor, thisSMstor - min(c(thisSMstor, deepDrainage)) + infiltration[i]) 
        }
    return(SM_store / n)
    }

    # generating dummy data for testing, e.g. 20 years of climate data
exInfil = rnorm(365*20, mean=1, sd=1)
exPET = rnorm(365*20, mean=2, sd=1)

    # benchmarking
microbenchmark::microbenchmark(SM_routine_f(infiltration=exInfil, PET = exPET))

Julia:

function SM_routine_f(infiltration::Vector{Float64}, PET::Vector{Float64})

rcScl = 0.1
rcExp = 1.3
PETexp = 2
Zr = 1000.0
n = 0.5
smhp = 0.00
smwp = 0.10
smfc = 0.25
s0 = 0.5
nt = length(infiltration)
SM_store = [s0; fill(missing,nt)]
smhp_stor = Zr * smhp
smwp_stor = Zr * smwp
smfc_stor = Zr * smfc
max_stor = Zr * n

for i in 1:length(infiltration)
    thisSMstor = SM_store[i]
    AET = (thisSMstor > smhp_stor) ? min(thisSMstor - smhp_stor, PET[i] * (thisSMstor / max_stor) ^ PETexp) : 0
    thisSMstor = thisSMstor - AET
    deepDrainage = (thisSMstor > smfc_stor) ?  min(thisSMstor - smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor)^rcExp) : 0
    SM_store[i+1] = min(max_stor, thisSMstor - min(thisSMstor, deepDrainage) + infiltration[i])
end
return(SM_store / n)
end

using Distributions, BenchmarkTools
exInfiltration = rand(Normal(1, 1), 365*20);
exPET = rand(Normal(1, 2), 365*20);
@benchmark SM_routine_f($exInfiltration, $exPET)

Kel*_*ndy 6

Using NumPy data types this way is costly. If I convert the input arrays to Python lists of Python floats, do the work without NumPy, and convert back to a NumPy array at the end, it runs about 20x faster. Three runs:

original  94.9 ms
improved   4.4 ms
original  95.1 ms
improved   4.4 ms
original  93.2 ms
improved   4.5 ms

A further improvement, avoiding Python's slow min function, makes it about 30x faster:

original  91.4 ms
improved   3.3 ms
original  99.4 ms
improved   3.3 ms
original  96.7 ms
improved   3.2 ms
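Avoiding min here means replacing the builtin call, which goes through generic argument handling on every iteration, with an inline conditional expression. A tiny illustration of the substitution (the helper name is mine, not from the answer):

```python
def clamp_to_cap(value, cap):
    # Inline conditional instead of min(value, cap): skips a builtin
    # call and generic-comparison dispatch on every loop iteration.
    return value if value < cap else cap

print(clamp_to_cap(620.0, 500.0))  # → 500.0 (capped at the maximum)
```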

A few micro-optimizations speed it up further:

  3.11 ± 0.03 ms  improved
  2.58 ± 0.02 ms  optimized

Original, improved, comparison, and benchmark code:


Try it online!
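The answer's full listing is only linked, not reproduced here. Based on the changes it describes (plain Python lists at the boundaries, a branch instead of a scalar np.where call, and a conditional expression instead of min), a minimal sketch might look like this. The name SM_routine_fast is mine; the numerical logic follows the question's Python version:

```python
import numpy as np

def SM_routine_fast(infiltration=None, PET=None, rcScl=0.1, rcExp=1.3, PETexp=2,
                    Zr=1000, n=0.5, smhp=0.00, smwp=0.10, smfc=0.25, s0=0.5):
    # Convert once at the boundary: per-element NumPy scalar operations
    # dominate the original loop's runtime.
    infiltration = np.asarray(infiltration).tolist()
    PET = np.asarray(PET).tolist()
    nt = len(infiltration)
    SM_store = [0.0] * (nt + 1)
    SM_store[0] = s0
    smhp_stor = Zr * smhp
    smfc_stor = Zr * smfc
    max_stor = Zr * n
    for i in range(nt):
        thisSMstor = SM_store[i]
        # Plain branch instead of np.where on a scalar.
        if thisSMstor > smhp_stor:
            thisSMstor -= ((thisSMstor - smhp_stor) * PET[i]
                           * (thisSMstor / max_stor) ** PETexp / max_stor)
        if thisSMstor > smfc_stor:
            deepDrainage = (rcScl * thisSMstor * (thisSMstor / smfc_stor) ** rcExp
                            - (thisSMstor - smfc_stor))
        else:
            deepDrainage = 0.0
        # Conditional expression instead of the min() builtin.
        s = thisSMstor + infiltration[i] - deepDrainage
        SM_store[i + 1] = s if s < max_stor else max_stor
    # Convert back to NumPy only at the end.
    return np.asarray(SM_store) / n
```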



GKi*_*GKi 6

An R version using Rcpp:

Rcpp::sourceCpp(code=r"(
#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector rcpp(NumericVector infiltration,
                   NumericVector PET) {
  double rcScl = 0.1;
  double rcExp = 1.3;
  double PETexp = 2;
  double Zr = 1000.0;
  double n = 0.5;
  double smhp = 0.00;
  double smwp = 0.10;
  double smfc = 0.25;
  double s0 = 0.5;
  size_t nt = infiltration.size();
  NumericVector SM_store(no_init(nt+1));
  SM_store[0] = s0;
  double smhp_stor = Zr * smhp;
  double smwp_stor = Zr * smwp;
  double smfc_stor = Zr * smfc;
  double max_stor = Zr * n;
  for(size_t i=0; i<nt; ++i) {
    double thisSMstor = SM_store[i];
    if(thisSMstor > smhp_stor) thisSMstor -= std::min(thisSMstor - smhp_stor,
                    PET[i] * pow(thisSMstor / max_stor, PETexp));
    double deepDrainage = (thisSMstor > smfc_stor) ? std::min(thisSMstor - smfc_stor, rcScl * thisSMstor * pow(thisSMstor / smfc_stor, rcExp)) : 0;
    SM_store[i+1] = std::min(max_stor, thisSMstor - std::min(thisSMstor, deepDrainage) + infiltration[i]);
  }
  return SM_store;
}
)")

RNGkind('Mersenne-Twister', 'Inversion', 'Rejection')
set.seed(42)
exInfil = rnorm(365*20, mean=1, sd=1)
exPET = rnorm(365*20, mean=2, sd=1)
bench::mark(rcpp(exInfil, exPET))
#  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr>    <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 rcpp(exInfil… 284µs  305µs     3318.    59.6KB     2.01  1649     1      497ms

It takes 0.305 ms on my machine, but of course that is really the time of the C++ code.


A slightly modified Julia version:

function SM_routine_fB(infiltration::Vector{Float64}, PET::Vector{Float64})
  rcScl = 0.1
  rcExp = 1.3
  PETexp = 2
  Zr = 1000.0
  n = 0.5
  smhp = 0.00
  smwp = 0.10
  smfc = 0.25
  s0 = 0.5
  nt = length(infiltration)
  SM_store = Vector{Float64}(undef, nt+1)
  SM_store[1] = s0
  smhp_stor = Zr * smhp
  smwp_stor = Zr * smwp
  smfc_stor = Zr * smfc
  max_stor = Zr * n
  @fastmath for i in 1:nt
    thisSMstor = SM_store[i]
    if thisSMstor > smhp_stor
      thisSMstor -= min(thisSMstor - smhp_stor, PET[i] * (thisSMstor / max_stor) ^ PETexp)
    end
    deepDrainage = (thisSMstor > smfc_stor) ? min(thisSMstor - smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor)^rcExp) : 0
    SM_store[i+1] = min(max_stor, thisSMstor - min(thisSMstor, deepDrainage) + infiltration[i])
  end
  return(SM_store / n)
end

using Distributions, BenchmarkTools
exInfiltration = rand(Normal(1, 1), 365*20);
exPET = rand(Normal(1, 2), 365*20);
@benchmark SM_routine_fB($exInfiltration, $exPET)
#BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max):  348.894 μs …  1.257 ms  ┊ GC (min … max): 0.00% … 60.58%
# Time  (median):     369.202 μs              ┊ GC (median):    0.00%
# Time  (mean ± σ):   379.450 μs ± 41.742 μs  ┊ GC (mean ± σ):  0.39% ±  2.93%
#
#  [histogram of sample times omitted]
#  349 μs        Histogram: log(frequency) by time       470 μs <
#
# Memory estimate: 114.22 KiB, allocs estimate: 4.

That is 0.369 ms in Julia.


  • Note that if you add `@fastmath` to the loop (i.e. `@fastmath for i in 1:nt`), Julia matches the C++ version. (3 upvotes)
  • https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-annotations says: *Use @fastmath to allow floating point optimizations that are correct for real numbers, but lead to differences for IEEE numbers.* (3 upvotes)

Osc*_*ith 4

In general, when comparing Julia (or another fast language) to Python/R, a 10x to 100x difference is about right. R is usually closer to the 100x end of that range, while Python is typically around 40x, although recent improvements in Python 3.12 have brought a modest (roughly 25%) speedup to Python.