Julia anonymous functions and performance

Question


Aub*_*ine 6 python performance anonymous-function julia

I am porting this Python code...

with open(filename, 'r') as f:
    results = [np.array(line.strip().split(' ')[:-1], float)
               for line in filter(lambda l: l[0] != '#', f.readlines())]


...to Julia. I came up with:

results = [map(ss -> parse(Float64, ss), split(s, " ")[1:end-1])
        for s in filter(s -> s[1] !== '#', readlines(filename))];


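The main reason for this port is a potential performance gain, so I timed both snippets in a Jupyter notebook:

Using %%timeit...
- Python: 12.8 ms ± 44.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
- Julia: @benchmark returns (among other things) mean time: 8.250 ms (2.62% GC). So far so good; I do get a performance gain.
However, when using @time:
- I get something like 0.103095 seconds (130.44 k allocations: 11.771 MiB, 91.58% compilation time). From this thread I inferred that it may be caused by my -> function declarations.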

Indeed, if I replace my code with:

filt = s -> s[1] !== '#';
pars = ss -> parse(Float64, ss);
res = [map(pars, split(s, " ")[1:end-1])
        for s in filter(filt, readlines(filename))];


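and time only the last line, I get a more encouraging 0.073007 seconds (60.58 k allocations: 7.988 MiB, 88.33% compilation time); yay! However, it somewhat defeats the purpose of anonymous functions (at least as I understand it) and could lead to a bunch of f1, f2, f3, ... Naming my Python lambda functions outside the list comprehension does not seem to affect Python's run time.

My question is: to get decent performance, should I systematically name my Julia functions? Note that this particular snippet will be called in a loop over about 30k files. (Basically, I am reading files that mix lines of space-separated floats with comment lines; each line of floats can have a different length, and I am not interested in the last element of a line. Any comments on my solution are appreciated.)

(Side comment: wrapping s with strip completely messes up @benchmark, adding 10 ms to the mean, but does not seem to affect @time. Any reason why?)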

Putting everything in a function, as suggested by DNF, solves my "having to name my anonymous functions" problem. Using one of Vincent Yu's formulations:

function results(filename::String)::Vector{Vector{Float64}}
    [[parse(Float64, s) for s in @view split(line, ' ')[1:end-1]]
        for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark results(FN)
BenchmarkTools.Trial: 
  memory estimate:  3.74 MiB
  allocs estimate:  1465
  --------------
  minimum time:     7.108 ms (0.00% GC)
  median time:      7.458 ms (0.00% GC)
  mean time:        7.580 ms (1.58% GC)
  maximum time:     9.538 ms (14.84% GC)
  --------------
  samples:          659
  evals/sample:     1


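Calling @time on this function returns equivalent results after the first (compiling) run. I am happy with that.

However, here is my lingering question about strip: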

function results_strip(filename::String)::Vector{Vector{Float64}}
    [[parse(Float64, s) for s in @view split(strip(line), ' ')[1:end-1]]
        for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark results_strip(FN)
BenchmarkTools.Trial: 
  memory estimate:  3.74 MiB
  allocs estimate:  1465
  --------------
  minimum time:     15.155 ms (0.00% GC)
  median time:      15.742 ms (0.00% GC)
  mean time:        15.885 ms (0.75% GC)
  maximum time:     19.089 ms (10.02% GC)
  --------------
  samples:          315
  evals/sample:     1


The median time doubles. If I look at strip alone:

function only_strip(filename::String)
    [strip(line) for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark only_strip(FN)
BenchmarkTools.Trial: 
  memory estimate:  1.11 MiB
  allocs estimate:  475
  --------------
  minimum time:     223.868 μs (0.00% GC)
  median time:      258.227 μs (0.00% GC)
  mean time:        325.389 μs (9.41% GC)
  maximum time:     56.024 ms (75.09% GC)
  --------------
  samples:          10000
  evals/sample:     1


The numbers just don't add up. Is there a type mismatch, and should I convert the result to something else?

Answer 1

Bog*_*ski 5

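To (hopefully) clearly summarize the comments from Colin T Bowers and DNF:

- In Julia, anonymous functions are as fast as named functions once they are compiled.
- The difference you observe is caused by compilation time.
- When you use BenchmarkTools.jl (@btime), the time is always measured after compilation. If you just use @time, the reported time includes compilation. In fact, you get this information in the output (which shows the percentage of time spent compiling).
- Likewise, if you put the whole expression in a function, it is compiled only once, whereas if you evaluate it in top-level scope, it is compiled every time you run it.

The conclusions are:

- If your code is really computationally expensive, compilation time does not matter much (it is a one-time constant cost), so you do not need to worry about it when doing heavy computation.
- However, if your computation is cheap, compilation time will be noticeable, because every time you introduce a new anonymous function in top-level scope it has to be compiled.

Let me give you a canonical example that shows the problem and will hopefully help you understand it better: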

julia> x = rand(10^6);

julia> @time count(v -> v < 0.5, x) # a lot of compilation as everything needs to be compiled
  0.033077 seconds (18.34 k allocations: 1.047 MiB, 110.16% compilation time)
499921

julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
  0.013155 seconds (5.85 k allocations: 322.655 KiB, 95.92% compilation time)
499921

julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
  0.017371 seconds (5.85 k allocations: 322.702 KiB, 95.37% compilation time)
499921

julia> f(x) = x < 0.5
f (generic function with 1 method)

julia> @time count(f, x) # f is a new function - it has to be compiled
  0.011609 seconds (5.82 k allocations: 321.351 KiB, 95.85% compilation time)
499921

julia> @time count(f, x) # f was already compiled - we are fast
  0.000596 seconds (2 allocations: 32 bytes)
499921

julia> @time count(f, x) # f was already compiled - we are fast
  0.000621 seconds (2 allocations: 32 bytes)
499921

julia> @time count(<(0.5), x) # <(0.5) is a new callable - it has to be compiled
  0.013751 seconds (7.71 k allocations: 456.232 KiB, 96.03% compilation time)
499921

julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
  0.000504 seconds (2 allocations: 32 bytes)
499921

julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
  0.000616 seconds (2 allocations: 32 bytes)


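The key point is that every time you write v -> v > 0.5 it is a new function, even if you use exactly the same definition: if you introduce it in global scope, Julia has to create a new anonymous function. It is easy to see here: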

julia> v -> v > 0.5
#7 (generic function with 1 method)

julia> v -> v > 0.5
#9 (generic function with 1 method)


(Note that the number increases: these are different functions.)

Now look at >(0.5):

julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)

julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)


It is the same type of callable every time, so it only needs to be compiled once.

Finally, if you wrap things in a function, as DNF explained, you have:

julia> test() = v -> v > 0.5
test (generic function with 1 method)

julia> test()
#11 (generic function with 1 method)

julia> test()
#11 (generic function with 1 method)


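As you can see, when the anonymous function is defined inside a named function, the compiler knows each time that it is the same anonymous function, so the number does not increase (it only needs to be compiled once, the first time test is called).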
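The three cases can also be checked by comparing types, since Julia compiles a specialization of count (or filter, map, ...) per argument type. This is only a minimal sketch; the names f1, f2, g1, g2 and make_pred are hypothetical and introduced purely for illustration:

```julia
# Two textually identical anonymous functions written at top level
# are distinct functions with distinct types.
f1 = v -> v > 0.5
f2 = v -> v > 0.5
println(typeof(f1) === typeof(f2))    # false

# >(0.5) builds a Base.Fix2 value; its type is the same every time.
g1 = >(0.5)
g2 = >(0.5)
println(typeof(g1) === typeof(g2))    # true

# An anonymous function defined inside a named function has one fixed type,
# no matter how often the enclosing function is called.
make_pred() = v -> v > 0.5            # hypothetical helper
println(typeof(make_pred()) === typeof(make_pred()))    # true
```

A new type means a new specialization to compile, which is where the repeated compilation time reported by @time comes from.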

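Regarding the strip question: the difference is visible with @btime but not with @time, because in @time the cost of strip is dwarfed by the cost of compilation, so you simply cannot see it; the cost of strip itself is the same in both cases.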
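One way to take compilation out of a plain @time or @elapsed measurement is to call the function once before timing it. A minimal sketch, using a hypothetical count_below function as a stand-in for the file-parsing code (it is not part of the question):

```julia
# Hypothetical stand-in for the function being measured.
count_below(x) = count(v -> v < 0.5, x)

x = rand(10^6)

# First call: the measurement includes compiling count_below for Vector{Float64}.
first_call = @elapsed count_below(x)

# Second call: the compiled code is reused, so only the run time is measured.
second_call = @elapsed count_below(x)

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

Small differences such as the cost of strip only become visible in the second kind of measurement, which is what BenchmarkTools.jl reports.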

Answer 2

Vin*_* Yu 5

Bogumił Kamiński's answer is excellent. I am writing this only to comment on your solution.


Note that you can use the DelimitedFiles module from the standard library to read such a file into a matrix. Like this:


using DelimitedFiles
readdlm(filename, ' ', Float64; comments=true, comment_char='#')


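But you may find this slower than your code, because it reads the data into a column-major matrix rather than a row-based vector of vectors. Which one is better depends on your needs. (Of course, there are many packages that can read delimited files into various structures.)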


Regarding your solution, I suggest a few small changes that can improve performance and memory usage:


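- readlines and filter allocate new vectors that you do not keep. To avoid these memory allocations, use the iterator interfaces provided by eachline and Iterators.filter.
- Likewise, indexing with [1:end-1] creates an unnecessary vector. Use view or the convenient @view macro to avoid the allocation.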
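The difference between indexing with a range and taking a view can be seen on a single line of input. This is only a sketch; the sample line and the variable names are made up for illustration:

```julia
line = "1.0 2.5 3.75 x"    # made-up sample line; the last field is discarded

fields = split(line, ' ')           # Vector of SubString{String}
copied = fields[1:end-1]            # range indexing allocates a new Vector
viewed = @view fields[1:end-1]      # SubArray that refers to `fields`, no copy

println(copied == viewed)           # true: both hold the same elements
println(parse.(Float64, viewed))    # [1.0, 2.5, 3.75]
```

Because the sliced fields are parsed once and then discarded, the copy made by plain indexing is wasted work, which is why the view is preferable here.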


Also, I think it is clearer to stick with either map or array comprehensions in this code rather than mixing the two.


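The following code incorporates these changes (using map with do notation). In my test case, this improved speed by about 15% and reduced memory usage by about 30%: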


results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
    map(@view split(line, ' ')[1:end-1]) do s
        parse(Float64, s)
    end
end


If you prefer array comprehensions to map, the following is equivalent:


results = [
    [
        parse(Float64, s)
        for s in @view split(line, ' ')[1:end-1]
    ]
    for line in Iterators.filter(!startswith('#'), eachline(filename))
]


As you pointed out in the comments, we can use broadcasting to eliminate the explicit inner loop, which gives cleaner code:


results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
    parse.(Float64, @view split(line, ' ')[1:end-1])
end

results = [
    parse.(Float64, @view split(line, ' ')[1:end-1])
    for line in Iterators.filter(!startswith('#'), eachline(filename))
]

