Julia 与 Mathematica：数值积分性能

Question

Julia 与 Mathematica：数值积分性能

kal*_*alt 3 wolfram-mathematica numerical-integration julia

朱莉娅绝对是新人，有一个问题要问你。

我正在通过移植我的一些 Mathematica 和 Python 代码（主要是物理学中的科学计算等）来自学一些 Julia，然后看看是什么。到现在为止，事情还算顺利。而且很快。到目前为止。

现在，我正在模拟一个基本的锁定放大器，它本质上接受一个可能非常复杂的时间相关信号Uin(t)，并产生一个输出，Uout(t)在某个参考频率上锁相fref（也就是说，它突出显示的分量Uin(t)，与参考正弦波有一定的相位关系）。描述无关紧要，重要的是它基本上是通过计算积分来实现的（为了清楚起见，我实际上在这里省略了阶段）：

在此处输入图片说明

因此，我在 Mathematica 和 Julia 中开始并测试了这一点：我定义了一个模型Uin(t)，传递了一些参数值，然后Uout(t)在时间t = 0为fref.

Julia：我使用QuadGK包进行数值积分。

T = 0.1
f = 100.
Uin(t) = sin(2pi * f * t) + sin(2pi * 2f *t)
Uout(t, fref)::Float64 = quadgk(s -> Uin(s) * sin(2pi * fref * s), t-T, t, rtol=1E-3)[1]/T
frng = 80.:1.:120.
print(@time broadcast(Uout, 0., frng))

Run Code Online (Sandbox Code Playgroud)

数学

T = 0.1;
f = 100.;
Uin[t_] := Sin[2 ? f t] + Sin[2 ? 2 f t]
Uout[t_, fref_] := NIntegrate[Sin[2 ? fref s] Uin[s], {s, t - T, t}]/T
frng = Table[i, {i, 80, 120, 1}];
Timing[Table[Uout[0, fr], {fr, frng}]]

Run Code Online (Sandbox Code Playgroud)

结果：

Julia 将操作计时在 45 到 55 秒之间，在 i7-5xxx 笔记本电脑上使用电池供电，这是很多，而 Mathematica 在大约 2 秒内完成。这种差异是可怕的，老实说，令人难以置信。我知道 Mathematica 的内核中有一些非常漂亮和精致的算法，但 Julia 就是 Julia。所以，问题是：什么给了？

PS：设置f和T作为const确实降低了朱莉娅的时间〜8-10秒，但f并T不能成为constS IN的实际程序。除此之外，还有什么明显的我遗漏了吗？

2020 年 2 月 2 日编辑：

减慢似乎是由于当值接近零时算法试图降低精度，例如见下文：对于 fref = 95，计算需要 1 整秒（！），而对于相邻频率值，它会立即计算（返回的结果是 (res, error)) 的元组。似乎 quadgk 函数在非常小的值处停止）：

  0.000124 seconds (320 allocations: 7.047 KiB)
fref = 94.0 (-0.08637214864144352, 9.21712218998258e-6)

  1.016830 seconds (6.67 M allocations: 139.071 MiB, 14.35% gc time)
fref = 95.0 (-6.088184966010742e-16, 1.046186419361636e-16)

  0.000124 seconds (280 allocations: 6.297 KiB)
fref = 96.0 (0.1254003757465191, 0.00010132083518769636)

Run Code Online (Sandbox Code Playgroud)

注意：这与我要求生产的公差无关。此外，Mathematica 默认情况下通常会达到机器精度容差，但在接近零时会有所放缓，而 numpy/scipy 只是在整个过程中飞行，但产生的结果不如 Mathematica 精确（在默认设置下；并没有对此进行过多研究）。

Answer 1

ARa*_*rez 6

我在您的代码中看到的第一个问题是它的类型不稳定。这是因为您使用了全局变量（请参阅Julia Performance Tips 中的Performance Tip 一）：编译器无法知道您在函数中使用的fand的类型T，因此它无法进行有效的编译。这也是为什么当您将它们标记为 const 时，性能会提高：现在编译器可以保证它们不会改变它们的类型，因此它可以有效地编译您的两个函数。

如何查看您的代码是否不稳定

如果你用这样的宏运行你的第一个函数@code_warntype：

@code_warntype Uin(0.1,f)

Run Code Online (Sandbox Code Playgroud)

你会看到这样的输出：

julia> @code_warntype Uin(0.1)
Variables
  #self#::Core.Compiler.Const(Uin, false)
  t::Float64

Body::Any
1 ? %1 = (2.0 * Main.pi)::Core.Compiler.Const(6.283185307179586, false)
?   %2 = (%1 * Main.f * t)::Any
?   %3 = Main.sin(%2)::Any
?   %4 = (2.0 * Main.pi)::Core.Compiler.Const(6.283185307179586, false)
?   %5 = (2.0 * Main.f)::Any
?   %6 = (%4 * %5 * t)::Any
?   %7 = Main.sin(%6)::Any
?   %8 = (%3 + %7)::Any
???      return %8

Run Code Online (Sandbox Code Playgroud)

所有这些都Anys告诉您编译在任何步骤都不知道输出的类型。

怎么修

您可以重新定义函数以接收f和T作为变量：

Uin(t,f) = sin(2.0pi * f * t) + sin(2.0pi * 2.0f *t)
Uout(t, fref,f,T)::Float64 = quadgk(s -> Uin(s,f) * sin(2pi * fref * s), t-T, t, rtol=1E-3)[1]/T

Run Code Online (Sandbox Code Playgroud)

通过这些重新定义，您的代码运行得更快。如果您尝试检查它们，@code_warntype您将看到现在编译器正确地推断出所有内容的类型。

如需进一步的性能改进，您可以查看Julia Performance Tips

特别是，通常建议使用而不是使用来衡量性能的方法@time是@btime从包BenchmarkTools. 之所以如此，是因为在运行时@time您也在测量编译时间（另一个选项是运行 @time 两次 - 第二个测量将是正确的，因为所有函数都有机会编译）。

Answer 2

DNF*_*DNF 6

您的问题与容错的选择有关。1e-3 的相对误差听起来并没有那么糟糕，但实际上是当积分接近于零时。特别是，这种情况发生在fref = 80.0（和 85、90、95，而不是100、105 等）时：

julia> Uout(0.0, 80.0, f, T)
1.2104987553880609e-16

Run Code Online (Sandbox Code Playgroud)

引用以下文档字符串quadgk：

（请注意，在 norm(I) 可能为零的情况下，指定正 atol 很有用。）

让我们尝试设置一个绝对容差，例如 1e-6，然后进行比较。首先是代码（使用来自@ARamirez 的代码）：

Uin(t, f) = sin(2? * f * t) + sin(4? * f * t)

function Uout(t, fref, f , T)
    quadgk(s -> Uin(s, f) * sin(2? * fref * s), t-T, t, rtol=1e-3)[1]/T
end
function Uout_new(t, fref, f , T) # with atol
    quadgk(s -> Uin(s, f) * sin(2? * fref * s), t-T, t, rtol=1e-3, atol=1e-6)[1]/T
end

Run Code Online (Sandbox Code Playgroud)

然后是基准测试（为此使用 BenchmarkTools）

using BenchmarkTools
T = 0.1
f = 100.0
freqs = 80.0:1.0:120.0

@btime Uout.(0.0, $freqs, $f, $T);
6.302 s (53344283 allocations: 1.09 GiB)

@btime Uout_new.(0.0, $freqs, $f, $T);
1.270 ms (11725 allocations: 262.08 KiB)

Run Code Online (Sandbox Code Playgroud)

好的，那快了 5000 倍。这可以吗？

Answer 3

Fre*_*gge 5

您可以采取多种措施来进一步加快速度。改变集成的顺序确实有一点帮助，使用 Float32 而不是 Float64 做了一个小改进，使用@fastmath做了进一步的小改进。还可以利用SLEEFPirates.sin_fast

using QuadGK, ChangePrecision

@changeprecision Float32 begin
    T = 0.1
    f = 100.
    @inline @fastmath Uin(t,f) = sin(2pi * f * t) + sin(2pi * 2f *t)
    @fastmath Uout(t, fref,f,T) = first(quadgk(s -> Uin(s,f) * sin(2pi * fref * s), t-T, t, rtol=1e-2, order=10))/T

    frng = 80.:1.:120.
    @time Uout.(0., frng, f, T)
end

Run Code Online (Sandbox Code Playgroud)

归档时间：	5 年，11 月前
查看次数：	1734 次
最近记录：	5 年，11 月前