Nearly 30x performance difference between Julia and C++

Rol*_*ter 5 c++ performance julia

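The Julia program below takes about 6 seconds on my laptop (measured on the second call to test(n)). The equivalent C++ program (using Eigen) takes only 0.19 seconds. Based on the results at https://programming-language-benchmarks.vercel.app/cpp, I expected a much smaller difference. What is wrong with my Julia program? I would greatly appreciate any tips on how to improve it.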

using StaticArrays
using Printf

struct CoordinateTransformation
    b1::SVector{3,Float64}
    b2::SVector{3,Float64}
    b3::SVector{3,Float64}
    r0::SVector{3,Float64}
    mf::SMatrix{3,3,Float64}
    mb::SMatrix{3,3,Float64}
end

function dot(a::SVector{3,Float64}, b::SVector{3,Float64}) 
    a[1]*b[1] + a[2]*b[2] + a[3]*b[3]
end

function CoordinateTransformation(b1::SVector{3,Float64}, b2::SVector{3,Float64}, b3::SVector{3,Float64}, r0::SVector{3,Float64})
    mf = MMatrix{3,3,Float64}(undef)

    e1::SVector{3, Float64} = [1.0, 0.0, 0.0]
    e2::SVector{3, Float64} = [0.0, 1.0, 0.0]
    e3::SVector{3, Float64} = [0.0, 0.0, 1.0]

    mf[1, 1] = dot(b1, e1);
    mf[1, 2] = dot(b1, e2);
    mf[1, 3] = dot(b1, e3);
    mf[2, 1] = dot(b2, e1);
    mf[2, 2] = dot(b2, e2);
    mf[2, 3] = dot(b2, e3);
    mf[3, 1] = dot(b3, e1);
    mf[3, 2] = dot(b3, e2);
    mf[3, 3] = dot(b3, e3);
    mb = inv(mf)
    CoordinateTransformation(b1, b2, b3, r0, mf, mb)
end

@inline function transform_point_f(at::CoordinateTransformation, v::MVector{3,Float64})
    at.mf * v + at.r0
end

@inline function transform_point_b(at::CoordinateTransformation, v::MVector{3,Float64})
    at.mb * (v - at.r0)
end

@inline function transform_vector_f(at::CoordinateTransformation, v::MVector{3,Float64})
    at.mf * v
end

@inline function transform_vector_b(at::CoordinateTransformation, v::MVector{3,Float64})
    at.mb * v
end

function test(n)
    theta = 1.0;
    c = cos(1.0);
    s = sin(1.0);
    b1::SVector{3, Float64} = [c, 0.0, s]
    b2::SVector{3, Float64} = [0.0, 1.0, 0.0]
    b3::SVector{3, Float64} = [-s, 0.0, c]
    r0::SVector{3, Float64} = [0.0, 0.0, 1.0]
    at::CoordinateTransformation = CoordinateTransformation(b1, b2, b3, r0)

    @printf("%e\n", n)

    points = Array{MVector{3, Float64}, 1}(undef, n)
    @inbounds for i in 1:n
        points[i] = [1.0, 0.0, 0.0]
    end

    @inbounds for i in 1:n
        points[i] = transform_point_f(at, points[i])
    end
    println(points[n])

    @inbounds for i in 1:n
        points[i] = transform_point_b(at, points[i])
    end
    println(points[n])
end


n = 10000000
@timev test(n)
@timev test(n)

Vin*_* Yu 9

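A major problem with your function test is that a large number of MVectors are allocated in its 3 loops. In addition, because MVectors are mutable structs, which are reference types, the points vector is a vector of references, which is not great for performance.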
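As a small illustration of the reference-type point, here is a minimal sketch that checks whether the two element types are plain-bits types. It assumes only that StaticArrays is installed. The first call should report true and the second false, which is why a Vector of SVectors can store its elements inline while a Vector of MVectors generally holds references to separately allocated objects.

```julia
using StaticArrays

# SVector is an immutable struct wrapping a tuple of its elements,
# so a Vector{SVector{3,Float64}} can store the three Float64 values inline.
println(isbitstype(SVector{3,Float64}))

# MVector is a mutable struct, so a Vector{MVector{3,Float64}}
# stores references to objects that are generally heap-allocated.
println(isbitstype(MVector{3,Float64}))
```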

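Instead, I suggest changing points to a vector of SVectors and modifying the code to accommodate that (for example, by replacing every MVector with SVector). In the first loop, points[i] = [1.0, 0.0, 0.0] should be changed to points[i] = SA[1.0, 0.0, 0.0] to avoid the allocation caused by creating a temporary Vector. (See also Eric's comment on this.)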

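After implementing these simple changes, I see the following improvement (timing before the changes first, then after):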

2.523284 seconds (40.00 M allocations: 1.714 GiB, 43.11% gc time)

0.171544 seconds (267 allocations: 228.891 MiB)
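The changes described in this answer could look roughly like the sketch below. It is not the answerer's exact code, only one way to apply the MVector-to-SVector substitution and the SA[...] literal. Two things are assumptions beyond the answer: the matrix fields are declared as SMatrix{3,3,Float64,9} (the explicit length parameter makes the field type concrete, whereas SMatrix{3,3,Float64} alone is not a concrete type), and the b1, b2, b3 fields and the hand-written dot helper are dropped, because dotting with a unit vector just picks out one component.

```julia
using StaticArrays

# Hypothetical reworking of the question's struct.
# The trailing 9 (= 3*3) is an addition beyond the answer; it makes the field types concrete.
struct CoordinateTransformation
    r0::SVector{3,Float64}
    mf::SMatrix{3,3,Float64,9}
    mb::SMatrix{3,3,Float64,9}
end

function CoordinateTransformation(b1::SVector{3,Float64}, b2::SVector{3,Float64},
                                  b3::SVector{3,Float64}, r0::SVector{3,Float64})
    # Rows of mf are b1, b2, b3; the constructor takes elements in column-major order.
    mf = SMatrix{3,3,Float64,9}(b1[1], b2[1], b3[1],
                                b1[2], b2[2], b3[2],
                                b1[3], b2[3], b3[3])
    CoordinateTransformation(r0, mf, inv(mf))
end

# Points are immutable SVectors instead of MVectors.
transform_point_f(at::CoordinateTransformation, v::SVector{3,Float64}) = at.mf * v + at.r0
transform_point_b(at::CoordinateTransformation, v::SVector{3,Float64}) = at.mb * (v - at.r0)

function test(n)
    c, s = cos(1.0), sin(1.0)
    at = CoordinateTransformation(SA[c, 0.0, s], SA[0.0, 1.0, 0.0],
                                  SA[-s, 0.0, c], SA[0.0, 0.0, 1.0])

    # SA[...] builds an SVector directly, with no temporary Vector.
    points = fill(SA[1.0, 0.0, 0.0], n)

    for i in eachindex(points)
        points[i] = transform_point_f(at, points[i])
    end
    for i in eachindex(points)
        points[i] = transform_point_b(at, points[i])
    end
    return points
end
```

To time it, call test once with a small n so that compilation is excluded, then run @time test(10_000_000). Since the backward transform undoes the forward one, the last point should come back approximately equal to SA[1.0, 0.0, 0.0].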