Why do these two Julia code snippets perform so differently?

Question

Why do the following Julia functions perform so differently?

function c1()
        x::UInt64 = 0
        while x<= (10^8 * 10)
                x+=1
        end
end

function c2()
        x::UInt64 = 0
        while x<= (10^9)
                x+=1
        end
end

function c3()
        x::UInt64 = 0
        y::UInt64 = 10^8 * 10
        while x<= y
                x+=1
        end
end

function c4()
        x::UInt64 = 0
        y::UInt64 = 10^9
        while x<= y
                x+=1
        end
end

Shouldn't they all behave the same?

@time c1()

0.019102 seconds (40.99 k allocations: 2.313 MiB)

@time c1()

0.000003 seconds (4 allocations: 160 bytes)

@time c2()

9.205925 seconds (47.89 k allocations: 2.750 MiB)

@time c2()

9.015212 seconds (4 allocations: 160 bytes)

@time c3()

0.019848 seconds (39.23 k allocations: 2.205 MiB)

@time c3()

0.000003 seconds (4 allocations: 160 bytes)

@time c4()

0.705712 seconds (47.41 k allocations: 2.719 MiB)

@time c4()

0.760354 seconds (4 allocations: 160 bytes)

Answer 1

hck*_*ckr 5

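This comes down to Julia's compile-time optimization of literal exponents, which uses power-by-squaring. Julia is able to optimize the expression if the exponent can be reached by repeated squaring alone, or if the power is 0, 1, 2 or 3. I believe this is done by lowering `x^p` to `x^Val{p}` for a literal integer `p` (in current Julia versions, a call to `Base.literal_pow(^, x, Val(p))`) and then relying on compiler specialization to reduce the code to a constant if `p` is 0, 1, 2, 3 or a power of 2. It may be more accurate to call this inlining plus a kind of metaprogramming; I am not sure of the correct term, but it resembles what you would find in Lisp. Similar techniques are used for source-to-source automatic differentiation in Julia, see Zygote.jl.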

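Julia lowers `10^8` to an inlined `literal_pow` (and then `power_by_squaring`), which is reduced to a constant. Julia then reduces `constant * 10` to another constant, realizes that the whole while loop is unnecessary, and removes the loop, all at compile time.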

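If you change `10^8` to `10^7` in `c1`, you will see that it evaluates both the number and the loop at run time. However, if you replace `10^8` with `10^4` or `10^2`, you will see that it handles the whole computation at compile time. I don't think Julia is specifically set up to optimize at compile time when the exponent is a power of 2; rather, the compiler simply turns out to be able to optimize the code (reduce it to a constant) in just that case.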
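The experiment described above can be sketched as follows. The names `loop_pow2`, `loop_pow4`, `loop_pow7`, `loop_pow8` and `time_variants` are made up for this illustration, and each loop returns `x` (unlike the original `c1`) so the result can be checked. Which variants get folded away depends on the Julia and LLVM versions, so the timings may not match the ones reported in the question.

```julia
# Generate one loop per literal exponent. The interpolated $p ends up as an
# integer literal in the expression, so it is treated like a hand-written 10^p.
for p in (2, 4, 7, 8)
    name = Symbol("loop_pow", p)   # hypothetical names: loop_pow2, loop_pow4, ...
    @eval function $name()
        x::UInt64 = 0
        while x <= (10^$p * 10)
            x += 1
        end
        return x
    end
end

# Time each variant after a warm-up call so that compilation is excluded.
function time_variants()
    for f in (loop_pow2, loop_pow4, loop_pow7, loop_pow8)
        f()                                    # warm-up, triggers compilation
        println(f, ": ", @elapsed(f()), " s")
    end
end
```

Calling `time_variants()` prints one timing per variant; a variant whose loop was removed at compile time should finish in well under a microsecond.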

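The cases where `p` is 1, 2 or 3 are hard-coded in Julia. These are likewise optimized by lowering the code to an inlined version of `literal_pow` and then specializing it at compile time.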
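As a small sketch of that mechanism, the snippet below calls `Base.literal_pow` directly. It assumes a Julia 1.x release, where `literal_pow` is an unexported function in `Base`; the variable names are illustrative only.

```julia
# A literal integer exponent is encoded in the type Val{p}, so the choice of
# method is made by dispatch at compile time rather than by a run-time branch.
x = 10

# Small exponents have dedicated methods for hardware number types.
@assert Base.literal_pow(^, x, Val(0)) == 1
@assert Base.literal_pow(^, x, Val(1)) == x
@assert Base.literal_pow(^, x, Val(2)) == x * x
@assert Base.literal_pow(^, x, Val(3)) == x * x * x

# Other exponents fall back to the ordinary power function.
p = 8                      # a variable exponent, so this is a plain ^ call
@assert Base.literal_pow(^, x, Val(8)) == x^p

# Show how a literal power is lowered (the printed form varies by version).
println(Meta.lower(Main, :(y^2)))
```

The last line prints the lowered form of `y^2`, which should mention `literal_pow` rather than a direct call to `^`.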

You can use the `@code_llvm` and `@code_native` macros to see what is going on. Let's try it.

julia> f() = 10^8*10
julia> g() = 10^7*10

julia> @code_native f()
.text
; Function f {
; Location: In[101]:2
    movl    $1000000000, %eax       # imm = 0x3B9ACA00
    retq
    nopw    %cs:(%rax,%rax)
;}

julia> @code_native g()
.text
; Function g {
; Location: In[104]:1
; Function literal_pow; {
; Location: none
; Function macro expansion; {
; Location: none
; Function ^; {
; Location: In[104]:1
    pushq   %rax
    movabsq $power_by_squaring, %rax
    movl    $10, %edi
    movl    $7, %esi
    callq   *%rax
;}}}
; Function *; {
; Location: int.jl:54
    addq    %rax, %rax
    leaq    (%rax,%rax,4), %rax
;}
    popq    %rcx
    retq
;}

Notice that `f()` turns out to be just a constant, while `g()` evaluates things at run time.
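The same comparison can be made outside the REPL by capturing the generated IR as strings. This is a minimal sketch that assumes the `InteractiveUtils` standard library is available; the IR itself, and whether `g` still contains a call to `power_by_squaring`, will differ between Julia versions.

```julia
using InteractiveUtils  # provides code_llvm and code_native outside the REPL

f() = 10^8 * 10
g() = 10^7 * 10

# sprint passes an IO buffer as the first argument, so this captures the IR
# that @code_llvm would otherwise print.
ir_f = sprint(code_llvm, f, Tuple{})
ir_g = sprint(code_llvm, g, Tuple{})

println("f: ", count(==('\n'), ir_f), " lines of IR")
println("g: ", count(==('\n'), ir_g), " lines of IR")

# A remaining call to power_by_squaring means the power is computed at run time.
println("g calls power_by_squaring: ", occursin("power_by_squaring", ir_g))
```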

If you want to dig deeper, I think Julia introduced this integer exponentiation trick around this commit.

EDIT: Let's compile-time optimize `c2`

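I have also prepared a function that computes integer-to-integer powers, which Julia will also optimize for exponents that are not powers of 2. I am not sure it is correct in all cases, though.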

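# Integer power by repeated squaring; written for non-negative exponents.
@inline function ipow(base::Int, exp::Int)
    result = 1
    flag = true
    while flag
        # Multiply in the current power of base when the lowest bit is set.
        if exp & 1 > 0
            result *= base
        end
        exp >>= 1          # move on to the next bit of the exponent
        base *= base       # square the base for that bit
        flag = exp != 0
    end

    return result
end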

Now replace `10^9` in `c2` with `ipow(10,9)` and enjoy the power of compile-time optimization.

See also this question about power-by-squaring.

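Please do not use this function as is, since it tries to inline every exponentiation, whether or not it consists of literals. You would not want that.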
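Since the answer is unsure whether `ipow` is correct in all cases, here is a small sanity check against Base's `^`. It repeats the definition so it can be run on its own, and it only covers non-negative exponents: for a negative `exp` the arithmetic shift `exp >>= 1` never reaches zero, so the loop would not terminate. The tested ranges are an arbitrary choice for illustration.

```julia
@inline function ipow(base::Int, exp::Int)
    result = 1
    flag = true
    while flag
        if exp & 1 > 0
            result *= base
        end
        exp >>= 1
        base *= base
        flag = exp != 0
    end
    return result
end

# Compare with the built-in power for small bases and non-negative exponents.
for b in -5:5, e in 0:12
    @assert ipow(b, e) == b^e
end

# The value needed for c2.
@assert ipow(10, 9) == 10^9
```

A check like this only covers the listed inputs; it does not establish correctness for every `Int` argument.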
