MOO*_*OON 1 performance loops r julia
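The following Julia and R code shows that this estimator of the population variance is biased: its expected value depends on the sample size, so for a small number of data points it does not equal the population variance, no matter how many independent samples we average over.
Julia takes ~10 seconds to complete the two loops, while R finishes in ~7 seconds. If I comment out the code inside the loops, the loops take the same time in R and Julia, and if I only sum the iterators with s = s + i + j, Julia finishes in ~0.15 s and R in ~0.5 s.
Are Julia loops slow, or has R become fast? How can I improve the speed of the Julia code below? Can the R code be made faster?
Julia: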
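For reference, the size of the bias can be worked out directly. This is a sketch under the assumption that the observations $x_1, \dots, x_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$ (symbols introduced here for illustration; in the code, $n$ is the loop variable i, $\mu = 0$ and $\sigma^2 = 1$):

```latex
\begin{aligned}
\mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i^2\right] &= \sigma^2 + \mu^2, \\
\mathbb{E}\!\left[\frac{1}{n^2}\Big(\sum_{i=1}^{n} x_i\Big)^{2}\right] &= \frac{\sigma^2}{n} + \mu^2, \\
\mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \frac{1}{n^2}\Big(\sum_{i=1}^{n} x_i\Big)^{2}\right] &= \frac{n-1}{n}\,\sigma^2 .
\end{aligned}
```

With $\sigma^2 = 1$ the expected value is 0.5 for $n = 2$ and 0.9 for $n = 10$, so the averaged estimates should rise toward 1 as the sample size grows.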
using Plots
trials = 100000
sample_size = 10;
sd = Array{Float64}(trials,sample_size-1)
tic()
for i = 2:sample_size
    for j = 1:trials
        res = randn(i)
        sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
    end
end
toc()
sd2 = mean(sd,1)
plot(sd2[1:end])
R:
trials = 100000
sample_size = 10
sd = matrix(, nrow = trials, ncol = sample_size-1)
start_time = Sys.time()
for(i in 2:sample_size){
  for(j in 1:trials){
    res <- rnorm(n = i, mean = 0, sd = 1)
    sd[j,i-1] = (1/(i))*(sum(res*res))-(1/((i)*i))*(sum(res)*sum(res))
  }
}
end_time = Sys.time()
end_time - start_time
sd2 = apply(sd,2,mean)
plot(sqrt(sd2))
One way I can get more speed is to use a parallel loop, which is easy to do in Julia:
using Plots
trials = 100000
sample_size = 10;
sd = SharedArray{Float64}(trials,sample_size-1)
tic()
@parallel for i = 2:sample_size
    for j = 1:trials
        res = randn(i)
        sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
    end
end
toc()
sd2 = mean(sd,1)
plot(sd2[1:end])
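Using global variables in Julia is generally slow and can be expected to give speed comparable to R. You should wrap your code in a function to make it fast.
Here are the timings from my laptop (I kept only the relevant part):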
julia> function test()
           trials = 100000
           sample_size = 10;
           sd = Array{Float64}(trials,sample_size-1)
           tic()
           for i = 2:sample_size
               for j = 1:trials
                   res = randn(i)
                   sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
               end
           end
           toc()
       end
test (generic function with 1 method)
julia> test()
elapsed time: 0.243233887 seconds
0.243233887
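Additionally, in Julia you can speed things up further by using randn! instead of randn, because you avoid reallocating the res vector on every iteration. (I am not making other optimizations to the code, since this one is specific to Julia in comparison with R; all other possible speedups in this code would help Julia and R in a similar way.)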
julia> function test2()
           trials = 100000
           sample_size = 10;
           sd = Array{Float64}(trials,sample_size-1)
           tic()
           for i = 2:sample_size
               res = zeros(i)
               for j = 1:trials
                   randn!(res)
                   sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
               end
           end
           toc()
       end
test2 (generic function with 1 method)
julia> test2()
elapsed time: 0.154881137 seconds
0.154881137
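The expression res.^2 still allocates a temporary array on every iteration. A minimal sketch of one way to avoid that, assuming Julia 1.x syntax (Array{Float64}(undef, ...), randn! from the Random standard library) rather than the 0.6 syntax used above; the function name biased_variances is hypothetical:

```julia
using Random  # randn! lives in the Random standard library on Julia 1.x

# Hypothetical helper: returns a trials x (sample_size - 1) matrix of
# biased variance estimates, one column per sample size 2:sample_size.
function biased_variances(trials::Int, sample_size::Int)
    sd = Array{Float64}(undef, trials, sample_size - 1)
    for i in 2:sample_size
        res = zeros(i)          # buffer reused across all trials
        for j in 1:trials
            randn!(res)         # overwrite the buffer in place
            s = sum(res)        # compute the sum once instead of twice
            # sum(abs2, res) adds up the squares without materializing res.^2
            sd[j, i-1] = sum(abs2, res) / i - (s * s) / (i * i)
        end
    end
    return sd
end

sd = biased_variances(100_000, 10)
```

Returning sd from the function, instead of writing into a global, also keeps the whole computation free of non-constant globals.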
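Finally, it is better to use the BenchmarkTools package to measure execution time in Julia. First, the tic and toc functions are deprecated in Julia 0.7 and removed in 1.0. Second, if you use them as in test, you mix compilation and execution time (if you run the function twice, you will see that the second run takes less time, because Julia no longer spends time compiling the function).
EDIT:
You can keep trials, sample_size and sd as global variables, but then you should declare them with const. After that it is enough to wrap the loop in a function like this: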
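As an illustration of that advice, here is a minimal sketch using the @btime and @belapsed macros from BenchmarkTools. It assumes Julia 1.x and that the package is installed; simulate is a hypothetical stand-in for the loop above, run with fewer trials so the benchmark finishes quickly:

```julia
using BenchmarkTools  # third-party package: import Pkg; Pkg.add("BenchmarkTools")

# Hypothetical stand-in for the simulation in the question.
function simulate(trials, sample_size)
    sd = Array{Float64}(undef, trials, sample_size - 1)
    for i in 2:sample_size, j in 1:trials
        res = randn(i)
        sd[j, i-1] = sum(abs2, res) / i - sum(res)^2 / i^2
    end
    return sd
end

# @btime runs the call many times and prints the minimum time and allocations,
# so compilation time is not included in the reported figure.
@btime simulate(1_000, 10);

# @belapsed returns the minimum time in seconds as a Float64.
t = @belapsed simulate(1_000, 10)
```

Because the macros run the call repeatedly after it has been compiled, the reported time reflects execution only.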
const trials = 100000;
const sample_size = 10;
const sd = Array{Float64}(trials,sample_size-1);
function f()
    for i = 2:sample_size
        for j = 1:trials
            res = randn(i)
            sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
        end
    end
end
tic()
f()
toc()
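Now for @parallel:
First, you should put @sync before @parallel to make sure everything works correctly (i.e. that all workers have finished before execution moves on to the next instruction). To see why this is needed, run the following code on a system with more than one worker: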
sd = SharedArray{Float64}(10^6);
@parallel for i = 1:2
    if i < 2
        sd[i] = 1
    else
        for j in 2:10^6
            sd[j] = 1
        end
    end
end
minimum(sd) # most probably prints 0.0
sleep(1)
minimum(sd) # most probably prints 1.0
whereas this version:
sd = SharedArray{Float64}(10^6);
@sync @parallel for i = 1:2
    if i < 2
        sd[i] = 1
    else
        for j in 2:10^6
            sd[j] = 1
        end
    end
end
minimum(sd) # always prints 1.0
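Second, the speedup comes from the @parallel macro, not from SharedArray. If you run your code in Julia with a single worker, it will also be faster. In short, the reason is that @parallel internally wraps the loop body in a function. You can check this with @macroexpand: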
julia> @macroexpand @sync @parallel for i = 2:sample_size
           for j = 1:trials
               res = randn(i)
               sd[j,i-1] = (1/(i))*(sum(res.^2))-(1/((i)*i))*(sum(res)*sum(res))
           end
       end
quote # task.jl, line 301:
    (Base.sync_begin)() # task.jl, line 302:
    #19#v = (Base.Distributed.pfor)(begin # distributed\macros.jl, line 172:
                function (#20#R, #21#lo::Base.Distributed.Int, #22#hi::Base.Distributed.Int) # distributed\macros.jl, line 173:
                    for i = #20#R[#21#lo:#22#hi] # distributed\macros.jl, line 174:
                        begin # REPL[22], line 2:
                            for j = 1:trials # REPL[22], line 3:
                                res = randn(i) # REPL[22], line 4:
                                sd[j, i - 1] = (1 / i) * sum(res .^ 2) - (1 / (i * i)) * (sum(res) * sum(res))
                            end
                        end
                    end
                end
            end, 2:sample_size) # task.jl, line 303:
    (Base.sync_end)() # task.jl, line 304:
    #19#v
end