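I'm new to Julia, and I wrote a simple function that computes the RMSE (root mean square error). ratings is a matrix of ratings in which each row is [user, film, rating]. There are 15 million ratings. The rmse() function takes 12.0 seconds, but a Java implementation is about 188 times faster: 0.064 seconds. Why is the Julia implementation so slow? In Java I'm using an array of Rating objects; it would be even faster with a multidimensional int array.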
using DelimitedFiles  # needed for readdlm on Julia 0.7 and later

ratings = readdlm("ratings.dat", Int32)

function predict(user, film)
    return 3.462
end

function rmse()
    total = 0.0
    for i in 1:size(ratings, 1)
        r = ratings[i,:]
        diff = predict(r[1], r[2]) - r[3]
        total += diff * diff
    end
    return sqrt(total / size(ratings)[1])
end
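Edit: after avoiding the global variable, it finishes in 1.99 seconds (31 times slower than Java). After also removing the r = ratings[i,:] line, it takes 0.856 seconds (13 times slower).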
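The effect of the global variable mentioned in the edit can be sketched as follows. This is only an illustration under stated assumptions: data is a hypothetical, much smaller random matrix standing in for ratings.dat, and rmse_global and rmse_arg are names invented here. A non-constant global can be rebound to a value of any type, so the compiler cannot specialize code that reads it, whereas a function argument has a known concrete type when the function is compiled.

```julia
# Hypothetical stand-in data; the real file has 15 million rows.
data = rand(Int32(1):Int32(5), 100_000, 3)

predict(user, film) = 3.462

# Reads the non-constant global `data`. Its type could change at any time,
# so the loop cannot be specialized for Matrix{Int32}.
function rmse_global()
    total = 0.0
    for i in 1:size(data, 1)
        diff = predict(data[i, 1], data[i, 2]) - data[i, 3]
        total += diff * diff
    end
    return sqrt(total / size(data, 1))
end

# Takes the matrix as an argument, so Julia compiles a version
# specialized for the concrete type of `r`.
function rmse_arg(r)
    total = 0.0
    for i in 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += diff * diff
    end
    return sqrt(total / size(r, 1))
end

# Warm-up calls so that compilation is not included in the timings.
rmse_global()
rmse_arg(data)

t_global = @elapsed rmse_global()
t_arg = @elapsed rmse_arg(data)
println("global: ", t_global, " s, argument: ", t_arg, " s")
```

Both functions return the same value; only the timings should differ, and the exact numbers depend on the machine and the Julia version.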
Har*_*lan 10
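A few suggestions:
- Don't use a global; pass ratings in as an argument.
- The r = ratings[i,:] line makes a copy, which is slow. Instead, use predict(r[i,1], r[i,2]) - r[i,3].
- square() may be faster than x*x; try it.
- Take a look at the NumericExtensions.jl package, which has insanely optimized functions for many common numerical operations. (See the julia-dev list.)
For me, the following code runs in 0.024 seconds (I doubt my laptop is much faster than your machine). I initialized ratings with the commented-out line, since I don't have the file you mention.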
function predict(user, film)
    return 3.462
end

function rmse(r)
    total = 0.0
    for i = 1:size(r,1)
        diff = predict(r[i,1],r[i,2]) - r[i,3]
        total += diff * diff
    end
    return sqrt(total / size(r,1))
end

# ratings = rand(1:20, 5000000, 3)
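The cost of the row slice can be made visible by measuring allocations rather than time. The sketch below is an assumption-laden illustration, not part of the original answer: ratings is a small random matrix, and sse_slice and sse_index are hypothetical helper names that compute only the sum of squared errors.

```julia
# Hypothetical small data set standing in for ratings.dat.
ratings = rand(Int32(1):Int32(5), 1_000, 3)

predict(user, film) = 3.462

# Row slice: `r[i, :]` allocates a fresh 3-element vector on every iteration.
function sse_slice(r)
    total = 0.0
    for i in 1:size(r, 1)
        row = r[i, :]
        diff = predict(row[1], row[2]) - row[3]
        total += diff * diff
    end
    return total
end

# Scalar indexing: reads the elements directly, with no temporary arrays.
function sse_index(r)
    total = 0.0
    for i in 1:size(r, 1)
        diff = predict(r[i, 1], r[i, 2]) - r[i, 3]
        total += diff * diff
    end
    return total
end

# Warm-up calls so that compilation is not counted as allocation.
sse_slice(ratings)
sse_index(ratings)

alloc_slice = @allocated sse_slice(ratings)
alloc_index = @allocated sse_index(ratings)
println("slice: ", alloc_slice, " bytes, indexing: ", alloc_index, " bytes")
```

The two functions return the same sum; the slicing version should report far more allocated bytes. On recent Julia versions, @view r[i, :] is another way to avoid the copy while keeping the row-based style.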
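On my system, the problem seems to be that your constant-valued predict function is not being optimized away. Replacing the redundant call to predict makes the code run in 0.01 seconds.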
function time()
    ratings = ones(15_000_000, 3)
    predict(user, film) = 3.462
    function rmse(ratings)
        total = 0.0
        for i in 1:size(ratings, 1)
            # index the rating column of row i (was ratings[3])
            diff = predict(ratings[i, 1], ratings[i, 2]) - ratings[i, 3]
            total += diff * diff
        end
        return sqrt(total / size(ratings, 1))
    end
    rmse(ratings)  # warm-up call so compilation is not timed
    @elapsed rmse(ratings)
end
time()

function time2()
    ratings = ones(15_000_000, 3)
    predict(user, film) = 3.462  # no longer called in the loop below
    function rmse(ratings)
        total = 0.0
        for i in 1:size(ratings, 1)
            # constant inlined by hand instead of calling predict
            diff = 3.462 - ratings[i, 3]
            total += diff * diff
        end
        return sqrt(total / size(ratings, 1))
    end
    rmse(ratings)  # warm-up call so compilation is not timed
    @elapsed rmse(ratings)
end
time2()