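Suppose I have a dataset with two columns. I have built a linear regression model on it, and now my question is how to check the accuracy of the model.
I found that the answer is to apply K-fold cross-validation to my dataset. I know how K-fold works, but I don't know how to implement it in my Julia program.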
# Suppose I have two columns x and y in my dataset
x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
# Now how do I use K-fold to split the dataset and also evaluate my algorithm?
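As mentioned in the comments, it is easier to set up some code once a starting point is given. For example, in this case, K-fold cross-validation might be prepared as follows: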

julia> x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20];

julia> y = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21];

julia> K = 5 # number of folds in validation
5

julia> N = length(x) # number of samples in dataset
20

julia> stops = round.(Int,linspace(1,N,K+1)) # Julia 0.6; on Julia 1.x use range(1, stop=N, length=K+1)
6-element Array{Int64,1}:
  1
  5
  9
 12
 16
 20

julia> vsets = [s:e-(e<N)*1 for (s,e) in zip(stops[1:end-1],stops[2:end])]
5-element Array{UnitRange{Int64},1}:
 1:4  
 5:8  
 9:11 
 12:15
 16:20

julia> tsets1 = [1:s-1 for (s,e) in zip(stops[1:end-1],stops[2:end])]
5-element Array{UnitRange{Int64},1}:
 1:0 
 1:4 
 1:8 
 1:11
 1:15

julia> tsets2 = [e+(e==N)*1:N for (s,e) in zip(stops[1:end-1],stops[2:end])] # starts right after the validation block
5-element Array{UnitRange{Int64},1}:
 5:20 
 9:20 
 12:20
 16:20
 21:20

julia> σ = randperm(N); # on Julia 1.x, run `using Random` first

julia> [x[σ[vsets[i]]] for i=1:K] # validation sets
5-element Array{Array{Int64,1},1}:
 [5, 13, 6, 10]      
 [16, 4, 2, 3]       
 [9, 19, 20]         
 [17, 12, 14, 11]    
 [8, 1, 18, 7, 15]

julia> [x[vcat(σ[tsets1[i]],σ[tsets2[i]])] for i=1:K] # training sets
5-element Array{Array{Int64,1},1}:
 [16, 4, 2, 3, 9, 19, 20, 17, 12, 14, 11, 8, 1, 18, 7, 15]    
 [5, 13, 6, 10, 9, 19, 20, 17, 12, 14, 11, 8, 1, 18, 7, 15]   
 [5, 13, 6, 10, 16, 4, 2, 3, 17, 12, 14, 11, 8, 1, 18, 7, 15] 
 [5, 13, 6, 10, 16, 4, 2, 3, 9, 19, 20, 8, 1, 18, 7, 15]      
 [5, 13, 6, 10, 16, 4, 2, 3, 9, 19, 20, 17, 12, 14, 11]       

This may be sufficient for your purposes. For more details on K-fold cross-validation, see the Wikipedia article: https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation
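
The question also asks how to evaluate the model, which the index bookkeeping alone does not cover. Here is a minimal sketch of one way to do it on Julia 1.x using only the standard libraries. The helper names fit_line, predict_line and kfold_mse are made up for this illustration, and mean squared error is just an assumed choice of metric; it also builds the folds by striding through a shuffled permutation rather than with the ranges from the transcript.

```julia
using LinearAlgebra  # least-squares solve with \
using Random         # randperm, seed!
using Statistics     # mean

# Hypothetical helper: fit y ≈ a + b*x by ordinary least squares, returns [a, b].
fit_line(x, y) = [ones(length(x)) x] \ y

# Hypothetical helper: predict with coefficients β = [a, b].
predict_line(β, x) = β[1] .+ β[2] .* x

# Hypothetical helper: mean squared error on each of the K validation folds.
function kfold_mse(x, y, K)
    N = length(x)
    perm = randperm(N)                    # shuffle the sample indices once
    # Fold i takes every K-th shuffled index starting at position i,
    # so fold sizes differ by at most one and every sample is used exactly once.
    folds = [perm[i:K:N] for i in 1:K]
    scores = Float64[]
    for val in folds
        train = setdiff(perm, val)        # all indices not in the validation fold
        β = fit_line(x[train], y[train])
        push!(scores, mean(abs2, y[val] .- predict_line(β, x[val])))
    end
    return scores
end

x = collect(1.0:20.0)
y = x .+ 1.0

Random.seed!(42)                          # only to make the shuffle repeatable
scores = kfold_mse(x, y, 5)
println("per-fold MSE: ", scores)
println("mean MSE:     ", mean(scores))
```

Because the example data lie exactly on a line, the per-fold errors should come out at essentially zero; with real data, the mean over folds is the number you would normally report.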

You can use kfolds from MLDataUtils.jl.
using MLDataUtils
kfolds([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 5)
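
To go from the folds to an actual score, something along these lines might work. Treat it as a sketch under assumptions: it relies on kfolds accepting a tuple of paired data with a k keyword and on shuffleobs, as I understand the MLDataUtils.jl README, and mean squared error is again just an assumed metric. Check the package documentation for your installed version.

```julia
using LinearAlgebra  # least-squares solve with \
using Statistics     # mean
using MLDataUtils    # third-party package, assumed to be installed

x = collect(1.0:20.0)
y = x .+ 1.0

# Shuffle the paired observations once so the folds do not depend on row order.
xs, ys = shuffleobs((x, y))

scores = Float64[]
for ((x_train, y_train), (x_val, y_val)) in kfolds((xs, ys); k = 5)
    A = [ones(length(x_train)) x_train]   # design matrix with an intercept column
    β = A \ collect(y_train)              # ordinary least squares, β = [a, b]
    pred = β[1] .+ β[2] .* x_val
    push!(scores, mean(abs2, y_val .- pred))
end

println("per-fold MSE: ", scores)
println("mean MSE:     ", mean(scores))
```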