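I ran a 20-fold cv.glmnet lasso model to obtain the "best" value of lambda. However, when I try to reproduce the results with glmnet(), I get the following warning messages: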
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda
value not reached after maxit=100000 iterations; solutions for larger
lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :
an empty model has been returned; probably a convergence issue
My code is as follows:
set.seed(5)
cv.out <- cv.glmnet(x[train,],y[train],family="binomial",nfolds=20,alpha=1,parallel=TRUE)
coef(cv.out)
bestlam <- cv.out$lambda.min
lasso.mod.best <- glmnet(x[train,],y[train],alpha=1,family="binomial",lambda=bestlam)
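Here the value of bestlam is 2.976023e-05, so perhaps that is what causes the problem? Is this a rounding issue with the lambda value? Is there a reason I cannot reproduce the results directly from the glmnet() function? If I use a vector of lambda values in a range similar to bestlam, there is no problem.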
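The glmnet documentation advises against supplying a single lambda value and suggests fitting a decreasing sequence instead, since the solver relies on warm starts along the path. A minimal sketch of that approach follows. It uses simulated data: x.sim, y.sim, n, and p are hypothetical stand-ins, because the original x, y, and train are not shown, and parallel=TRUE is omitted so that no parallel backend is needed.

```r
library(glmnet)

set.seed(5)
# Simulated stand-in data (hypothetical; not the data from the question)
n <- 200
p <- 20
x.sim <- matrix(rnorm(n * p), n, p)
y.sim <- rbinom(n, 1, plogis(x.sim[, 1] - x.sim[, 2]))

cv.out <- cv.glmnet(x.sim, y.sim, family = "binomial", nfolds = 20, alpha = 1)
bestlam <- cv.out$lambda.min

# Option 1: read the coefficients at lambda.min from the full-data fit
# that cv.glmnet already stores
coef.cv <- coef(cv.out, s = "lambda.min")

# Option 2: refit on the whole lambda sequence (not a single value),
# then extract the coefficients at bestlam
lasso.mod <- glmnet(x.sim, y.sim, family = "binomial", alpha = 1,
                    lambda = cv.out$lambda)
coef.path <- coef(lasso.mod, s = bestlam)
```

Either option avoids a cold start at a very small lambda, which is the situation in which the convergence warning above appears. Whether this resolves the problem on the original data is an assumption that would need to be checked.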
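I came across this quirk in R and cannot find out why it occurs. I was trying to recreate a sample as a check and found that the sample function behaves differently in some cases. Consider this example: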
# Look at the first ten rows of a randomly ordered vector of the first 10 million integers
set.seed(4)
head(sample(1:10000000), 10)
[1] 5858004 89458 2937396 2773749 8135739 2604277 7244055 9060916 9490395 731445
# Select a specified sample of size 10 from this same list
set.seed(4)
sample(1:10000000, size = 10)
[1] 5858004 89458 2937396 2773749 8135739 2604277 7244055 9060916 9490395 731445
# Try the same for sample size 10,000,001
set.seed(4)
head(sample(1:10000001), 10)
[1] 5858004 89458 2937396 2773750 8135740 …
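The comparison above can be wrapped in a small helper so that it is easy to repeat for different vector lengths. This is only a sketch: same_prefix is a hypothetical name, and the results for large n may depend on the R version and its sampling settings, so no expected output is shown.

```r
# Hypothetical helper: under the same seed, do the first k elements of a
# full shuffle of 1:n match a direct draw of size k?
same_prefix <- function(n, k = 10, seed = 4) {
  set.seed(seed)
  full <- head(sample(1:n), k)       # shuffle everything, keep the first k
  set.seed(seed)
  direct <- sample(1:n, size = k)    # draw only k values
  identical(full, direct)
}

same_prefix(10000000)
same_prefix(10000001)
```

The help page for sample.int describes a useHash argument whose default depends on n and size. That may be relevant to any difference between the two calls.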