Error: Error in predmat[which, seq(nlami)] = preds : replacement has length zero
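Context: the data are simulated with a binary y, but there are n coders of the true y. The data are stacked n times and a model is fitted, trying to recover the true y.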
The error is raised with the L2 penalty, but not with the L1 penalty. Update: the error occurs in versions after 1.9-8; version 1.9-8 does not fail.
library(glmnet)
rm(list=ls())
set.seed(123)
num_obs=4000
n_coders=2
precision=.8
X <- matrix(rnorm(num_obs*20, sd=1), nrow=num_obs)
prob1 <- plogis(X %*% c(2, -2, 1, -1, rep(0, 16))) # yes many zeros, ignore
y_true <- rbinom(num_obs, 1, prob1)
dat <- data.frame(y_true = y_true, X = X)
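# Simulate one coder: a true 1 is coded 1 with probability `precision`,
# a true 0 is coded 1 with probability `1 - precision`.
classify <- function(true_y, precision) {
  n <- length(true_y)
  y_coder <- numeric(n)
  y_coder[which(true_y == 1)] <- rbinom(n = length(which(true_y == 1)),
                                        size = 1, prob = precision)
  y_coder[which(true_y == 0)] <- rbinom(n = length(which(true_y == 0)),
                                        size = 1, prob = (1 - precision))
  return(y_coder)
}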
y_codings <- sapply(rep(precision,n_coders),classify,true_y = dat$y_true)
expanded_data <- do.call(rbind,rep(list(dat),n_coders))
expanded_data$y_codings <- matrix(y_codings, ncol = 1)
Because the error depends on the seed, a loop is needed. Only the first loop fails; the other two run to completion.
X <- as.matrix(expanded_data[,grep("X",names(expanded_data))])
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_codings,
family="binomial", alpha=0) # will fail
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_codings,
family="binomial", alpha=1) # will not fail
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_true,
family="binomial", alpha=0) # will not fail
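Any idea where in glmnet this comes from, and how to avoid it? From my reading of cv.glmnet, it happens after the cv routine, inside cvstuff = do.call(fun, list(outlist, lambda, x, y, weights, offset, foldid, type.measure, grouped, keep)). I do not understand what that call does, why it fails, or how to avoid the failure.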
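As a point of reference, the message itself is a base R error that is raised whenever an empty vector is assigned into a non-empty slice. A minimal sketch with toy objects follows; the names predmat and preds are borrowed from the error message only, and nothing here is a claim about what glmnet does internally:

```r
# Toy objects only; the names mirror the error message, not glmnet's source.
predmat <- matrix(NA_real_, nrow = 4, ncol = 3)
preds <- numeric(0)  # an empty result, e.g. from a step that produced nothing

# Assigning a zero-length vector into a 2 x 3 slice is an error in base R.
msg <- tryCatch({
  predmat[1:2, seq(3)] <- preds
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

So whatever ends up in preds for the failing fold is presumably empty, which narrows the search to the step that produces it.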
Session info (Ubuntu and PC):
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
[6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 knitr_1.14
[11] digest_0.6.10 lattice_0.20-34
and
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
[6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 digest_0.6.10
[11] lattice_0.20-34
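I hit the same error in glmnet_2.0-5. It is related to how the lambda sequence is generated automatically in some cases. The solution is to supply your own lambdas, for example: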
cv.glmnet(x = X,
y = expanded_data$y_codings,
family="binomial",
alpha=0,
lambda=exp(seq(log(0.001), log(5), length.out=100)))
Thanks to https://github.com/lmweber/glmnet-error-example/blob/master/glmnet_error_example.R
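If you want to reuse the grid across calls, a small helper can build it. This is only a sketch: make_lambda_grid is a hypothetical name, and the bounds 0.001 and 5 are simply the ones used above, not values tuned to any data set. The grid is built largest value first because the glmnet documentation recommends supplying a decreasing lambda sequence.

```r
# Hypothetical helper: log-spaced lambda grid, largest value first.
# The default bounds are the ones from the example above (an assumption,
# not a recommendation for other data).
make_lambda_grid <- function(lambda_min = 0.001, lambda_max = 5, n = 100) {
  stopifnot(lambda_min > 0, lambda_max > lambda_min, n >= 2)
  exp(seq(log(lambda_max), log(lambda_min), length.out = n))
}

lambda_grid <- make_lambda_grid()
head(lambda_grid, 3)
```

The result can then be passed as lambda = lambda_grid in the cv.glmnet call.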
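Well, I just ran the first loop and it completed successfully. This is with glmnet 2.0.2.
This is more of a comment, but it is too long to fit in one: when running tests like this that depend on random numbers, you can save the seed as you go. That lets you jump into the middle of the test run without going back to the start every time.
Something like this: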
results <- lapply(1:1000, function(x) {
seed <- .Random.seed
res <- try(glmnet(x, y, ...)) # so the code keeps running even if there's an error
attr(res, "seed") <- seed
res
})
Now you can check whether any runs failed by looking at the class of the results:
errs <- sapply(results, function(x) inherits(x, "try-error"))
any(errs)
And you can retry the runs that failed:
firstErr <- which(errs)[1]
.Random.seed <- attr(results[[firstErr]], "seed")
glmnet(x, y, ...) # try failed run again
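To see the whole pattern run end to end without glmnet, here is a self-contained sketch. flaky_fit is a hypothetical stand-in that fails at random, and the seed is read and restored through the global environment explicitly so the code also works when wrapped inside a function:

```r
set.seed(42)  # guarantees .Random.seed exists before the first run

# Hypothetical stand-in for a model fit that sometimes fails.
flaky_fit <- function() {
  u <- runif(1)
  if (u < 0.2) stop("simulated failure")
  u
}

results <- lapply(1:50, function(i) {
  seed <- get(".Random.seed", envir = globalenv())  # RNG state before this run
  res <- try(flaky_fit(), silent = TRUE)            # keep going after an error
  attr(res, "seed") <- seed
  res
})

errs <- vapply(results, inherits, logical(1), what = "try-error")

# Replay the first failed run from its saved RNG state.
first_err <- which(errs)[1]
assign(".Random.seed", attr(results[[first_err]], "seed"), envir = globalenv())
replayed <- try(flaky_fit(), silent = TRUE)
inherits(replayed, "try-error")
```

Restoring the saved state makes the replay draw the same random numbers, so the failure should reproduce on its own without rerunning the earlier iterations.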
Session info:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.850
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnetUtils_0.55 RevoUtilsMath_8.0.3 RevoUtils_8.0.3 RevoMods_8.0.3 RevoScaleR_8.0.6
[6] lattice_0.20-33 rpart_4.1-10
loaded via a namespace (and not attached):
[1] Matrix_1.2-2 parallel_3.2.2 codetools_0.2-14 rtvs_1.0.0.0 grid_3.2.2
[6] iterators_1.0.8 foreach_1.4.3 glmnet_2.0-2
(That should be Windows 10, not 8; R 3.2.2 doesn't know about Win10.)