Still no answer after 5 days
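I've been struggling with this problem, and any help would be greatly appreciated. I'm trying to write a function that runs several stepwise regressions and returns all of them in a list. However, R has trouble finding the dataset I pass in as a function argument. I've found several similar errors on various forums (here, here and here), but none of them seem to have been resolved. It all comes down to some strange problem with calling step() inside a user-defined function. I use the following script to test my code. Run the whole thing several times until the error appears (trust me, it will):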
test.df <- data.frame(a = sample(0:1, 100, replace = TRUE),
                      b = as.factor(sample(0:5, 100, replace = TRUE)),
                      c = runif(100, 0, 100),
                      d = rnorm(100, 50, 50))
test.df$b[10:100] <- test.df$a[10:100]  # make sure at least one of the variables has some predictive power
stepModel <- function(modeling.formula, dataset, outfile = NULL) {
  if (!is.null(outfile)) {
    sink(file = outfile, append = TRUE, type = "output")
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
  if (!is.null(outfile)) sink()  # only close a sink that was actually opened
  return(output)
}
blah <- stepModel(a~., dataset = test.df)
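This returns the following error message (if the error doesn't show up right away, keep re-running the test.df script together with the call to stepModel(); it will show up eventually):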
Error in is.data.frame(data) : object 'dataset' not found
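I've established that everything runs fine until model.stepwise2 starts being built. Somehow the temporary object 'dataset' works fine in the first stepwise regression but is not recognized in the second. I found this by commenting out part of the function, as shown below. This code runs fine, which proves that the object 'dataset' is recognized at first: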
stepModel1 <- function(modeling.formula, dataset, outfile = NULL) {
  if (!is.null(outfile)) {
    sink(file = outfile, append = TRUE, type = "output")
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
  # model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  # sink()
  # output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
  return(model.stepwise1)
}
blah1 <- stepModel1(a~., dataset = test.df)
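To narrow this down further, here is a minimal sketch, assuming the failure comes from add1(), which step() calls once there are terms available to add (as with scope = ~.^2 when more than one term survives). The names fit_then_add1 and demo.df are hypothetical. drop1() reuses the model frame stored in the fitted object, but add1() rebuilds it by re-evaluating the original glm() call in the environment of the model formula; because a ~ c + d is written at the top level, that is the global environment, where dataset does not exist. Run in a fresh session (with no global object called dataset), msg should carry the same "object 'dataset' not found" error.

```r
# Hypothetical minimal reproduction (names are illustrative, not from the post).
# The formula is created outside the function, so its environment is the
# global environment, while `dataset` exists only inside the function.
fit_then_add1 <- function(modeling.formula, dataset) {
  fit <- glm(modeling.formula, family = binomial, data = dataset)
  # drop1() reuses the model frame stored in `fit`, so this succeeds
  dropped <- drop1(fit)
  # add1() rebuilds the model frame by re-evaluating glm's call
  # (including `data = dataset`) in environment(formula), i.e. globalenv()
  add1(fit, scope = ~ . + c:d)
}

set.seed(1)
demo.df <- data.frame(a = sample(0:1, 100, replace = TRUE),
                      c = runif(100, 0, 100),
                      d = rnorm(100, 50, 50))

msg <- tryCatch({
  fit_then_add1(a ~ c + d, dataset = demo.df)
  "no error"
}, error = conditionMessage)
print(msg)
```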
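Edit - before anyone asks: all the summary() calls were there because the full function (which I trimmed so you can focus on the error) has another piece that defines a file you can write the stepwise trace to. I got rid of them.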
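Since the full function writes the trace to a file, note that stepModel() opens the sink before step() runs, so when step() throws the error above the sink is never closed and later console output keeps going to the file. A minimal sketch of one way to guard against that using base R's on.exit(); with_output_to() is a hypothetical helper, not part of the original code:

```r
# Hypothetical helper: evaluate `expr` with output diverted to `outfile`,
# restoring normal console output afterwards even if `expr` fails.
with_output_to <- function(outfile, expr) {
  if (!is.null(outfile)) {
    sink(file = outfile, append = TRUE, type = "output")
    # on.exit() runs when the function exits, whether normally or via an error
    on.exit(sink(), add = TRUE)
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  # `expr` is a lazily evaluated argument, so it runs here, while the sink is active
  expr
}

sinks_before <- sink.number()
log_file <- tempfile(fileext = ".txt")
# Simulate step() failing inside the diverted block
res <- try(with_output_to(log_file, stop("simulated step() failure")), silent = TRUE)
leaked_sinks <- sink.number() - sinks_before  # 0 means no sink was left open
```

Inside stepModel(), the model-fitting lines could be wrapped as with_output_to(outfile, { ... }), or the function could simply call on.exit(sink(), add = TRUE) right after opening the sink.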
Edit 2 - session info
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] tcltk stats graphics grDevices utils datasets methods base
other attached packages:
[1] sqldf_0.4-6.4 RSQLite.extfuns_0.0.1 RSQLite_0.11.3 chron_2.3-43
[5] gsubfn_0.6-5 proto_0.3-10 DBI_0.2-6 ggplot2_0.9.3.1
[9] caret_5.15-61 reshape2_1.2.2 lattice_0.20-6 foreach_1.4.0
[13] cluster_1.14.2 plyr_1.8
loaded via a namespace (and not attached):
[1] codetools_0.2-8 colorspace_1.2-1 dichromat_2.0-0 digest_0.6.2 grid_2.15.1
[6] gtable_0.1.2 iterators_1.0.6 labeling_0.1 MASS_7.3-18 munsell_0.4
[11] RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_2.15
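Edit 3 - This does the same thing as the function, just without wrapping it in a function. It runs fine every time, even when the algorithm doesn't converge: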
modeling.formula <- a~.
dataset <- test.df
outfile <- NULL
if (!is.null(outfile)) {
  sink(file = outfile, append = TRUE, type = "output")
  print("")
  print("Models run at:")
  print(Sys.time())
}
model.initial <- glm(modeling.formula,
                     family = binomial,
                     data = dataset)
model.stepwise1 <- step(model.initial, direction = "backward")
model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
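A small sketch of why this top-level version never hits the error, assuming the explanation lies in the formula's environment: a formula remembers the environment it was written in, a~. written at the top level points at the global environment, and in this script dataset is also a global variable, so anything that looks dataset up through the formula finds it. formula_env_is_global() and make_formula() are hypothetical names used only for illustration.

```r
# Hypothetical helper: TRUE when the formula passed in was created at the top level.
formula_env_is_global <- function(modeling.formula) {
  identical(environment(modeling.formula), globalenv())
}

# A formula argument keeps the environment where it was written...
from_top_level <- formula_env_is_global(a ~ .)

# ...whereas a formula written inside a function records that function's frame.
make_formula <- function() a ~ .
from_function <- formula_env_is_global(make_formula())
```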
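Using do.call to refer to the dataset in the calling environment works for me. See /sf/answers/536819251/ for the original suggestion. Here is a working version (with the sink code removed):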
stepModel2 <- function(modeling.formula, dataset) {
  # `dataset` is the name of the data frame as a string; as.name() turns it into a
  # symbol, so the stored call reads glm(..., data = test.df) rather than data = dataset
  model.initial <- do.call("glm", list(modeling.formula,
                                       family = "binomial",
                                       data = as.name(dataset)))
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
}
blah <- stepModel2(a~., dataset = "test.df")
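With set.seed(6), your original code failed consistently. It fails because the variable dataset is not visible where step() looks for it: it is not needed to build model.stepwise1, but it is needed to build model.stepwise2 whenever model.stepwise1 keeps linear terms (so there are interaction terms to try adding). That is exactly the case in which your version fails. Referring to the dataset in the global environment, as I did, solves the problem.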
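An alternative that avoids passing the dataset name as a string is to point the formula's environment at the function's own frame, so that any re-evaluation of data = dataset triggered inside step() happens where dataset exists. This is a sketch under that assumption, not the approach from the answer above; stepModel3 is a hypothetical name, and trace = 0 only silences step()'s printed trace.

```r
# Sketch (assumption, not the answer's method): re-point the formula's environment
# at this function's frame so later re-evaluations of `data = dataset` find it.
stepModel3 <- function(modeling.formula, dataset) {
  environment(modeling.formula) <- environment()
  model.initial <- glm(modeling.formula, family = binomial, data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward", trace = 0)
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2, trace = 0)
  list(modInitial = model.initial, modStep1 = model.stepwise1,
       modStep2 = model.stepwise2)
}

# Test data built the same way as in the question
set.seed(1)
test.df <- data.frame(a = sample(0:1, 100, replace = TRUE),
                      b = as.factor(sample(0:5, 100, replace = TRUE)),
                      c = runif(100, 0, 100),
                      d = rnorm(100, 50, 50))
test.df$b[10:100] <- test.df$a[10:100]

blah3 <- stepModel3(a ~ ., dataset = test.df)
```

glm() may still warn about fitted probabilities of 0 or 1 with this deliberately separable data; those warnings are unrelated to the lookup problem.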