R predict glm fit on each column in data frame using column index number

Question

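I am trying to fit a BLR (binary logistic regression) model to each column of a data frame and then predict new data points. There are many columns, so I cannot identify the columns by name, only by column number. I have reviewed several similar examples on this site and cannot figure out why this does not work.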

df <- data.frame(x1 = runif(1000, -10, 10),
                 x2 = runif(1000, -2, 2),
                 x3 = runif(1000, -5, 5),
                 y = rbinom(1000, size = 1, prob = 0.40))

for (i in 1:length(df)-1)
{
        fit <- glm (y ~ df[,i], data = df, family = binomial, na.action = na.exclude)

        new_pts <- data.frame(seq(min(df[,i], na.rm = TRUE), max(df[,i], na.rm = TRUE), len = 200))
        names(new_pts) <- names(df[, i])

        new_pred <- predict(fit, newdata = new_pts, type = "response")

}


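The predict() function raises a warning message and returns an array of 1000 elements, whereas the test data has only 200 elements.

Warning message: 'newdata' had 200 rows but variables found have 1000 rows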

Answer 1

Man*_*kel 3

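For repeated modelling I use an approach like the one shown below. I have implemented it with data.table, but it could be rewritten to use a base data.frame (I guess the code would then be more verbose). In this approach I store all models in a single object. Below I provide two versions of the code: a more explanatory one, and a more advanced one that aims at clean output.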

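Of course, you could also write a loop/function that fits only one model per iteration without storing them. From my point of view, saving the models is a good idea, because you will probably have to examine the models for robustness etc., and not only predict new values.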
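
A minimal base-R sketch of such a loop, using the data frame from the question. It is only an illustration under some assumptions: the seed and the names `pred_idx` and `var` are introduced here and are not part of the original post. The formula is built from the column name with `reformulate()` rather than written as `y ~ df[, i]`, so that `predict()` can match the single column in `newdata` to the model term by name.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
df <- data.frame(x1 = runif(1000, -10, 10),
                 x2 = runif(1000, -2, 2),
                 x3 = runif(1000, -5, 5),
                 y  = rbinom(1000, size = 1, prob = 0.40))

# indices of the predictor columns (every column except the response y)
pred_idx <- which(names(df) != "y")

for (i in pred_idx) {
  var <- names(df)[i]

  # build y ~ x1, y ~ x2, ... from the column name
  fit <- glm(reformulate(var, response = "y"), data = df,
             family = binomial, na.action = na.exclude)

  # 200 evenly spaced points over the observed range of this predictor;
  # the column must carry the same name as the model term
  new_pts <- data.frame(seq(min(df[[i]], na.rm = TRUE),
                            max(df[[i]], na.rm = TRUE),
                            length.out = 200))
  names(new_pts) <- var

  new_pred <- predict(fit, newdata = new_pts, type = "response")
}
```

With the term named after the column, `new_pred` has one value per row of `new_pts`. Note that `fit` and `new_pred` are overwritten on every iteration, which is exactly the "without storing them" variant.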

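HINT: Please also have a look at the answer by @AndS, which provides a tidyverse approach. Together with this answer, I think it makes a nice side-by-side comparison for learning/understanding the data.table and tidyverse approaches.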

# i have used some more simple data to show that the output is correct, see the plots
df <- data.frame(x1 = seq(1, 100, 10),
                 x2 = (1:10)^2,
                 y =  seq(1, 20, 2))
library(data.table)
setDT(df)
# prepare the data by melting it
DT = melt(df, measure.vars = paste0("x", 1:2), value.name = "x")
# also i used a more simple model (in this case lm would also do)
# create model for each variable (formerly columns)
models = setnames(DT[, data.table(list(glm(y ~ x))), by = "variable"], "V1", "model")
# create a new set of data to be predicted
# NOTE: this could, of course, also be added to the models data.table
# as new column via `:=list(...)`
new_pts = setnames(DT[, seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), len = 200), by = variable], "V1", "x")
# add the predicted values
new_pts[, predicted:= predict(models[variable == unlist(.BY), model][[1]], newdata = as.data.frame(x),  type = "response")
        , by = variable]
# plot and check if it makes sense
plot(df$x1, df$y)
lines(new_pts[variable == "x1", .(x, predicted)])
points(df$x2, df$y)
lines(new_pts[variable == "x2", .(x, predicted)])

# also the following version of above code is possible
# that generates only one new objects in the environment
# but maybe looks more complicated at first sight
# not sure if this is the best way to do it
# data.table experts might provide some shortcuts
setDT(df)
DT = melt(df, measure.vars = paste0("x", 1:2), value.name = "x")
DT = data.table(variable = unique(DT$variable), dat = split(DT, DT$variable))
DT[, models:= list(list(glm(y ~ x, data = dat[[1]]))), by = variable]
DT[, new_pts:= list(list(data.frame(x = dat[[1]][
                                                 ,seq(min(x, na.rm = TRUE)
                                                 , max(x, na.rm = TRUE), len = 200)]
                                    )))
       , by = variable]
# predict per variable; everything lives in DT, so use DT and its `models` column
DT[, predicted:= list(list(data.frame(pred = predict(models[[1]]
                                          , newdata = new_pts[[1]]
                                          ,  type = "response")))),
       by = variable]
plot(df$x1, df$y)
lines(DT[variable == "x1", .(unlist(new_pts), unlist(predicted))])
points(df$x2, df$y)
lines(DT[variable == "x2", .(unlist(new_pts), unlist(predicted))])
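
For comparison, here is a rough sketch of the same "store all models in one object" idea with a base data.frame, as mentioned at the top of this answer. It is not the code the answer was written with; the names `vars`, `models` and `preds` are chosen here for illustration, and the simple example data from above is reused.

```r
df <- data.frame(x1 = seq(1, 100, 10),
                 x2 = (1:10)^2,
                 y  = seq(1, 20, 2))

# predictor columns: everything except the response
vars <- setdiff(names(df), "y")

# one model per predictor, kept in a named list
models <- lapply(setNames(vars, vars), function(v) {
  glm(reformulate(v, response = "y"), data = df)
})

# 200 evenly spaced points per predictor, with the predictions attached
preds <- lapply(setNames(vars, vars), function(v) {
  new_pts <- setNames(
    data.frame(seq(min(df[[v]], na.rm = TRUE),
                   max(df[[v]], na.rm = TRUE),
                   length.out = 200)),
    v)
  cbind(new_pts,
        predicted = predict(models[[v]], newdata = new_pts, type = "response"))
})

# plot and check, as in the data.table version
plot(df$x1, df$y)
lines(preds$x1$x1, preds$x1$predicted)
points(df$x2, df$y)
lines(preds$x2$x2, preds$x2$predicted)
```

Both `models` and `preds` are named by predictor, so `models$x1` or `preds[["x2"]]` can be inspected later, for example with `summary()`.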

