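I am trying to fit a BLR (binary logistic regression) model to each column in a data frame and then predict new data points. There are many columns, so I cannot refer to them by name, only by column number. I have reviewed several similar examples on this site and cannot figure out why this does not work.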
df <- data.frame(x1 = runif(1000, -10, 10),
                 x2 = runif(1000, -2, 2),
                 x3 = runif(1000, -5, 5),
                 y = rbinom(1000, size = 1, prob = 0.40))
for (i in 1:length(df)-1)
{
  fit <- glm (y ~ df[,i], data = df, family = binomial, na.action = na.exclude)
  new_pts <- data.frame(seq(min(df[,i], na.rm = TRUE), max(df[,i], na.rm = TRUE), len = 200))
  names(new_pts) <- names(df[, i])
  new_pred <- predict(fit, newdata = new_pts, type = "response")
}
The predict() function raises a warning and returns an array of 1000 elements, whereas the test data has only 200 elements.
Warning message: 'newdata' had 200 rows but variables found have 1000 rows
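The warning can probably be traced to how the formula and the column names are built. The following is a minimal diagnostic sketch, not part of the original post; it assumes a smaller three-column version of the data, and the seed and the names `idx` and `i` are introduced here only for illustration.

```r
set.seed(1)  # assumption: only here to make the sketch reproducible
df <- data.frame(x1 = runif(1000, -10, 10),
                 x2 = runif(1000, -2, 2),
                 y  = rbinom(1000, size = 1, prob = 0.40))

# `:` binds tighter than `-`, so this is (1:3) - 1 and starts at 0
idx <- 1:length(df) - 1
print(idx)

i <- 1
# df[, i] drops to a plain numeric vector, which carries no names,
# so names(df[, i]) is NULL while names(df)[i] is the column name
print(names(df[, i]))
print(names(df)[i])

fit <- glm(y ~ df[, i], data = df, family = binomial)
# the only predictor term is the expression `df[, i]`, not the column `x1`
print(attr(terms(fit), "term.labels"))
```

Because the model's only term is the expression `df[, i]`, predict() cannot find a matching column in `newdata`; it evaluates `df[, i]` against the original 1000-row data frame instead, which is consistent with the warning above.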
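For repeated modelling I use a similar approach, shown below. I have implemented it with data.table, but it could be rewritten to use a base data.frame (I guess the code would be more verbose). In this approach I store all models in a single object. Below I provide two versions of the code: a more explanatory one, and a more advanced one aimed at clean output.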
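Of course, you could also write a loop/function that fits only one model per iteration without storing them. From my point of view, saving the models is a good idea, because you will probably have to investigate the robustness of the models etc., not just predict new values.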
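As a rough illustration of the loop/function idea in base R, here is a minimal sketch using the data from the question. It is not the data.table code from this answer: the helper `fit_and_predict`, the objects `predictors` and `results`, and the seed are hypothetical names introduced for this example. It assumes the response is the last column and is named `y`. The formula is built with `reformulate()` so that it contains the column name, which lets predict() match the column in `newdata`.

```r
set.seed(42)  # assumption: only here to make the sketch reproducible
df <- data.frame(x1 = runif(1000, -10, 10),
                 x2 = runif(1000, -2, 2),
                 x3 = runif(1000, -5, 5),
                 y  = rbinom(1000, size = 1, prob = 0.40))

# predictor columns selected by position; the last column (y) is excluded
predictors <- names(df)[seq_len(ncol(df) - 1)]

# hypothetical helper: fit one model and predict on an n-point grid
fit_and_predict <- function(var, data, n = 200) {
  # reformulate() builds y ~ <var>, so the formula holds the column name
  fit <- glm(reformulate(var, response = "y"), data = data,
             family = binomial, na.action = na.exclude)
  grid <- data.frame(seq(min(data[[var]], na.rm = TRUE),
                         max(data[[var]], na.rm = TRUE),
                         length.out = n))
  names(grid) <- var  # newdata must use the same column name as the formula
  list(model = fit,
       new_pts = grid,
       predicted = predict(fit, newdata = grid, type = "response"))
}

# one list entry per predictor, each holding the model and its predictions
results <- lapply(predictors, fit_and_predict, data = df)
names(results) <- predictors

# number of predicted values per predictor
print(lengths(lapply(results, `[[`, "predicted")))
```

Keeping the fitted models in `results` follows the suggestion above to store them for later checks; dropping the `model` element from the returned list would give the "fit and discard" variant.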
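Hint: please also have a look at @AndS's answer, which provides a tidyverse approach. Together with this answer, I think it makes a nice side-by-side comparison for learning/understanding the data.table and tidyverse approaches.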
# I have used simpler data to show that the output is correct; see the plots
df <- data.frame(x1 = seq(1, 100, 10),
                 x2 = (1:10)^2,
                 y = seq(1, 20, 2))
library(data.table)
setDT(df)
# prepare the data by melting it
DT = melt(df, measure.vars = paste0("x", 1:2), value.name = "x")
# I also used a simpler model (in this case lm would also do)
# create a model for each variable (formerly columns)
models = setnames(DT[, data.table(list(glm(y ~ x))), by = "variable"], "V1", "model")
# create a new set of data to be predicted
# NOTE: this could, of course, also be added to the models data.table
# as a new column via `:=list(...)`
new_pts = setnames(DT[, seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = 200), by = variable], "V1", "x")
# add the predicted values
new_pts[, predicted := predict(models[variable == unlist(.BY), model][[1]],
                               newdata = data.frame(x = x), type = "response"),
        by = variable]
# plot and check if it makes sense
plot(df$x1, df$y)
lines(new_pts[variable == "x1", .(x, predicted)])
points(df$x2, df$y)
lines(new_pts[variable == "x2", .(x, predicted)])
# The following version of the code above is also possible.
# It creates only one new object in the environment,
# but may look more complicated at first sight.
# Not sure if this is the best way to do it;
# data.table experts might provide some shortcuts.
setDT(df)
DT = melt(df, measure.vars = paste0("x", 1:2), value.name = "x")
# one row per variable, with that variable's data in the list column `dat`
DT = data.table(variable = unique(DT$variable), dat = split(DT, DT$variable))
DT[, models := list(list(glm(y ~ x, data = dat[[1]]))), by = variable]
DT[, new_pts := list(list(data.frame(x = dat[[1]][
    , seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = 200)]))),
  by = variable]
# inside DT[...], `models` and `new_pts` refer to the list columns created above
DT[, predicted := list(list(data.frame(pred = predict(models[[1]],
                                                      newdata = new_pts[[1]],
                                                      type = "response")))),
  by = variable]
plot(df$x1, df$y)
lines(DT[variable == "x1", .(unlist(new_pts), unlist(predicted))])
points(df$x2, df$y)
lines(DT[variable == "x2", .(unlist(new_pts), unlist(predicted))])