使用带有lm()对象列表的predict

JD *_*ong 18 r predict plyr lm

我有定期运行回归的数据.每个"数据块"的数据都适合不同的回归.例如,每个州可能具有解释从属值的不同功能.这似乎是典型的"拆分 - 应用 - 组合"类型的问题,因此我使用的是plyr包.我可以轻松创建一个lm()运行良好的对象列表.但是,我不能完全理解我以后如何使用这些对象来预测单独data.frame中的值.

这是一个完全人为的例子,说明了我正在尝试做的事情:

# setting up some fake data
set.seed(1)
funct <- function(myState, myYear){
   rnorm(1, 100, 500) +  myState + (100 * myYear) 
}
state <- 50:60
year <- 10:40
myData <- expand.grid( year, state)
names(myData) <- c("year","state")
myData$value <- apply(myData, 1, function(x) funct(x[2], x[1]))
## ok, done with the fake data generation. 

require(plyr)

modelList <- dlply(myData, "state", function(x) lm(value ~ year, data=x))
## if you want to see the summaries of the lm() do this:  
    # lapply(modelList, summary)

state <- 50:60
year <- 50:60
newData <- expand.grid( year, state)
names(newData) <- c("year","state") 
## now how do I predict the values for newData$value 
   # using the regressions in modelList? 
Run Code Online (Sandbox Code Playgroud)

那么如何使用其中lm()包含的对象modelList来使用年份和状态独立值来预测值newData

Jos*_*ich 9

这是我的尝试:

predNaughty <- ddply(newData, "state", transform,
  value=predict(modelList[[paste(piece$state[1])]], newdata=piece))
head(predNaughty)
#   year state    value
# 1   50    50 5176.326
# 2   51    50 5274.907
# 3   52    50 5373.487
# 4   53    50 5472.068
# 5   54    50 5570.649
# 6   55    50 5669.229
predDiggsApproved <- ddply(newData, "state", function(x)
  transform(x, value=predict(modelList[[paste(x$state[1])]], newdata=x)))
head(predDiggsApproved)
#   year state    value
# 1   50    50 5176.326
# 2   51    50 5274.907
# 3   52    50 5373.487
# 4   53    50 5472.068
# 5   54    50 5570.649
# 6   55    50 5669.229
Run Code Online (Sandbox Code Playgroud)

JD Long编辑

我的灵感足以找出一个adply()选项:

pred3 <- adply(newData, 1,  function(x)
    predict(modelList[[paste(x$state)]], newdata=x))
head(pred3)
#   year state        1
# 1   50    50 5176.326
# 2   51    50 5274.907
# 3   52    50 5373.487
# 4   53    50 5472.068
# 5   54    50 5570.649
# 6   55    50 5669.229
Run Code Online (Sandbox Code Playgroud)


Ram*_*ath 6

只有baseR的解决方案.输出的格式不同,但所有值都在那里.

models <- lapply(split(myData, myData$state), 'lm', formula = value ~ year)
pred4  <- mapply('predict', models, split(newData, newData$state))
Run Code Online (Sandbox Code Playgroud)


had*_*ley 6

您需要使用mdply为每个函数调用提供模型和数据:

dataList <- dlply(newData, "state")

preds <- mdply(cbind(mod = modelList, df = dataList), function(mod, df) {
  mutate(df, pred = predict(mod, newdata = df))
})
Run Code Online (Sandbox Code Playgroud)