R:预测组的新值

Med*_*ist 3 r predict tidyverse

我为数据框中的每个组计算了不同的回归:

DF.L <- DF %>%
group_by(Channel) %>%
do(Fit = rlm(L ~ -1 + Y + I(Y^2), data = .))
Run Code Online (Sandbox Code Playgroud)

我想将这套回归应用于另一个数据框。为此,我正在测试如何将其应用于相同的数据框:

DF %>%
group_by(Channel) %>%
do({
    Lfit <- predict(subset(DF.L, Channel == unique(.$Channel))$Fit, .)
    data.frame(., Lfit)
})
glimpse(DF)
Run Code Online (Sandbox Code Playgroud)

但我不断收到此错误:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "list"
Calls: %>% ... do_.grouped_df -> eval -> eval -> predict -> predict
Run Code Online (Sandbox Code Playgroud)

我做错了什么?

Bri*_*ian 5

使用内置ChickWeight数据:

library(dplyr)
library(MASS)
library(broom)
library(tidyr)
library(ggplot2)


head(ChickWeight)
Run Code Online (Sandbox Code Playgroud)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1
Run Code Online (Sandbox Code Playgroud)

适合一些模特

ChickWeight_models <- ChickWeight %>% 
  group_by(Diet) %>% 
  do(fit = MASS::rlm(weight ~ Time + I(Time^2), data = .))

ChickWeight_models
Run Code Online (Sandbox Code Playgroud)
Source: local data frame [4 x 2]
Groups: <by row>

# A tibble: 4 x 2
    Diet       fit
* <fctr>    <list>
1      1 <S3: rlm>
2      2 <S3: rlm>
3      3 <S3: rlm>
4      4 <S3: rlm>
Run Code Online (Sandbox Code Playgroud)

因此,我创建了一个与DF.L非常相似的对象。这是一个由四组组成的框架,每组都有一个rlm名为fit的列表列中的对象。


组成一些测试数据

现在,我将组成一些数据来测试该模型。在这种情况下,我将仅获取原始数据,并为每个变量添加一些噪声。

ChickWeight_simulated <- ChickWeight %>% 
  mutate(Time = Time + runif(length(Time)),
         weight = weight + rnorm(length(weight)))

ChickWeight_simulated 
Run Code Online (Sandbox Code Playgroud)
    weight       Time Chick Diet
1 42.72075  0.9786272     1    1
2 51.12669  2.8399631     1    1
3 58.64632  4.4576380     1    1
4 63.77617  6.1083591     1    1
5 75.40434  8.1051792     1    1
6 91.75830 10.7899030     1    1
Run Code Online (Sandbox Code Playgroud)

现在,我们希望将模型的数据框与新数据结合起来进行测试。首先我们group_bytidyr::nest模拟数据。这将创建一个对象,该对象是具有四个组的数据框和一个名为data的列表列,其每个元素都包含一个汇总的数据框。

ChickWeight_simulated %>% group_by(Diet) %>% nest()
Run Code Online (Sandbox Code Playgroud)
# A tibble: 4 x 2
    Diet               data
  <fctr>             <list>
1      1 <tibble [220 x 3]>
2      2 <tibble [120 x 3]>
3      3 <tibble [120 x 3]>
4      4 <tibble [118 x 3]>
Run Code Online (Sandbox Code Playgroud)

将原始模型添加到新数据

然后我们可以将其加入模型数据框:

ChickWeight_simulated %>% group_by(Diet) %>% nest() %>% 
  full_join(ChickWeight_models)
Run Code Online (Sandbox Code Playgroud)
# A tibble: 4 x 3
    Diet               data       fit
  <fctr>             <list>    <list>
1      1 <tibble [220 x 3]> <S3: rlm>
2      2 <tibble [120 x 3]> <S3: rlm>
3      3 <tibble [120 x 3]> <S3: rlm>
4      4 <tibble [118 x 3]> <S3: rlm>
Run Code Online (Sandbox Code Playgroud)

现在,我们再次按Diet分组,并broom::augment根据新的模拟数据对每个模型进行预测。由于每一组都是一行,因此拟合数据各有一个元素;我们必须使用来将每个列表列中的单个元素提取为可用形式[[1]]

ChickWeight_simulated_predicted <-
ChickWeight_simulated %>% group_by(Diet) %>% nest() %>% 
  full_join(ChickWeight_models) %>% 
  group_by(Diet) %>% 
  do(augment(.$fit[[1]], newdata = .$data[[1]])) 

head(ChickWeight_simulated_predicted)
Run Code Online (Sandbox Code Playgroud)
# A tibble: 6 x 6
# Groups:   Diet [1]
    Diet   weight       Time Chick  .fitted  .se.fit
  <fctr>    <dbl>      <dbl> <ord>    <dbl>    <dbl>
1      1 42.72075  0.9786272     1 43.62963 2.368838
2      1 51.12669  2.8399631     1 51.80855 1.758385
3      1 58.64632  4.4576380     1 59.67606 1.534051
4      1 63.77617  6.1083591     1 68.43218 1.534152
5      1 75.40434  8.1051792     1 80.00678 1.647612
6      1 91.75830 10.7899030     1 97.26450 1.726331
Run Code Online (Sandbox Code Playgroud)

完整性检查

为了证明这真的仅仅从一个特定级别中使用的模型饮食上从该级别的模拟数据的饮食,我们可以直观的模型拟合。

ChickWeight_simulated_predicted %>% 
  ggplot(aes(Time, weight)) + 
  geom_point(shape = 1) + 
  geom_ribbon(aes(Time, 
                  ymin = .fitted-1.96*.se.fit, 
                  ymax = .fitted+1.96*.se.fit),
              alpha = 0.5, fill = "black") +
  geom_line(aes(Time, .fitted), size = 1, color = "red") +
  facet_wrap(~Diet)
Run Code Online (Sandbox Code Playgroud)

在此处输入图片说明