Smoothing each group via `do`

Question

I have some data, a sample of which is below. My aim is to fit a `gam` to each year and to obtain another value, the prediction from that gam model.

fertility <- structure(list(AGE = c(15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 
36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 15L, 16L, 17L, 18L, 
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 
32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L
), Year = c(1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 
1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 
1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 1930, 1931, 
1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 
1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 1931, 
1931, 1931, 1931, 1931, 1931, 1931, 1931), fertility = c(5.170284269, 
14.18135114, 27.69795144, 44.61216712, 59.08896308, 89.66036496, 
105.4563852, 120.1754041, 137.4074262, 148.7159407, 161.5645606, 
157.200515, 143.6340251, 127.8855125, 117.7343628, 159.2909484, 
126.6158821, 109.0681613, 86.98223678, 70.64470361, 111.0070633, 
86.15051988, 68.9204159, 55.92722274, 42.93402958, 56.84376018, 
39.35337243, 26.72142573, 18.46207596, 9.231037978, 4.769704534, 
13.08261815, 25.55198857, 41.15573626, 54.51090896, 81.99522459, 
96.44082973, 109.9015072, 125.6603492, 136.0020892, 148.679958, 
144.6639404, 132.1793638, 117.6867783, 108.345172, 144.2820726, 
114.68575, 98.79142865, 78.7865069, 63.9883456, 100.217918, 77.77726461, 
62.22181169, 50.49147014, 38.76112859, 52.48807067, 36.33789508, 
24.67387938, 17.04740757, 8.523703784)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -60L), .Names = c("AGE", 
"Year", "fertility"))

So, the non-dplyr, "dumb" way to do this is:

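library(dplyr)

pred <- numeric(nrow(fertility))  # preallocate the vector of predictions
count <- 0
for (i in 1930:1931){
  count <- count + 1
  temp <- filter(fertility, Year == i)
  mod <- mgcv::gam(fertility ~ s(AGE), data = temp)
  # fill the 30 slots (ages 15 to 44) that belong to this year
  pred[length(15:44) * (count - 1) + 1:30] <- predict(mod, newdata = data.frame(AGE = 15:44))
}

fertility1 <- mutate(fertility, pred = pred)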

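But I would like a dplyr way of doing it. My idea was to use `do` to create a model for each year, and then use `predict` to get the values. I can do the first step, but I am struggling to achieve the second part in dplyr: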

library(mgcv)
library(dplyr)

  fertility %>%
    #filter(!is.na(fertility)) %>%  # not sure if this is necessary
    group_by(Year) %>%
    dplyr::do(model = mgcv::gam(fertility ~ s(AGE), data = .)) %>%
    left_join(fertility, .) %>%
    mutate(smoothed = predict(model, newdata = AGE))

I get the error message

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "list"

This probably means that dplyr doesn't remember that `model` is a model, rather than just a list element.

Answer 1

Rei*_*son 10

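The smart way to do this is to use the factor-smooth interactions that have long been available in mgcv, either through the `by` argument to `s()` or through the newer `bs = "fs"` basis type. Here is an example with your data: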

library("mgcv")
## Make Year a factor
fertility <- transform(fertility, Year = factor(Year))
## Fit model using by terms - include factor as fixed effect too!
mod <- gam(fertility ~ Year + s(AGE, by = Year), data = fertility)
## Plot to see what form this model takes
plot(mod, pages = 1)

[Figure: the estimated smooths of AGE for each Year, as drawn by plot(mod, pages = 1)]

## Some prediction data
ages <- with(fertility, seq(min(AGE), max(AGE)))
## Need to replicate this once per Year
pdat <- with(fertility,
             data.frame(AGE = rep(ages, nlevels(Year)),
                        Year = factor(rep(levels(Year), each = length(ages)),
                                      levels = levels(Year))))
## Add the fitted values to the prediction data
pdat <- transform(pdat, fitted = predict(mod, newdata = pdat))
head(pdat)

> head(pdat)
  AGE Year     fitted
1  15 1930 -0.8496705
2  16 1930 15.9568574
3  17 1930 33.0754019
4  18 1930 50.7419122
5  19 1930 68.9116594
6  20 1930 87.1306489

However, if all you want is predictions at the observed values of AGE, you can simply ask for the fitted values:

fertility <- transform(fertility, fitted = predict(mod))
head(fertility)

> head(fertility)
  AGE Year fertility     fitted
1  15 1930  5.170284 -0.8496705
2  16 1930 14.181351 15.9568574
3  17 1930 27.697951 33.0754019
4  18 1930 44.612167 50.7419122
5  19 1930 59.088963 68.9116594
6  20 1930 89.660365 87.1306489

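You could also look at the dedicated factor-smooth basis type, `bs = "fs"`; see `?smooth.terms` and `?factor.smooth.interaction` for details. Basically, these are efficient if you have a lot of levels but want the smoother for each level to have the same smoothing parameter value.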
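
A minimal sketch of the `bs = "fs"` form mentioned above follows. It uses synthetic data (an assumption: a made-up bell-shaped age profile for six hypothetical years, not the real fertility values), because sharing smoothing parameters matters most when there are more than two levels. The names `dat`, `mod_fs` and `fitted_fs` are illustrative only.

```r
library(mgcv)

## Synthetic stand-in for the fertility data (assumed, not the real values):
## a bell-shaped age profile whose height drops a little each year, plus noise
set.seed(1)
dat <- expand.grid(AGE = 15:44, Year = 1930:1935)
dat$fertility <- (150 - 2 * (dat$Year - 1930)) *
  exp(-((dat$AGE - 26)^2) / 60) + rnorm(nrow(dat), sd = 5)
dat$Year <- factor(dat$Year)

## One smooth of AGE per level of Year; the levels share smoothing parameters
mod_fs <- gam(fertility ~ s(AGE, Year, bs = "fs", k = 10),
              data = dat, method = "REML")

## Fitted values at the observed ages, one per row of dat
dat$fitted_fs <- as.numeric(predict(mod_fs))
head(dat)
```

Unlike the `by` version, the `fs` term is fully penalised and carries level-specific intercepts, so this sketch does not add `Year` as a separate fixed effect.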

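The main advantage here is that you use all the data and fit a single model, which you can then interrogate in many ways that would not easily be open to you if you fitted M different models, such as investigating the differences between the smoothers for each year.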
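
One way to investigate such differences, sketched here rather than taken from the answer, is to difference rows of the linear predictor matrix returned by `predict(..., type = "lpmatrix")`. The data are synthetic (an assumption: two made-up years with slightly different peak heights), and `Xp`, `Xd` and `res` are illustrative names.

```r
library(mgcv)

## Synthetic two-year data (assumed, not the real fertility values)
set.seed(1)
dat <- expand.grid(AGE = 15:44, Year = c(1930, 1931))
dat$fertility <- ifelse(dat$Year == 1930, 150, 135) *
  exp(-((dat$AGE - 26)^2) / 60) + rnorm(nrow(dat), sd = 5)
dat$Year <- factor(dat$Year)

## Single model with one smooth of AGE per year, as in the `by` approach
mod <- gam(fertility ~ Year + s(AGE, by = Year), data = dat, method = "REML")

## Prediction grid: every age, once per year
ages <- 15:44
pdat <- data.frame(AGE  = rep(ages, times = nlevels(dat$Year)),
                   Year = factor(rep(levels(dat$Year), each = length(ages)),
                                 levels = levels(dat$Year)))

## Rows of Xp map the model coefficients to fitted values
Xp <- predict(mod, newdata = pdat, type = "lpmatrix")

## Difference 1930 minus 1931 at each age, with its standard error
Xd <- Xp[pdat$Year == "1930", ] - Xp[pdat$Year == "1931", ]
diff_hat <- drop(Xd %*% coef(mod))
diff_se  <- sqrt(rowSums((Xd %*% vcov(mod)) * Xd))

res <- data.frame(AGE = ages, diff = diff_hat,
                  lower = diff_hat - 2 * diff_se,
                  upper = diff_hat + 2 * diff_se)
head(res)
```

The interval is only an approximate pointwise band. Ages where it excludes zero suggest that the two fitted curves differ there, which is not something two separately fitted models would give you directly.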
