uli*_*a2_ 4 grouping regression r dplyr modelr
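I am estimating regression models by group in my dataset, and then I want to add the correct fitted values for all groups.
I am trying the following: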
library(dplyr)
library(modelr)
df <- tribble(
~year, ~country, ~value,
2001, "France", 55,
2002, "France", 53,
2003, "France", 31,
2004, "France", 10,
2005, "France", 30,
2006, "France", 37,
2007, "France", 54,
2008, "France", 58,
2009, "France", 50,
2010, "France", 40,
2011, "France", 49,
2001, "USA", 55,
2002, "USA", 53,
2003, "USA", 64,
2004, "USA", 40,
2005, "USA", 30,
2006, "USA", 39,
2007, "USA", 55,
2008, "USA", 53,
2009, "USA", 71,
2010, "USA", 44,
2011, "USA", 40
)
rmod <- df %>%
  group_by(country) %>%
  do(fitModels = lm("value ~ year", data = .))
df <- df %>%
  add_predictions(rmod)
This throws an error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('rowwise_df', 'tbl_df', 'tbl', 'data.frame')"
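I would like to get either one column containing the fitted values for each country, or one column of predictions per country. Somehow, add_predictions() does not seem to work when the models are stored as a list after a call to do().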
There are a few other ways you can approach this.

Probably the most direct approach, though you lose the intermediate models:

rmod <- df %>%
  group_by(country) %>%
  mutate(fit = lm(value ~ year)$fitted.values) %>%
  ungroup
rmod
# # A tibble: 22 × 4
#     year country value      fit
#    <dbl>   <chr> <dbl>    <dbl>
# 1   2001  France    55 38.13636
# 2   2002  France    53 39.00000
# 3   2003  France    31 39.86364
# 4   2004  France    10 40.72727
# 5   2005  France    30 41.59091
# 6   2006  France    37 42.45455
# 7   2007  France    54 43.31818
# 8   2008  France    58 44.18182
# 9   2009  France    50 45.04545
# 10  2010  France    40 45.90909
# # ... with 12 more rows

Another approach uses a "tidy" model workflow that stores the data, the models, and the results in individual cells of the frame:

library(tidyr)  # nest(), unnest()
library(purrr)  # map(), map2()
rmod <- df %>%
  group_by(country) %>%
  nest() %>%
  mutate(mdl = map(data, ~ lm(value ~ year, data=.))) %>%
  mutate(fit = map(mdl, ~ .$fitted.values))
rmod
# # A tibble: 2 × 4
#   country              data      mdl        fit
#     <chr>            <list>   <list>     <list>
# 1  France <tibble [11 × 2]> <S3: lm> <dbl [11]>
# 2     USA <tibble [11 × 2]> <S3: lm> <dbl [11]>

The advantage of this approach is that you can access other properties of the models as needed, for example summary( filter(rmod, country == "France")$mdl[[1]] ). (The [[1]] is required because, with tibbles, $mdl always returns a list.)
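
To illustrate pulling other model properties out of the list column for every group at once, here is a minimal sketch. It assumes purrr's map_dbl() and a small made-up data set (three years per country, not the full data from the question); the column names slope and r_squared are just illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Small illustrative data set (hypothetical subset, three years per country)
df <- tribble(
  ~year, ~country, ~value,
  2001, "France", 55,
  2002, "France", 53,
  2003, "France", 31,
  2001, "USA",    55,
  2002, "USA",    53,
  2003, "USA",    64
)

# One lm per country, stored in the list column `mdl`
rmod <- df %>%
  group_by(country) %>%
  nest() %>%
  mutate(mdl = map(data, ~ lm(value ~ year, data = .x)))

# Extract one number per model: the slope on `year` and the R-squared
coefs <- rmod %>%
  ungroup() %>%
  mutate(slope     = map_dbl(mdl, ~ coef(.x)[["year"]]),
         r_squared = map_dbl(mdl, ~ summary(.x)$r.squared)) %>%
  select(country, slope, r_squared)
coefs
```

Because the models stay in the frame, any other extractor (residuals(), confint(), and so on) can be mapped over mdl in the same way.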
You can extract/unnest the fitted values as follows:

select(rmod, -mdl) %>% unnest()
# # A tibble: 22 × 4
#    country      fit  year value
#      <chr>    <dbl> <dbl> <dbl>
# 1   France 38.13636  2001    55
# 2   France 39.00000  2002    53
# 3   France 39.86364  2003    31
# 4   France 40.72727  2004    10
# 5   France 41.59091  2005    30
# 6   France 42.45455  2006    37
# 7   France 43.31818  2007    54
# 8   France 44.18182  2008    58
# 9   France 45.04545  2009    50
# 10  France 45.90909  2010    40
# # ... with 12 more rows

(Unfortunately, the columns are reordered, but that is purely cosmetic and easy to fix.)
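
One way to restore the column order is a final select() that names the columns in the order you want. The sketch below assumes tidyr 1.0.0 or later (where unnest() takes an explicit cols argument) and uses a small made-up data set rather than the full data from the question.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Small illustrative data set (hypothetical subset, three years per country)
df <- tribble(
  ~year, ~country, ~value,
  2001, "France", 55,
  2002, "France", 53,
  2003, "France", 31,
  2001, "USA",    55,
  2002, "USA",    53,
  2003, "USA",    64
)

fitted_df <- df %>%
  group_by(country) %>%
  nest() %>%
  mutate(mdl = map(data, ~ lm(value ~ year, data = .x)),
         fit = map(mdl, ~ unname(fitted(.x)))) %>%
  select(-mdl) %>%
  unnest(cols = c(data, fit)) %>%    # assumes tidyr >= 1.0.0
  ungroup() %>%
  select(year, country, value, fit)  # put the columns back in the original order
fitted_df
```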

Edit

If you want/need to use the modelr-specific functions here, try:

rmod <- df %>%
  group_by(country) %>%
  nest() %>%
  mutate(mdl = map(data, ~ lm(value ~ year, data=.))) %>%
  mutate(fit = map(mdl, ~ .$fitted.values)) %>%
  mutate(data = map2(data, mdl, add_predictions))
rmod
# # A tibble: 2 x 4
#   country data              mdl      fit
#   <chr>   <list>            <list>   <list>
# 1 France  <tibble [11 x 3]> <S3: lm> <dbl [11]>
# 2 USA     <tibble [11 x 3]> <S3: lm> <dbl [11]>
select(rmod, -mdl, -fit) %>% unnest()
# # A tibble: 22 x 4
#    country  year value  pred
#    <chr>   <dbl> <dbl> <dbl>
#  1 France  2001.   55.  38.1
#  2 France  2002.   53.  39.0
#  3 France  2003.   31.  39.9
#  4 France  2004.   10.  40.7
#  5 France  2005.   30.  41.6
#  6 France  2006.   37.  42.5
#  7 France  2007.   54.  43.3
#  8 France  2008.   58.  44.2
#  9 France  2009.   50.  45.0
# 10 France  2010.   40.  45.9
# # ... with 12 more rows
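
As for the original error: add_predictions() takes a data frame first and a single model second, so add_predictions(rmod) on the piped df hands the whole do() result to predict(), which has no method for a data frame. A minimal sketch of applying one of the stored models by hand, assuming a small made-up data set and the illustrative names france_model and france:

```r
library(dplyr)
library(modelr)

# Small illustrative data set (hypothetical subset, three years per country)
df <- tribble(
  ~year, ~country, ~value,
  2001, "France", 55,
  2002, "France", 53,
  2003, "France", 31,
  2001, "USA",    55,
  2002, "USA",    53,
  2003, "USA",    64
)

# do() stores one lm per country in the list column `fitModels`
rmod <- df %>%
  group_by(country) %>%
  do(fitModels = lm(value ~ year, data = .))

# Pick out the model for one country and apply it to that country's rows
france_model <- rmod$fitModels[[which(rmod$country == "France")]]
france <- df %>%
  filter(country == "France") %>%
  add_predictions(france_model)  # adds a `pred` column
france
```

Doing this per country by hand is exactly what the map2(data, mdl, add_predictions) step automates.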