C. Rea · 8 · Tags: r, dplyr, broom, tidymodels
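The answer to this question clearly explains how to retrieve tidy regression results by group when running regressions through a dplyr pipe, but the solution is no longer reproducible.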
How can I combine dplyr and broom to run regressions by group and retrieve tidy results with R 4.0.2, dplyr 1.0.0, and broom 0.7.0?
Specifically, this is the example answer from the question linked above:
library(dplyr)
library(broom)
df.h = data.frame(
  hour = factor(rep(1:24, each = 21)),
  price = runif(504, min = -10, max = 125),
  wind = runif(504, min = 0, max = 2500),
  temp = runif(504, min = -10, max = 25)
)
dfHour = df.h %>% group_by(hour) %>%
  do(fitHour = lm(price ~ wind + temp, data = .))
# get the coefficients by group in a tidy data_frame
dfHourCoef = tidy(dfHour, fitHour)
When I run it on my system, it returns the following error (and three warnings):
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
Calling var(x) on a factor x is defunct.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning messages:
1: Data frame tidiers are deprecated and will be removed in an upcoming release of broom.
2: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
If I convert df.h$hour to character instead of factor,
df.h <- df.h %>%
  mutate(
    hour = as.character(hour)
  )
then re-run the regressions by group and again try to retrieve the results with broom::tidy,
dfHour = df.h %>% group_by(hour) %>%
  do(fitHour = lm(price ~ wind + temp, data = .))
# get the coefficients by group in a tidy data_frame
dfHourCoef = tidy(dfHour, fitHour)
I get this error:
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
is.atomic(x) is not TRUE
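I think the problem is related to the group-level regression results being stored as a list in dfHour$fitHour in the originally posted code/answer.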
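A quick way to confirm this suspicion is to inspect what do() returns. The sketch below reuses the question's data (the set.seed() call is an addition so the numbers repeat) and checks that fitHour is a list-column holding one lm object per hour. broom::tidy() works on one fitted lm at a time, whereas tidy(dfHour, fitHour) receives the whole data frame, which, judging by the "Data frame tidiers are deprecated" warning, is handled by broom's deprecated data frame tidier rather than the models.

```r
library(dplyr)
library(broom)

set.seed(123)  # added so the example is repeatable
df.h = data.frame(
  hour  = factor(rep(1:24, each = 21)),
  price = runif(504, min = -10, max = 125),
  wind  = runif(504, min = 0, max = 2500),
  temp  = runif(504, min = -10, max = 25)
)

dfHour = df.h %>%
  group_by(hour) %>%
  do(fitHour = lm(price ~ wind + temp, data = .))

# One row per hour; fitHour is a list-column of fitted models
nrow(dfHour)               # 24
is.list(dfHour$fitHour)    # TRUE
class(dfHour$fitHour[[1]]) # "lm"

# tidy() applied to a single element of the list-column works as expected
tidy(dfHour$fitHour[[1]])
```

The answers below all follow from this: the models have to be pulled out of the list-column and tidied one by one.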
****** Updated with more concise code taken from the dplyr 1.0.0 release notes ******
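Thank you. I was struggling with a similar problem caused by the dplyr 1.0.0 update when using the example from the provided link. This is a helpful question and answer.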
FYI, do() has been superseded in dplyr 1.0.0, so you might consider using the updated syntax (my update now works very well):
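library(dplyr)  # nest_by() requires dplyr >= 1.0.0
library(broom)
dfHour = df.h %>%
  # replace group_by() with nest_by(), which returns a rowwise data frame
  # with one row per hour and that hour's data in a list-column named `data`
  nest_by(hour) %>%
  # change do() to mutate(), then wrap your model in list()
  # make sure to change data = . to data = data
  mutate(fitHour = list(lm(price ~ wind + temp, data = data))) %>%
  # tidy each row's model; the result keeps `hour` and has one row per term
  # (dplyr >= 1.1.0 deprecates multi-row summarise() in favor of reframe())
  summarise(tidy(fitHour))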
Done!
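This gives a very efficient data frame of selected output statistics. The last line replaces the following code (from my original response), which does the same thing but less concisely: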
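library(purrr)  # map()
library(tidyr)  # unnest()
dfHour = df.h %>%
  nest_by(hour) %>%
  mutate(fitHour = list(lm(price ~ wind + temp, data = data))) %>%
  ungroup() %>%
  # then leverage the feedback from @akrun
  transmute(hour, HourCoef = map(fitHour, tidy)) %>%
  unnest(HourCoef)
dfHour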
This gives the output:
# A tibble: 72 x 6
hour term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 (Intercept) 68.6 21.0 3.27 0.00428
2 1 wind 0.000558 0.0124 0.0450 0.965
3 1 temp -0.866 0.907 -0.954 0.353
4 2 (Intercept) 31.9 17.4 1.83 0.0832
5 2 wind 0.00950 0.0113 0.838 0.413
6 2 temp 1.69 0.802 2.11 0.0490
7 3 (Intercept) 85.5 22.3 3.83 0.00122
8 3 wind -0.0210 0.0165 -1.27 0.220
9 3 temp 0.276 1.14 0.243 0.811
10 4 (Intercept) 73.3 15.1 4.86 0.000126
# ... with 62 more rows
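On newer versions of dplyr the final summarise() step still works but warns: dplyr 1.1.0 deprecated returning more than one row per group from summarise() and introduced reframe() for that case. The sketch below assumes dplyr >= 1.1.0 and is otherwise the same nest_by() pipeline; the data setup and set.seed() call are additions so the block runs on its own.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for reframe()
library(broom)

set.seed(123)  # added so the example is repeatable
df.h = data.frame(
  hour  = factor(rep(1:24, each = 21)),
  price = runif(504, min = -10, max = 125),
  wind  = runif(504, min = 0, max = 2500),
  temp  = runif(504, min = -10, max = 25)
)

dfHour = df.h %>%
  # rowwise data frame: one row per hour, model data in list-column `data`
  nest_by(hour) %>%
  mutate(fitHour = list(lm(price ~ wind + temp, data = data))) %>%
  # reframe() allows several rows per hour and returns an ungrouped tibble
  reframe(tidy(fitHour))

dfHour
```

The result should have the same shape as the output above (24 hours times 3 terms, so 72 rows), without the deprecation warning.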
Thanks for your patience; I'm working through this myself!
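The issue is that there is a rowwise grouping attribute after the do call, and the 'fitHour' column is a list. We can ungroup, then loop over the list with map and tidy each model into a list column: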
library(dplyr)
library(purrr)
library(broom)
df.h %>% 
  group_by(hour) %>%
  do(fitHour = lm(price ~ wind + temp, data = .)) %>% 
  ungroup %>% 
  mutate(HourCoef = map(fitHour, tidy))
Or use unnest after the mutate:
library(tidyr)  # unnest() comes from tidyr
df.h %>% 
  group_by(hour) %>%
  do(fitHour = lm(price ~ wind + temp, data = .)) %>% 
  ungroup %>% 
  transmute(hour, HourCoef = map(fitHour, tidy)) %>% 
  unnest(HourCoef)
# A tibble: 72 x 6
#    hour  term        estimate std.error statistic  p.value
#    <fct> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
# 1  1     (Intercept) 89.8      20.2        4.45   0.000308
# 2  1     wind         0.00493   0.0151     0.326  0.748   
# 3  1     temp        -1.84      1.08      -1.71   0.105   
# 4  2     (Intercept) 75.6      23.7        3.20   0.00500 
# 5  2     wind        -0.00910   0.0146    -0.622  0.542   
# 6  2     temp         0.192     0.853      0.225  0.824   
# 7  3     (Intercept) 44.0      23.9        1.84   0.0822  
# 8  3     wind        -0.00158   0.0166    -0.0953 0.925   
# 9  3     temp         0.622     1.19       0.520  0.609   
#10  4     (Intercept) 57.8      18.9        3.06   0.00676 
# … with 62 more rows
If we want a single dataset, pull the 'fitHour' column, loop over the list with map, and row-bind the results into a single dataset (the _dfr suffix):
df.h %>%
  group_by(hour) %>% 
  do(fitHour = lm(price ~ wind + temp, data = .)) %>% 
  ungroup %>% 
  pull(fitHour) %>% 
  map_dfr(tidy, .id = 'grp')
Note: the OP's error message can be reproduced with R 4.0.2, dplyr 1.0.0 and broom 0.7.0:
tidy(dfHour, fitHour)
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
  Calling var(x) on a factor x is defunct.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning messages:
1: Data frame tidiers are deprecated and will be removed in an upcoming release of broom.
2: In mean.default(X[[i]], ...) :
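Another route that avoids both do() and the intermediate list-column is dplyr's group_modify(), sketched here as an alternative technique rather than one taken from the answers above. It assumes dplyr >= 0.8.0 (where group_modify() was added); the function passed to it receives each hour's rows as .x (without the grouping column) and must return a data frame, which tidy() does. The data setup and set.seed() call are additions so the block runs on its own.

```r
library(dplyr)  # group_modify() requires dplyr >= 0.8.0
library(broom)

set.seed(123)  # added so the example is repeatable
df.h = data.frame(
  hour  = factor(rep(1:24, each = 21)),
  price = runif(504, min = -10, max = 125),
  wind  = runif(504, min = 0, max = 2500),
  temp  = runif(504, min = -10, max = 25)
)

dfHourCoef = df.h %>%
  group_by(hour) %>%
  # fit and tidy one model per hour; the `hour` key is added back automatically
  group_modify(~ tidy(lm(price ~ wind + temp, data = .x))) %>%
  ungroup()

dfHourCoef
```

This keeps the hour label next to each coefficient, which map_dfr(tidy, .id = 'grp') on an unnamed list would not do.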