library(dplyr)
library(ggplot2)
library(magrittr)
diamonds %>%
  group_by(cut) %>%
  summarise(price_avg = t.test(
    . %>% filter(color == "E") %$% price,
    . %>% filter(color == "I") %$% price )$p.value)
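I'm trying to apply t.test by group. In this example, I want to find out whether price differs significantly between colors E and I within each cut. The result I get is: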
Error in summarise_impl(.data, dots) :
Evaluation error: is.atomic(x) is not TRUE.
library(tidyverse)
library(magrittr)
diamonds %>%
  group_by(cut) %>%
  summarise(price_avg = t.test(price[color=="E"], price[color=="I"])$p.value)
# # A tibble: 5 x 2
# cut price_avg
# <ord> <dbl>
# 1 Fair 3.90e- 3
# 2 Good 1.46e-12
# 3 Very Good 2.44e-39
# 4 Premium 7.27e-52
# 5 Ideal 7.63e-62
The problem with your solution is that . does not get the subset of your dataset (based on your grouping), but the whole dataset. Check by doing the following:
diamonds %>%
  group_by(cut) %>%
  summarise(d = list(.))
# # A tibble: 5 x 2
# cut d
# <ord> <list>
# 1 Fair <tibble [53,940 x 10]>
# 2 Good <tibble [53,940 x 10]>
# 3 Very Good <tibble [53,940 x 10]>
# 4 Premium <tibble [53,940 x 10]>
# 5 Ideal <tibble [53,940 x 10]>
Another solution is:
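To make the same point with numbers, here is a minimal sketch (not part of the original answer) that compares nrow(.) with dplyr's n() inside summarise(). The column names rows_in_dot and rows_in_group are made up for illustration. It relies on the magrittr pipe, which binds . to the whole left-hand side; it would not work with the native |> pipe.

```r
library(dplyr)
library(ggplot2)  # only needed for the diamonds dataset

sizes <- diamonds %>%
  group_by(cut) %>%
  summarise(
    rows_in_dot   = nrow(.),  # `.` is the whole input handed over by %>%
    rows_in_group = n()       # n() counts the rows of the current group only
  )

sizes
```

If the explanation above holds, rows_in_dot is the same for every cut, while rows_in_group differs per cut and sums to the size of the full dataset.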
diamonds %>%
  nest(-cut) %>%
  mutate(price_avg = map_dbl(data, ~t.test(
    .x %>% filter(color == "E") %$% price,
    .x %>% filter(color == "I") %$% price )$p.value))
# # A tibble: 5 x 3
# cut data price_avg
# <ord> <list> <dbl>
# 1 Ideal <tibble [21,551 x 9]> 7.63e-62
# 2 Premium <tibble [13,791 x 9]> 7.27e-52
# 3 Good <tibble [4,906 x 9]> 1.46e-12
# 4 Very Good <tibble [12,082 x 9]> 2.44e-39
# 5 Fair <tibble [1,610 x 9]> 3.90e- 3
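This works because each time filter is given the appropriate subset of your data (i.e. an element of the data column).
There must be a better way of doing this. I'd probably go with Antonios' approach, but I was trying to avoid filter and instead spread the prices of the different colors into list columns. Unfortunately, the best code I could come up with is even longer: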
diamonds %>%
  group_by(cut, color) %>%
  summarize(price = list(price)) %>%
  spread(color, price) %>%
  nest() %>%
  mutate(price_avg = map_dbl(data, ~ t.test(.x$E[[1L]], .x$I[[1L]])$p.value))

The idea here is to get two list columns, I and E, holding the prices of the diamonds of the respective color. We can now run the t-test on these two columns (but unfortunately we need to unlist them for this to work).
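I'm mainly putting this here as a conversation starter. Clearly this isn't code you'd ever want to write, but I believe there should be a short, logical way of expressing this logic (either this is already possible and I'm overlooking it, or the tidy data API needs to be enhanced).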

Alternatively, we can use the formula API of t.test:
diamonds %>%
  filter(color %in% c('E', 'I')) %>%
  nest(-cut) %>%
  mutate(price_avg = map_dbl(data, ~ t.test(price ~ color, .x)$p.value))

For completeness, here's the same using broom::tidy (this returns more columns than just the p-value):
library(broom)  # provides tidy(); not attached by library(tidyverse)

diamonds %>%
  filter(color %in% c('E', 'I')) %>%
  nest(-cut) %>%
  mutate(test = map(data, ~ tidy(t.test(price ~ color, .x)))) %>%
  unnest(test)

The result is a table like this:

  cut       data             estimate estimate1 estimate2 statistic  p.value parameter conf.low conf.high method                  alternative
  <ord>     <list>              <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <fct>                   <fct>
1 Fair      <tibble [1 × 7]>   -1003.     3682.     4685.     -2.91 3.90e- 3      327.   -1682.     -324. Welch Two Sample t-test two.sided
2 Good      <tibble [1 × 7]>   -1655.     3424.     5079.     -7.19 1.46e-12      827.   -2107.    -1203. Welch Two Sample t-test two.sided
3 Very Good <tibble [1 × 7]>   -2041.     3215.     5256.    -13.4  2.44e-39     1860.   -2339.    -1743. Welch Two Sample t-test two.sided
4 Premium   <tibble [1 × 7]>   -2407.     3539.     5946.    -15.5  7.27e-52     2405.   -2711.    -2103. Welch Two Sample t-test two.sided
5 Ideal     <tibble [1 × 7]>   -1854.     2598.     4452.    -17.0  7.63e-62     3081.   -2069.    -1640. Welch Two Sample t-test two.sided
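
As a cross-check on the pipelines above, the same per-cut tests can be sketched in base R with split() and vapply(). This is only an illustration, not something from the original answers; the names ei and p_values are made up here. It assumes, as the formula examples above do, that t.test(price ~ color, ...) runs once only two colors remain in the data.

```r
library(ggplot2)  # only needed for the diamonds dataset

# Keep only the two colors being compared.
ei <- subset(diamonds, color %in% c("E", "I"))

# split() yields one data frame per level of cut;
# vapply() runs one Welch two-sample t-test on each and collects the p-values.
p_values <- vapply(
  split(ei, ei$cut),
  function(d) t.test(price ~ color, data = d)$p.value,
  numeric(1)
)

p_values
```

The result is a named numeric vector with one p-value per cut, which should agree with the p.value column produced by the tidyverse versions.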