Different p-values from ggplot2 stat_compare_means and wilcox.test

And*_* H. 5 r ggplot2 p-value

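I am trying to add a p-value to my ggplot using the stat_compare_means function. However, the p-value I get in the ggplot differs from the result of the base wilcox.test.

I used a paired test in both cases, and I also used the Wilcoxon test in the ggplot.

I tried searching for my problem but could not find an exact answer. I have updated R (v. 3.5.2), RStudio (v. 1.1.463), and all packages. Below I have added a few lines of code and an example. I am new to R and statistics, so please forgive me if I am asking in a newbie way.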

    library("ggplot2")
    library("ggpubr")

    c1 <- c( 798.3686, 2560.9974,  688.3051,  669.8265, 2750.6638, 1136.3535,
            1335.5696, 2347.2777, 1149.1940,  901.6880, 1569.0731, 3915.6719,
            3972.0250, 5517.5016, 4616.6393, 3232.0120, 4020.9727, 2249.4150,
            2226.4108, 2582.3705, 1653.4801, 3162.2784, 3199.1923, 4792.6118)
    c2 <- c(0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1)

    test <- data.frame(c2, c1)

    test$c2 <- as.factor(test$c2)

    ggplot(test, aes(x = c2, y = c1)) +
      stat_compare_means(paired = TRUE)

    wilcox.test(test$c1 ~ test$c2, paired = TRUE)

[Figure: result of stat_compare_means in ggplot]

Result of the Wilcoxon signed rank test:

    data:  test$c1 by test$c2
    V = 0, p-value = 0.0004883
    alternative hypothesis: true location shift is not equal to 0

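As you can see, the result in the ggplot is p = 0.0025, while the base wilcox.test function gives p = 0.0004883. Do you know why they differ? Which value is correct?

PS: I tried the same thing with ToothGrowth. In that case, stat_compare_means and wilcox.test give the same result: p = 0.004313. I don't know why it doesn't work with my data :/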

dip*_*kov 3

Update 2022/03/16

The tidyverse has evolved, and so should this solution.

In one case the p-value is exact, and in the other it is a normal approximation.

    wilcox.test(test$c1 ~ test$c2, paired = TRUE, exact = TRUE)
    # Wilcoxon signed rank test
    #
    # data:  test$c1 by test$c2
    # V = 0, p-value = 0.0004883
    # alternative hypothesis: true location shift is not equal to 0

    wilcox.test(test$c1 ~ test$c2, paired = TRUE, exact = FALSE)
    # Wilcoxon signed rank test with continuity correction
    #
    # data:  test$c1 by test$c2
    # V = 0, p-value = 0.002526
    # alternative hypothesis: true location shift is not equal to 0
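To see where the two numbers come from, here is a minimal sketch that recomputes both p-values by hand from n = 12 pairs and V = 0. It assumes the standard signed rank formulas with no ties and no zero differences; the variable names (`p_exact`, `p_approx`, and so on) are illustrative only.

```r
# Worked check for n = 12 pairs with test statistic V = 0
n <- 12
V <- 0

# Exact p-value: under the null hypothesis all 2^n sign patterns are equally
# likely, and only one of them (every difference negative) gives V = 0.
p_exact <- 2 * psignrank(V, n)

# Normal approximation with continuity correction (assumes no ties)
mu    <- n * (n + 1) / 4                         # mean of V under H0
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)    # sd of V under H0
z     <- (V - mu + 0.5) / sigma                  # shift by 0.5 towards the mean
p_approx <- 2 * pnorm(z)

signif(p_exact, 4)   # 0.0004883, i.e. 2 / 4096
signif(p_approx, 4)  # about 0.0025, the value shown in the plot
```

With only 12 pairs the normal approximation is rough in the tail, which is why the two values differ by a factor of about five.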

According to help(wilcox.test), if the samples contain fewer than 50 values (as in your case) and there are no ties, the exact p-value is computed (unless you specify otherwise).

stat_compare_means has a method.args argument, but it does not seem to pass the exact = TRUE specification through correctly. Instead, you can compute the p-value exactly the way you want it first and then add it to the plot:

    # Provides %>%, mutate() and ggplot(); broom and glue are called via ::
    library("tidyverse")

    exact_pvalue <-
      wilcox.test(test$c1 ~ test$c2, paired = TRUE, exact = TRUE) %>%
      # Format the test output as a tibble
      broom::tidy() %>%
      # Format the p-value
      mutate(pval_fmt = format.pval(p.value, digits = 2)) %>%
      # Specify position in (c1, c2) coordinates
      mutate(c1 = 5518, c2 = 0)
    exact_pvalue
    # A tibble: 1 x 7
    #  statistic  p.value method                    alternative pval_fmt    c1    c2
    #      <dbl>    <dbl> <chr>                     <chr>       <chr>    <dbl> <dbl>
    #1         0 0.000488 Wilcoxon signed rank test two.sided   0.00049   5518     0

    ggplot(test, aes(x = c2, y = c1)) +
      geom_text(aes(label = glue::glue("Wilcoxon p = {pval_fmt}")),
                data = exact_pvalue)
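For a single comparison, the same idea can be sketched with only ggplot2, placing the label with annotate(). This is an illustrative variant rather than part of the original solution: the boxplot layer, the label position, and the names `x`, `y`, `res`, `label`, and `p` are assumptions. It uses the two-vector form of wilcox.test, where pairs are matched by position, because newer R releases (4.4.0 and later, as far as I know) reject `paired` in the formula interface.

```r
library(ggplot2)

# Data from the question: 12 values for group 0 followed by 12 for group 1
c1 <- c( 798.3686, 2560.9974,  688.3051,  669.8265, 2750.6638, 1136.3535,
        1335.5696, 2347.2777, 1149.1940,  901.6880, 1569.0731, 3915.6719,
        3972.0250, 5517.5016, 4616.6393, 3232.0120, 4020.9727, 2249.4150,
        2226.4108, 2582.3705, 1653.4801, 3162.2784, 3199.1923, 4792.6118)
test <- data.frame(c2 = factor(rep(c(0, 1), each = 12)), c1 = c1)

# Split into the two paired samples (i-th value of x pairs with i-th of y)
x <- test$c1[test$c2 == "0"]
y <- test$c1[test$c2 == "1"]

# Exact paired Wilcoxon signed rank test
res <- wilcox.test(x, y, paired = TRUE, exact = TRUE)

# Format the p-value the same way as in the tibble above
label <- paste("Wilcoxon, p =", format.pval(res$p.value, digits = 2))

# Boxplot with the label centred between the two groups at the top
p <- ggplot(test, aes(x = c2, y = c1)) +
  geom_boxplot() +
  annotate("text", x = 1.5, y = max(test$c1), label = label)
p
```

Computing the test outside the plot keeps full control over `exact`, at the cost of positioning the label by hand.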

You can easily generalize this approach to run several tests at once and create a faceted plot at the end.


Created on 2022-03-16 by the reprex package (v2.0.1)

Old solution

    library("tidyverse")

    test2 <-
      # Fake data with two subsets to run the test on (in this case the p-value
      # will be the same because the subsets actually contain the same data).
      bind_rows(test, test, .id = "subset") %>%
      # Group by subset and nest the data columns. This creates a "list of
      # tibbles" column called "data".
      group_by(subset) %>%
      nest() %>%
      # Use `purrr::map` to perform the test on each group.
      mutate(wilcox = map(data, ~ wilcox.test(.x$c1 ~ .x$c2,
                                              paired = TRUE, exact = TRUE))) %>%
      # And again `purrr::map` to tidy the test results.
      # Now we have two list columns, one with the data and the other with
      # the test results.
      mutate(wilcox = map(wilcox, broom::tidy))
    test2
    # A tibble: 2 x 3
    # subset data              wilcox
    # <chr>  <list>            <list>
    #   1 1      <tibble [24 x 2]> <tibble [1 x 4]>
    #   2 2      <tibble [24 x 2]> <tibble [1 x 4]>

    test2 %>%
      unnest(data) %>%
      ggplot(aes(c1, c2)) +
      # Plot the raw data
      geom_point() +
      # Add the p-value
      geom_text(data = test2 %>% unnest(wilcox),
                # Specify the aesthetic mapping so that the p-value is
                # plotted in the top right corner of each plot.
                aes(x = Inf, y = Inf, label = format.pval(p.value, digits = 2)),
                inherit.aes = FALSE, hjust = "inward", vjust = "inward") +
      # Do this for each subset in its own subplot.
      facet_wrap(~ subset)