I have the following data frame:
library(tidyverse)
dat <- structure(list(charge.Group3 = c(0.167, 0.167, 0.1, 0.067, 0.033,
0.033, 0.067, 0.133, 0.2, 0.067, 0.133, 0.114, 0.167, 0.033,
0.1, 0.033, 0.133, 0.267, 0.133, 0.233, 0.1, 0.167, 0.067, 0.133,
0.1, 0.133, 0.1, 0.133, 0.1, 0.067, 0.167, 0), hydrophobicity.Group3 = c(0.267,
0.467, 0.067, 0.167, 0.267, 0.1, 0.367, 0.233, 0.367, 0.233,
0.133, 0.205, 0.333, 0.267, 0.267, 0.067, 0.133, 0.3, 0.233,
0.267, 0.5, 0.333, 0.2, 0.5, 0.5, 0.4, 0.033, 0.3, 0.233, 0.5,
0.233, 0.033), class = c("Negative", "Negative", "Positive",
"Positive", "Positive", "Positive", "Positive", "Negative", "Positive",
"Positive", "Positive", "Positive", "Positive", "Positive", "Negative",
"Positive", "Negative", "Negative", "Negative", "Negative", "Negative",
"Negative", "Negative", "Negative", "Negative", "Negative", "Positive",
"Positive", "Positive", "Negative", "Positive", "Negative")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -32L))
dat
#> # A tibble: 32 x 3
#> charge.Group3 hydrophobicity.Group3 class
#> <dbl> <dbl> <chr>
#> 1 0.167 0.267 Negative
#> 2 0.167 0.467 Negative
#> 3 0.1 0.067 Positive
#> 4 0.067 0.167 Positive
#> 5 0.033 0.267 Positive
#> 6 0.033 0.1 Positive
#> 7 0.067 0.367 Positive
#> 8 0.133 0.233 Negative
#> 9 0.2 0.367 Positive
#> 10 0.067 0.233 Positive
#> # ... with 22 more rows
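What I want to do: for each feature (charge.Group3 and hydrophobicity.Group3), run wilcox.test between the Negative and Positive classes, and collect the p-values in a data frame or tibble: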
features pvalue
charge.Group3 0.1088
hydrophobicity.Group3 0.03895
# I computed these by hand.
Note that in reality there are more than 2 features. How can I do this?
You don't really need broom if all you want is the p-value of each test.
library(tidyverse)
dat %>%
gather(group, value, -class) %>% # reshape data
nest(data = -group) %>% # for each group nest data
mutate(pval = map_dbl(data, ~wilcox.test(value ~ class, data = .)$p.value)) %>% # get p value for wilcoxon test
select(-data) # remove data column
# # A tibble: 2 x 2
# group pval
# <chr> <dbl>
# 1 charge.Group3 0.109
# 2 hydrophobicity.Group3 0.0390
Reshaping first lets you apply this procedure no matter how many feature columns you have, as long as class is the only non-feature column.
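Since tidyr 1.0, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape-then-nest approach in that style, assuming tidyr >= 1.0 and that class is the only non-feature column. The ties in the data make wilcox.test() warn that it cannot compute an exact p-value; that warning is silenced here, and the normal approximation with continuity correction is used as before.

```r
# Assumes tidyr >= 1.0 (pivot_longer() and the nest(data = ...) syntax)
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Data from the question
dat <- tibble(
  charge.Group3 = c(0.167, 0.167, 0.1, 0.067, 0.033, 0.033, 0.067, 0.133,
                    0.2, 0.067, 0.133, 0.114, 0.167, 0.033, 0.1, 0.033,
                    0.133, 0.267, 0.133, 0.233, 0.1, 0.167, 0.067, 0.133,
                    0.1, 0.133, 0.1, 0.133, 0.1, 0.067, 0.167, 0),
  hydrophobicity.Group3 = c(0.267, 0.467, 0.067, 0.167, 0.267, 0.1, 0.367,
                            0.233, 0.367, 0.233, 0.133, 0.205, 0.333, 0.267,
                            0.267, 0.067, 0.133, 0.3, 0.233, 0.267, 0.5,
                            0.333, 0.2, 0.5, 0.5, 0.4, 0.033, 0.3, 0.233,
                            0.5, 0.233, 0.033),
  class = c("Negative", "Negative", "Positive", "Positive", "Positive",
            "Positive", "Positive", "Negative", "Positive", "Positive",
            "Positive", "Positive", "Positive", "Positive", "Negative",
            "Positive", "Negative", "Negative", "Negative", "Negative",
            "Negative", "Negative", "Negative", "Negative", "Negative",
            "Negative", "Positive", "Positive", "Positive", "Negative",
            "Positive", "Negative")
)

res <- dat %>%
  # long format: one row per (feature, observation), for any number of features
  pivot_longer(-class, names_to = "group", values_to = "value") %>%
  # one row per feature, its observations stored in the list-column `data`
  nest(data = -group) %>%
  # Wilcoxon rank-sum test, Negative vs Positive; the ties warning is silenced
  mutate(pval = map_dbl(data, ~ suppressWarnings(
    wilcox.test(value ~ class, data = .x)$p.value
  ))) %>%
  select(-data)

res
```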
Or, as @Moody_Mudskipper suggested, you can avoid map entirely:
dat %>%
gather(group, value, -class) %>%
group_by(group) %>%
summarize(results = wilcox.test(value ~ class)$p.value)
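The same per-feature loop also works without the tidyverse. A base-R sketch, assuming every column other than class is a numeric feature and that class has exactly the two values Negative and Positive. The result uses the features/pvalue column names from the question.

```r
# Data from the question, as a plain data.frame
dat <- data.frame(
  charge.Group3 = c(0.167, 0.167, 0.1, 0.067, 0.033, 0.033, 0.067, 0.133,
                    0.2, 0.067, 0.133, 0.114, 0.167, 0.033, 0.1, 0.033,
                    0.133, 0.267, 0.133, 0.233, 0.1, 0.167, 0.067, 0.133,
                    0.1, 0.133, 0.1, 0.133, 0.1, 0.067, 0.167, 0),
  hydrophobicity.Group3 = c(0.267, 0.467, 0.067, 0.167, 0.267, 0.1, 0.367,
                            0.233, 0.367, 0.233, 0.133, 0.205, 0.333, 0.267,
                            0.267, 0.067, 0.133, 0.3, 0.233, 0.267, 0.5,
                            0.333, 0.2, 0.5, 0.5, 0.4, 0.033, 0.3, 0.233,
                            0.5, 0.233, 0.033),
  class = c("Negative", "Negative", "Positive", "Positive", "Positive",
            "Positive", "Positive", "Negative", "Positive", "Positive",
            "Positive", "Positive", "Positive", "Positive", "Negative",
            "Positive", "Negative", "Negative", "Negative", "Negative",
            "Negative", "Negative", "Negative", "Negative", "Negative",
            "Negative", "Positive", "Positive", "Positive", "Negative",
            "Positive", "Negative"),
  stringsAsFactors = FALSE
)

# Assumption: every column except `class` is a feature to test
features <- setdiff(names(dat), "class")

# One two-sided Wilcoxon rank-sum test per feature, Negative vs Positive.
# Ties only trigger a "cannot compute exact p-value" warning, silenced here.
pvalues <- vapply(features, function(f) {
  neg <- dat[[f]][dat$class == "Negative"]
  pos <- dat[[f]][dat$class == "Positive"]
  suppressWarnings(wilcox.test(neg, pos)$p.value)
}, numeric(1))

res <- data.frame(features = features, pvalue = unname(pvalues),
                  stringsAsFactors = FALSE)
res
```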
If you really want to use broom, you can do:
library(broom)
dat %>%
gather(group, value, -class) %>%
nest(data = -group) %>%
mutate(results = map(data, ~tidy(wilcox.test(value ~ class, data = .)))) %>%
select(-data) %>%
unnest(results)
# # A tibble: 2 x 5
# group statistic p.value method alternative
# <chr> <dbl> <dbl> <chr> <chr>
# 1 charge.Group3 170. 0.109 Wilcoxon rank sum test with continuity correction two.sided
# 2 hydrophobicity.Group3 183 0.0390 Wilcoxon rank sum test with continuity correction two.sided
This returns more columns, but you can keep just the p-value if that is all you need.
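Keeping only the p-value from the broom output takes one final select(). A sketch, assuming tidyr >= 1.0 and broom are installed, that mirrors the broom pipeline and also renames the columns to the features/pvalue layout asked for in the question:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(broom)

# Data from the question
dat <- tibble(
  charge.Group3 = c(0.167, 0.167, 0.1, 0.067, 0.033, 0.033, 0.067, 0.133,
                    0.2, 0.067, 0.133, 0.114, 0.167, 0.033, 0.1, 0.033,
                    0.133, 0.267, 0.133, 0.233, 0.1, 0.167, 0.067, 0.133,
                    0.1, 0.133, 0.1, 0.133, 0.1, 0.067, 0.167, 0),
  hydrophobicity.Group3 = c(0.267, 0.467, 0.067, 0.167, 0.267, 0.1, 0.367,
                            0.233, 0.367, 0.233, 0.133, 0.205, 0.333, 0.267,
                            0.267, 0.067, 0.133, 0.3, 0.233, 0.267, 0.5,
                            0.333, 0.2, 0.5, 0.5, 0.4, 0.033, 0.3, 0.233,
                            0.5, 0.233, 0.033),
  class = c("Negative", "Negative", "Positive", "Positive", "Positive",
            "Positive", "Positive", "Negative", "Positive", "Positive",
            "Positive", "Positive", "Positive", "Positive", "Negative",
            "Positive", "Negative", "Negative", "Negative", "Negative",
            "Negative", "Negative", "Negative", "Negative", "Negative",
            "Negative", "Positive", "Positive", "Positive", "Negative",
            "Positive", "Negative")
)

res <- dat %>%
  gather(group, value, -class) %>%
  nest(data = -group) %>%
  # tidy() turns each htest object into a one-row tibble
  mutate(results = map(data, ~ tidy(suppressWarnings(
    wilcox.test(value ~ class, data = .x)
  )))) %>%
  select(-data) %>%
  unnest(results) %>%
  # keep only the feature name and p-value, named as in the question
  select(features = group, pvalue = p.value)

res
```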