Running shapiro.test on multiple columns at once in dplyr

Sha*_*des 5 r dplyr

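I'm trying to run a normality test (Shapiro-Wilk) on a dataset, and I want to get the statistic and p-value for all columns at the same time. I've read all the related pages on SO (R: Shapiro test by group not producing p-values, with a corrupt data frame warning; Using shapiro.test on multiple columns in a data frame), but I still can't figure it out. Any help would be greatly appreciated!!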

For example, here is the dataset: it has one character column (NVL) and the rest are numeric, and I want to group by NVL (NV/VL).

  NVL Var1 Var2 Var3 Var4 Var5
1  NV 22.5 26.8 89.2 35.7  100
2  NV 34.7 67.4 29.8 12.4  100
3  NV 68.3 34.5 44.5 23.8  100
4  NV 11.2 55.3 17.5 77.9  100
5  VL 55.6 77.2 59.7 89.6  100
6  VL 60.5 88.7 65.4 99.6  100
7  VL 89.4 87.5 65.9 89.5  100
8  VL 65.4 74.2 75.4 89.5  100
9  VL 81.8 78.5 95.4 92.5  100
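To reproduce the table above in R, here is a minimal sketch. The column names Var4 and Var5 are written without the space (an assumption, since that is how the rest of the thread refers to them). Checking the per-group standard deviation first is a quick way to spot columns that will cause trouble: Var5 is 100 in every row, and shapiro.test() stops with an error when all values of its input are identical.

```r
# Question's example data typed in by hand (Var4/Var5 names are assumed)
mydata <- data.frame(
  NVL  = c(rep("NV", 4), rep("VL", 5)),
  Var1 = c(22.5, 34.7, 68.3, 11.2, 55.6, 60.5, 89.4, 65.4, 81.8),
  Var2 = c(26.8, 67.4, 34.5, 55.3, 77.2, 88.7, 87.5, 74.2, 78.5),
  Var3 = c(89.2, 29.8, 44.5, 17.5, 59.7, 65.4, 65.9, 75.4, 95.4),
  Var4 = c(35.7, 12.4, 23.8, 77.9, 89.6, 99.6, 89.5, 89.5, 92.5),
  Var5 = rep(100, 9)
)

# Standard deviation of each numeric column within each NVL group.
# A 0 means every value in that group is identical, which shapiro.test() rejects.
sds <- aggregate(. ~ NVL, data = mydata, FUN = sd)
print(sds)
```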

Here is the code:

library(dplyr)
normalityVar1 <- mydata %>%
  group_by(NVL) %>%
  summarise(statistic = shapiro.test(Var1)$statistic,
            p.value = shapiro.test(Var1)$p.value)

Here is the output:

    NVL statistic   p.value
  <chr>     <dbl>     <dbl>
1    VL 0.9125239 0.1985486
2    NV 0.8983501 0.2101248
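The summarise() call above can be extended to every column at once with across(). This is a sketch that assumes dplyr 1.0.0 or later, where across() was introduced. It uses the modified data from the answer below, because in the question's table Var5 is constant and shapiro.test() would stop with an error. The object name normality is only illustrative. Each of Var1 to Var5 produces a pair of output columns, such as Var1_statistic and Var1_p.value.

```r
library(dplyr)  # across() requires dplyr >= 1.0.0

# Data from the answer (Var5 altered so no group is constant)
mydata <- read.table(text = "
   NVL  Var1  Var2  Var3  Var4  Var5
1   NV   22.5  26.8   89.2  35.7   100
2   NV   34.7  67.4   29.8  12.4   100
3   NV   68.3  34.5   44.5  23.8   50
4   NV   11.2  55.3   17.5  77.9   100
5   VL   55.6  77.2   59.7  89.6   100
6   VL   60.5  88.7   65.4  99.6   100
7   VL   89.4  87.5   65.9  89.5   100
8   VL   65.4  74.2   75.4  89.5   90
9   VL   81.8  78.5   95.4  92.5   90
", header = TRUE)

normality <- mydata %>%
  group_by(NVL) %>%
  summarise(across(
    starts_with("Var"),
    # One output column per (variable, function) pair, named "{col}_{fn}"
    list(statistic = ~ unname(shapiro.test(.x)$statistic),
         p.value   = ~ shapiro.test(.x)$p.value)
  ))
print(normality)
```

As in the original code, shapiro.test() runs twice per group and column, which is harmless at this data size.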

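Now, how can I edit this code so that I get the output for all the variables (Var2, 3, 4, 5) at the same time? I even tried aggregate() and apply(), but I'm stuck.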

aggregate(formula = Var1 ~ NVL,
          data = mydata,
          FUN = function(x) {y <- shapiro.test(x); c(y$statistic, y$p.value)})

As you can see, I can only do this for one variable! I know I'm close, but I just can't work out the rest. Thanks in advance for your help!
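The aggregate() attempt can also cover all columns: in the formula interface, a `.` on the left-hand side stands for every column not used on the right. Here is a sketch, again on the answer's data because the question's constant Var5 would make shapiro.test() stop; res and flat are illustrative names. Because FUN returns a named vector of length 2, each VarN column of the result is a two-column matrix, and do.call(data.frame, ...) spreads those matrices into ordinary columns.

```r
# Data from the answer (Var5 altered so no group is constant)
mydata <- read.table(text = "
   NVL  Var1  Var2  Var3  Var4  Var5
1   NV   22.5  26.8   89.2  35.7   100
2   NV   34.7  67.4   29.8  12.4   100
3   NV   68.3  34.5   44.5  23.8   50
4   NV   11.2  55.3   17.5  77.9   100
5   VL   55.6  77.2   59.7  89.6   100
6   VL   60.5  88.7   65.4  99.6   100
7   VL   89.4  87.5   65.9  89.5   100
8   VL   65.4  74.2   75.4  89.5   90
9   VL   81.8  78.5   95.4  92.5   90
", header = TRUE)

# `. ~ NVL`: every column except NVL is a response, grouped by NVL
res <- aggregate(. ~ NVL, data = mydata, FUN = function(x) {
  y <- shapiro.test(x)
  c(statistic = unname(y$statistic), p.value = y$p.value)
})

# Each VarN column is a 2-column matrix; flatten to Var1.statistic, Var1.p.value, ...
flat <- do.call(data.frame, res)
print(flat)
```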

Mar*_*dri 2

mydata <- read.table(text="
   NVL  Var1  Var2  Var3  Var4  Var5
1   NV   22.5  26.8   89.2  35.7   100
2   NV   34.7  67.4   29.8  12.4   100
3   NV   68.3  34.5   44.5  23.8   50
4   NV   11.2  55.3   17.5  77.9   100
5   VL   55.6  77.2   59.7  89.6   100
6   VL   60.5  88.7   65.4  99.6   100
7   VL   89.4  87.5   65.9  89.5   100
8   VL   65.4  74.2   75.4  89.5   90
9   VL   81.8  78.5   95.4  92.5   90
", header = TRUE)

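library(dplyr)
# Shapiro-Wilk statistic and p-value of x within each level of group.
# shapiro.test() stops with an error on a constant vector, so a group
# whose sd is 0 gets NA instead.
myfun <- function(x, group) {
  data.frame(x, group) %>%
    group_by(group) %>%
    summarise(
      statistic = ifelse(sd(x) != 0, shapiro.test(x)$statistic, NA),
      p.value = ifelse(sd(x) != 0, shapiro.test(x)$p.value, NA)
    )
}
# Apply myfun to every column except the first (NVL), grouping by NVL
(lst <- lapply(mydata[, -1], myfun, group = mydata[, 1]))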

The output is:

$Var1
# A tibble: 2 x 3
   group statistic   p.value
  <fctr>     <dbl>     <dbl>
1     NV 0.9313476 0.6023421
2     VL 0.9149572 0.4979450

$Var2
# A tibble: 2 x 3
   group statistic   p.value
  <fctr>     <dbl>     <dbl>
1     NV 0.9409576 0.6601747
2     VL 0.8736587 0.2815562

$Var3
# A tibble: 2 x 3
   group statistic   p.value
  <fctr>     <dbl>     <dbl>
1     NV 0.9096322 0.4804557
2     VL 0.8644349 0.2446131

$Var4
# A tibble: 2 x 3
   group statistic    p.value
  <fctr>     <dbl>      <dbl>
1     NV 0.9003135 0.43261822
2     VL 0.7260939 0.01760713

$Var5
# A tibble: 2 x 3
   group statistic     p.value
  <fctr>     <dbl>       <dbl>
1     NV 0.6297763 0.001240726
2     VL 0.6840289 0.006470001
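The ifelse(sd(x) != 0, ...) guard in myfun exists because shapiro.test() refuses some inputs instead of returning a result. Below is a small sketch of a more defensive helper, safe_shapiro() (a hypothetical name, not part of the answer or of base R), that returns NA in the cases shapiro.test() rejects: fewer than 3 or more than 5000 non-missing values, or values that are all identical. The 1e-10 range threshold mirrors the check in shapiro.test()'s own source; treat that exact cut-off as an implementation detail.

```r
# Hypothetical helper: Shapiro-Wilk W and p-value, or NA where
# shapiro.test() would stop with an error
safe_shapiro <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000 || diff(range(x)) < 1e-10) {
    return(c(statistic = NA_real_, p.value = NA_real_))
  }
  y <- shapiro.test(x)
  c(statistic = unname(y$statistic), p.value = y$p.value)
}

# The error the guard avoids (in the question, NV's Var5 is all 100)
msg <- tryCatch(shapiro.test(rep(100, 4)), error = conditionMessage)
print(msg)

print(safe_shapiro(rep(100, 4)))                # both NA
print(safe_shapiro(c(22.5, 34.7, 68.3, 11.2)))  # NV group of Var1
```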

The resulting list lst can be combined into a single data.frame:

# cbind the tibbles side by side, then drop the repeated group columns (4, 7, 10, 13)
do.call(cbind, lst)[, -seq(4, 3 * (ncol(mydata) - 1), 3)]

Here is the output:

  Var1.group Var1.statistic Var1.p.value Var2.statistic Var2.p.value Var3.statistic Var3.p.value Var4.statistic Var4.p.value Var5.statistic Var5.p.value
1         NV      0.9313476    0.6023421      0.9409576    0.6601747      0.9096322    0.4804557      0.9003135   0.43261822      0.6297763  0.001240726
2         VL      0.9149572    0.4979450      0.8736587    0.2815562      0.8644349    0.2446131      0.7260939   0.01760713      0.6840289  0.006470001
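If a long table with one row per variable and group is easier to work with than the wide cbind() result, reshaping first avoids the column-index arithmetic. This is a sketch assuming tidyr 1.0.0 or later for pivot_longer() and dplyr 1.0.0 or later for the .groups argument; long_res and the variable/value column names are illustrative choices.

```r
library(dplyr)  # .groups in summarise() requires dplyr >= 1.0.0
library(tidyr)  # pivot_longer() requires tidyr >= 1.0.0

# Data from the answer (Var5 altered so no group is constant)
mydata <- read.table(text = "
   NVL  Var1  Var2  Var3  Var4  Var5
1   NV   22.5  26.8   89.2  35.7   100
2   NV   34.7  67.4   29.8  12.4   100
3   NV   68.3  34.5   44.5  23.8   50
4   NV   11.2  55.3   17.5  77.9   100
5   VL   55.6  77.2   59.7  89.6   100
6   VL   60.5  88.7   65.4  99.6   100
7   VL   89.4  87.5   65.9  89.5   100
8   VL   65.4  74.2   75.4  89.5   90
9   VL   81.8  78.5   95.4  92.5   90
", header = TRUE)

long_res <- mydata %>%
  # Stack Var1..Var5 into one column, remembering which variable each value came from
  pivot_longer(-NVL, names_to = "variable", values_to = "value") %>%
  group_by(variable, NVL) %>%
  summarise(
    statistic = unname(shapiro.test(value)$statistic),
    p.value   = shapiro.test(value)$p.value,
    .groups = "drop"
  )
print(long_res)
```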