How can I use the dplyr package to calculate the correlation between every pair of variables within each group in R?

Hom*_*son 5 r correlation dplyr

Suppose I have a data frame in R that looks like this:

library(tibble)  # tibble() comes from the tibble package

var2 = c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3),rep("F",3),
         rep("H",3),rep("I",3))

y2 = c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
       1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
       -0.648, 0.376, -0.887, 0.534, -0.679, -0.923,
       0.987, 0.324, -0.783, -0.679, 0.326, 0.998)
length(y2)
group2 = c(rep(1,6),rep(2,6),rep(3,6),rep(1,6))
data2 = tibble(var2,group2,y2)

Output:

# A tibble: 24 × 3
   var2  group2     y2
   <chr>  <dbl>  <dbl>
 1 A          1 -1.23 
 2 A          1 -0.983
 3 A          1  1.28 
 4 B          1 -0.268
 5 B          1 -0.46 
 6 B          1 -1.23 
 7 C          2  1.87 
 8 C          2  0.416
 9 C          2 -1.99 
10 D          2  0.289
11 D          2  1.7  
12 D          2 -0.455
13 E          3 -0.648
14 E          3  0.376
15 E          3 -0.887
16 F          3  0.534
17 F          3 -0.679
18 F          3 -0.923
19 H          1  0.987
20 H          1  0.324
21 H          1 -0.783
22 I          1 -0.679
23 I          1  0.326
24 I          1  0.998

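I want to use dplyr to calculate the correlation for every distinct pair of variables (the values of var2) within each group. Ideally, the resulting tibble should look like this, with the fourth column holding the correlation for each pair: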
group  var1  var2  value
1      A     B     cor(A,B)
1      A     H     cor(A,H)
1      A     I     cor(A,I)
1      B     H     cor(B,H)
1      B     I     cor(B,I)
1      H     I     cor(H,I)
2      C     D     cor(C,D)
3      E     F     cor(E,F)

How can I do this in R? Any help would be appreciated.

Pau*_*ulS 3

A possible solution:

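library(tidyverse)

# Note: the \(x) lambda syntax requires R >= 4.1
data2 %>%
  group_by(group2) %>%
  group_split() %>%                       # one tibble per group
  map(\(x) x %>%
    group_by(var2) %>%
    # one single-column data frame per variable, named after the variable
    group_map(~ data.frame(.x[-1]) %>% set_names(.y)) %>%
    bind_cols() %>%
    cor %>%
    # keep only the upper triangle so that each pair appears once
    {data.frame(row = rownames(.)[row(.)[upper.tri(.)]],
                col = colnames(.)[col(.)[upper.tri(.)]],
                corr = .[upper.tri(.)])}) %>%
  # label each result with its list position (equal to group2 here)
  imap_dfr(~ data.frame(group = .y, .x))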

#>   group row col       corr
#> 1     1   A   B -0.9949738
#> 2     1   A   H -0.9581357
#> 3     1   B   H  0.9819901
#> 4     1   A   I  0.8533855
#> 5     1   B   I -0.9012948
#> 6     1   H   I -0.9669093
#> 7     2   C   D  0.4690460
#> 8     3   E   F -0.1864518
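For comparison, here is a minimal sketch of another way to get the same pairs, using `group_modify()` from dplyr and `pivot_wider()` from tidyr. It is not part of the answer above: `pairwise_cor` and the temporary `obs` column are hypothetical names introduced only for illustration. It assumes, as the `bind_cols()` approach also does, that observations of different variables are matched by their row order within each variable, and that every variable in a group has the same number of observations (otherwise `cor()` with its default settings would return `NA` for the affected pairs).

```r
library(dplyr)
library(tidyr)

# Same data as in the question
var2   <- rep(c("A", "B", "C", "D", "E", "F", "H", "I"), each = 3)
y2     <- c(-1.23, -0.983, 1.28, -0.268, -0.46, -1.23,
            1.87, 0.416, -1.99, 0.289, 1.7, -0.455,
            -0.648, 0.376, -0.887, 0.534, -0.679, -0.923,
            0.987, 0.324, -0.783, -0.679, 0.326, 0.998)
group2 <- c(rep(1, 6), rep(2, 6), rep(3, 6), rep(1, 6))
data2  <- tibble(var2, group2, y2)

# Hypothetical helper: all pairwise correlations for the rows of one group.
# `df` is expected to have the columns var2 and y2.
pairwise_cor <- function(df) {
  # One column per variable; rows are matched by position within each variable
  wide <- df %>%
    group_by(var2) %>%
    mutate(obs = row_number()) %>%
    ungroup() %>%
    pivot_wider(names_from = var2, values_from = y2) %>%
    select(-obs)

  # 2 x n_pairs character matrix with every unordered pair of variable names
  pairs <- combn(names(wide), 2)

  tibble(
    var1  = pairs[1, ],
    var2  = pairs[2, ],
    value = mapply(function(a, b) cor(wide[[a]], wide[[b]]),
                   pairs[1, ], pairs[2, ], USE.NAMES = FALSE)
  )
}

result <- data2 %>%
  group_by(group2) %>%
  group_modify(~ pairwise_cor(.x)) %>%
  ungroup()

print(result)
```

Because `group_modify()` keeps the grouping column, the result carries the actual `group2` values rather than list positions, which matters if the group keys are not simply 1, 2, 3.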