Using summarize and dplyr to count non-NA values for multiple columns by group

Bri*_*iro 7 r dataframe dplyr

I want to use summarize and across from dplyr to count the number of non-NA values for each level of my grouping variable. For example, using these data:

library(tidyverse)  
d <- tibble(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
            Col1 = c(5, 8, 2, NA, 2, 2, NA, NA, 1),
            Col2 = c(NA, 2, 1, NA, NA, NA, 1, NA, NA),
            Col3 = c(1, 5, 2, 4, 1, NA, NA, NA, NA))  
# A tibble: 9 x 4
     ID  Col1  Col2  Col3
  <dbl> <dbl> <dbl> <dbl>
1     1     5    NA     1
2     1     8     2     5
3     1     2     1     2
4     2    NA    NA     4
5     2     2    NA     1
6     2     2    NA    NA
7     3    NA     1    NA
8     3    NA    NA    NA
9     3     1    NA    NA

The solution would look something like:

d %>%
  group_by(ID) %>%
  summarize(across(matches("^Col[1-3]$"),
                   #function to count non-NA per column per ID
                   ))

With the following result:

# A tibble: 3 x 4
     ID  Col1  Col2  Col3
  <dbl> <dbl> <dbl> <dbl>
1     1     3     2     3
2     2     2     0     2
3     3     1     1     0

Ano*_*n R 9

I hope this is what you are looking for:

library(dplyr)

d %>%
  group_by(ID) %>%
  summarise(across(Col1:Col3, ~ sum(!is.na(.x)), .names = "non-{.col}"))

# A tibble: 3 x 4
     ID `non-Col1` `non-Col2` `non-Col3`
  <dbl>      <int>      <int>      <int>
1     1          3          2          3
2     2          2          0          2
3     3          1          1          0

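The counting idiom inside the lambda can be checked on a single vector: is.na() returns a logical vector, ! flips it, and sum() treats TRUE as 1. A minimal sketch, using the Col1 values for ID 2 from the question (the names x and n_non_na are only illustrative):

```r
# Col1 values for ID == 2, taken from the question's data
x <- c(NA, 2, 2)

is.na(x)    # TRUE where a value is missing
!is.na(x)   # TRUE where a value is present

# TRUE counts as 1 when summed, so this is the number of non-NA values
n_non_na <- sum(!is.na(x))
n_non_na    # 2
```

Because sum() over a logical vector returns an integer, the summarised columns come out as <int>, as in the output above.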

Or, if you would like to select the columns by a string they share, you can use this:

d %>%
  group_by(ID) %>%
  summarise(across(contains("Col"), ~ sum(!is.na(.x)), .names = "non-{.col}"))
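
Both calls above rename the columns to non-Col1 and so on through .names, while the expected output in the question keeps the original names. A sketch of a variant, assuming the original names are what is wanted: drop .names and reuse the question's own matches("^Col[1-3]$") selector, which is stricter than contains("Col") since the latter would also pick up any other column whose name contains "Col". The result name res is only illustrative.

```r
library(dplyr)

# Data from the question
d <- tibble(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
            Col1 = c(5, 8, 2, NA, 2, 2, NA, NA, 1),
            Col2 = c(NA, 2, 1, NA, NA, NA, 1, NA, NA),
            Col3 = c(1, 5, 2, 4, 1, NA, NA, NA, NA))

res <- d %>%
  group_by(ID) %>%
  # without .names, across() keeps the original column names
  summarise(across(matches("^Col[1-3]$"), ~ sum(!is.na(.x))))

res
```

The counts are the same as in the outputs shown earlier; only the column names differ.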