多列代表一个值时的频率表 (R)

ale*_*lex 5 r

我有一个这样的数据集:

ID    color1   color2  color3   shape1       shape2        size
55    red     blue     NA       circle       triangle      small
67    yellow  NA       NA       triangle     NA            medium
83    blue    yellow   NA       circle       NA            large
78    red     yellow   blue     square       circle        large
43    green   NA       NA       square       circle        small
29    yellow  green    NA       circle       triangle      medium

Run Code Online (Sandbox Code Playgroud)

我想创建一个数据框,其中包含每个变量的频率和百分比,但我遇到了麻烦,因为在某些情况下同一变量有多个列。


Variable      Level        Freq        Percent 
 
color         blue          3           27.27
              red           2           18.18
              yellow        4           36.36
              green         2           18.18
              total         11          100.00

shape         circle        5           50.0       
              triangle      3           30.0
              square        2           20.0
              total         10          100.0

size          small         2           33.3
              medium        2           33.3
              large         2           33.3
              total         6           100.0

Run Code Online (Sandbox Code Playgroud)

我相信我需要将这些变量转换为 long,然后使用 summarize/mutate 来获取频率,但我似乎无法弄清楚。任何帮助是极大的赞赏。

Jon*_*ano 4

您可以使用tidyverse包将数据转换为长格式,然后汇总所需的统计信息。

library(tidyverse)

df |> 
  # Transform all columns into a long format
  pivot_longer(cols = -ID,
               names_pattern = "([A-z]+)",
               names_to = c("variable")) |>
  # Drop NA entries
  drop_na(value) |>
  # Group by variable
  group_by(variable) |>
  # Count
  count(value) |>
  # Calculate percentage as n / sum of n by variable
  mutate(perc = 100* n / sum(n))

# A tibble: 10 x 4
# Groups:   variable [3]
#   variable value        n  perc
#   <chr>    <chr>    <int> <dbl>
# 1 color    blue         3  27.3
# 2 color    green        2  18.2
# 3 color    red          2  18.2
# 4 color    yellow       4  36.4
# 5 shape    circle       5  50  
# 6 shape    square       2  20  
# 7 shape    triangle     3  30  
# 8 size     large        2  33.3
# 9 size     medium       2  33.3
#10 size     small        2  33.3
Run Code Online (Sandbox Code Playgroud)