Jac*_*son 3 loops r function dataframe dplyr
I have a data frame (named df1) with multiple columns, each containing Yes, No, or NA answers.
A B C D
1 Yes No No Yes
2 Yes No No No
3 <NA> Yes Yes <NA>
4 No <NA> No Yes
My goal is to create a table that counts the frequency of each answer and keeps the original column names, like this:
Answer A B C D
1 Yes 2 1 1 2
2 No 1 2 3 1
3 <NA> 1 1 0 1
My approach so far has been to build a function and then loop over it, but the output does not produce a table containing all the categories (A through D).
my_function <- function(table_name,col_name) {
table_name %>%
group_by_(Answer = col_name) %>%
summarise(!!paste0(col_name):= n())}
my_categories <- c("A","B","C","D")
for(i in 1:length(my_categories)){
df2 <- my_function(df1, my_categories[i])
}
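I'm open to a different approach if there is a simpler way, but TL;DR: I'm trying to loop a group-by over several categories, summarise with n(), and then build a single table containing all the data.
We can reshape to 'long' format with pivot_longer and then back to 'wide' format with pivot_wider, specifying values_fn as length so that each cell holds the number of matching rows; values_fill = 0 covers answers that never occur in a column.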
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(cols = everything(), values_to = 'Answer') %>%
pivot_wider(names_from = name, values_from = name,
values_fn = length, values_fill = 0)
-output
# A tibble: 3 x 5
Answer A B C D
<chr> <int> <int> <int> <int>
1 Yes 2 1 1 2
2 No 1 2 3 1
3 <NA> 1 1 0 1
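The same long-then-wide idea can also be written with an explicit count() step, which some may find easier to read. This is only a sketch and not part of the answer's own code: the name `counts` is illustrative, df1 is rebuilt here from the question's data so the snippet runs on its own, and the rows may come out in a different order than above because count() sorts by Answer.

```r
library(dplyr)
library(tidyr)

# Question's data, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  A = c("Yes", "Yes", NA, "No"),
  B = c("No", "No", "Yes", NA),
  C = c("No", "No", "Yes", "No"),
  D = c("Yes", "No", NA, "Yes")
)

counts <- df1 %>%
  # one row per cell: `name` holds the original column, `Answer` the value
  pivot_longer(cols = everything(), values_to = "Answer") %>%
  # frequency of every Answer/column pair (NA is counted as its own group)
  count(Answer, name) %>%
  # spread the columns back out; pairs that never occur become 0
  pivot_wider(names_from = name, values_from = n, values_fill = 0)

counts
```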
df1 <- structure(list(A = c("Yes", "Yes", NA, "No"), B = c("No", "No",
"Yes", NA), C = c("No", "No", "Yes", "No"), D = c("Yes", "No",
NA, "Yes")), class = "data.frame", row.names = c("1", "2", "3",
"4"))
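On why the loop in the question ends up with a single category: df2 is assigned afresh on every pass, so only the counts from the last column survive. A minimal sketch of a loop that keeps all of them, assuming dplyr is available; `count_answers` is a hypothetical helper (not the asker's function), and joining the per-column counts with full_join is just one way to combine them.

```r
library(dplyr)

# Question's data, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  A = c("Yes", "Yes", NA, "No"),
  B = c("No", "No", "Yes", NA),
  C = c("No", "No", "Yes", "No"),
  D = c("Yes", "No", NA, "Yes")
)

# Hypothetical helper: counts the answers in one column and
# names the count column after that column
count_answers <- function(data, col_name) {
  data %>%
    count(Answer = .data[[col_name]], name = col_name)
}

my_categories <- c("A", "B", "C", "D")

df2 <- NULL
for (col in my_categories) {
  col_counts <- count_answers(df1, col)
  # accumulate instead of overwriting: join each new column onto the result
  df2 <- if (is.null(df2)) col_counts else full_join(df2, col_counts, by = "Answer")
}

# answers that never occur in a column are NA after the join; show them as 0
df2 <- df2 %>%
  mutate(across(all_of(my_categories), ~ coalesce(.x, 0L)))

df2
```

The reshaping approach above avoids the loop entirely, so this version is mainly useful if the per-column function needs to do more than count.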