I have a data frame (df) with three columns, as follows:
Data:
id id1 age
A1 a1 32
A1 a2 45
A1 a3 45
A1 a4 12
A2 b1 15
A2 b5 34
A2 b64 17
Expected output:
id count count1
A1 4 1
A2 3 2
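Logic: count is the number of rows for each id, and count1 is the number of rows for each id where age < 21.
Current code: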
library(dplyr)
df_summarized <- df %>%
group_by(id) %>%
summarise(count = n(),count1 = count(age<21))
Problem:
Error: no applicable method for 'group_by_' applied to an object of class "logical"
We need sum() here instead of count():
df %>%
group_by(id) %>%
summarise(count = n(),count1 = sum(age < 21))
# A tibble: 2 × 3
# id count count1
# <chr> <int> <int>
#1 A1 4 1
#2 A2 3 2
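This is because count() is meant to be applied to a data.frame or tbl_df, not to a single column inside summarise(). Since age < 21 is a logical vector, sum() counts its TRUE values (TRUE is treated as 1 and FALSE as 0).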
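To see why sum() counts the matching rows, here is a small sketch that uses only base R and the id and age values from the example data. tapply() is swapped in as a base R stand-in for group_by() plus summarise(); it is not part of the answer, and the variable names are illustrative.

```r
# Columns from the example data in the question
ids  <- c("A1", "A1", "A1", "A1", "A2", "A2", "A2")
ages <- c(32, 45, 45, 12, 15, 34, 17)

# Comparing a numeric vector with 21 gives a logical vector
is_young <- ages < 21

# sum() treats TRUE as 1 and FALSE as 0, so it counts the matches
n_young_total <- sum(is_young)

# Same idea per group: tapply() splits is_young by id and sums each piece,
# mirroring what sum(age < 21) does inside summarise()
n_young_by_id <- tapply(is_young, ids, sum)
print(n_young_by_id)   # A1 = 1, A2 = 2
```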
Or with data.table:
library(data.table)
setDT(df)[, .(count = .N, count1 = sum(age < 21)), id]
Or with base R:
cbind(count = rowSums(table(df[-2])), count1 = as.vector(rowsum(+(df$age < 21), df$id)))
# count count1
#A1 4 1
#A2 3 2
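The base R one-liner packs two steps together. The sketch below rebuilds the example data by hand and splits them apart: table(df[-2]) cross-tabulates id against age (column 2, id1, is dropped), so each row sum is the number of rows for that id, and the unary + turns the logical vector age < 21 into 0/1 integers that rowsum() adds up per id. The intermediate variable names are illustrative.

```r
# Example data from the question
df <- data.frame(
  id  = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  id1 = c("a1", "a2", "a3", "a4", "b1", "b5", "b64"),
  age = c(32, 45, 45, 12, 15, 34, 17)
)

# id x age contingency table; summing each row counts the rows per id
count <- rowSums(table(df[-2]))

# Unary plus coerces TRUE/FALSE to 1/0
young <- +(df$age < 21)

# rowsum() adds the 0/1 values within each id; the result is a
# one-column matrix with row names "A1" and "A2"
count1 <- rowsum(young, df$id)

# Combine; the row names come from the names of count
out <- cbind(count = count, count1 = as.vector(count1))
print(out)
```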
Or with aggregate, using sum() inside the summary function:
do.call(data.frame, aggregate(age~id, df, FUN =
function(x) c(count = length(x), count1 = sum(x<21))))
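NOTE: All of the above methods return proper columns in the output. This is worth pointing out for aggregate in particular: when FUN returns more than one value, aggregate stores the result as a single matrix column, which is why do.call(data.frame, ...) is used to convert that matrix column into separate, proper columns.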
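One way to see the matrix column described in the note is to inspect the aggregate() result before and after do.call(data.frame, ...). This sketch rebuilds the example data by hand; res and flat are illustrative names.

```r
# Example data from the question
df <- data.frame(
  id  = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  id1 = c("a1", "a2", "a3", "a4", "b1", "b5", "b64"),
  age = c(32, 45, 45, 12, 15, 34, 17)
)

# FUN returns two values per group, so aggregate() stores them
# together as one matrix column named "age"
res <- aggregate(age ~ id, df,
                 FUN = function(x) c(count = length(x), count1 = sum(x < 21)))
print(ncol(res))           # 2: "id" and the matrix column "age"
print(is.matrix(res$age))  # TRUE

# Passing the columns of res as arguments to data.frame() splits
# the matrix column into ordinary columns
flat <- do.call(data.frame, res)
print(names(flat))         # "id" "age.count" "age.count1"
```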