Conditional count in a data frame

Anu*_*hit 1 r dataframe dplyr

I have a data frame (df) with three columns, as shown below:

Structure:

id id1 age
A1 a1  32
A1 a2  45
A1 a3  45
A1 a4  12
A2 b1  15
A2 b5  34
A2 b64 17

Expected output:

id count count1
A1 4     1
A2 3     2

Logic:

  • Column "count" is the number of times each "id" is repeated
  • Column "count1" is the number of rows with age less than 21
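
For reference, the logic can be spelled out by hand for a single id. This is only a sketch in base R: the names count_A1 and count1_A1 are made up for illustration, and id and id1 are assumed to be character columns.

```r
# Rebuild the example data (values copied from the table above).
df <- data.frame(
  id  = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  id1 = c("a1", "a2", "a3", "a4", "b1", "b5", "b64"),
  age = c(32, 45, 45, 12, 15, 34, 17)
)

# count: how many rows carry a given id
count_A1 <- sum(df$id == "A1")

# count1: how many of those rows have age below 21
count1_A1 <- sum(df$id == "A1" & df$age < 21)

print(c(count = count_A1, count1 = count1_A1))
```

For A1 this should give count = 4 and count1 = 1, matching the first row of the expected output.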

Current code:

library(dplyr)
df_summarized <- df %>% 
                     group_by(id) %>% 
                     summarise(count = n(),count1 = count(age<21)) 

Problem:

Error: no applicable method for 'group_by_' applied to an object of class "logical"

akr*_*run 5

We need to use sum:

df %>% 
    group_by(id) %>% 
    summarise(count = n(),count1 = sum(age < 21))
# A tibble: 2 × 3
#     id count count1
#  <chr> <int>  <int>
#1    A1     4      1
#2    A2     3      2

This is because count works on a data.frame/tbl_df, not on a single column inside summarise.
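
The reason sum works here is that a comparison such as age < 21 yields a logical vector, and sum treats TRUE as 1 and FALSE as 0. A minimal sketch, using the ages of group A1 from the question (the names ages_A1 and below_21 are purely illustrative):

```r
# Ages for id "A1", copied from the question's data.
ages_A1 <- c(32, 45, 45, 12)

# The comparison returns one logical value per row.
below_21 <- ages_A1 < 21
print(below_21)       # FALSE FALSE FALSE  TRUE

# Summing a logical vector counts the TRUE values.
print(sum(below_21))  # 1
```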


Or using data.table

library(data.table)
setDT(df)[, .(count = .N, count1 = sum(age < 21)), id]

Or with base R

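# count:  row totals of the id-by-age contingency table (df[-2] drops id1)
# count1: per-id sum of the logical age < 21 (unary + coerces it to integer)
cbind(count = rowSums(table(df[-2])), count1 = as.vector(rowsum(+(df$age < 21), df$id)))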
#   count count1
#A1     4      1
#A2     3      2

Or using aggregate with sum

do.call(data.frame, aggregate(age~id, df, FUN =
            function(x) c(count = length(x), count1 = sum(x<21))))

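NOTE: All of the above methods give a dataset with proper columns. This matters specifically for aggregate: when FUN returns more than one value, the result column is a matrix, which is why do.call(data.frame, ...) is used to convert that matrix column into proper columns.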
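
To see what the note is about, the following sketch compares the raw aggregate result with the flattened one. It rebuilds only the id and age columns from the question; the names res and flat are illustrative.

```r
# Only the columns needed for the aggregation, copied from the question.
df <- data.frame(
  id  = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  age = c(32, 45, 45, 12, 15, 34, 17)
)

# FUN returns two values per group, so "age" becomes a matrix column.
res <- aggregate(age ~ id, df, FUN =
            function(x) c(count = length(x), count1 = sum(x < 21)))
print(ncol(res))           # two columns: id and the matrix column age
print(is.matrix(res$age))

# do.call(data.frame, ...) splits the matrix into ordinary columns.
flat <- do.call(data.frame, res)
print(names(flat))
print(flat)
```

After flattening, the two statistics live in separate columns (named with an age. prefix), so they can be selected or renamed like any other column.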