How to sum rows based on multiple conditions - R?

Bor*_*lis 2 r sum summary multiple-conditions dataframe

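I have a data frame with a plot ID (plotID), a tree species code (species), and a cover value (cover). As you can see, one of the plots has multiple records for the same tree species. How can I sum the "cover" field wherever a plot contains duplicate "species" rows?

For example, here is some sample data: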

# Sample Data
plotID = c( "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200046012040",
       "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040")
species = c("ABBA",  "BEPA",  "PIBA2", "PIMA",  "PIRE",  "PIBA2", "PIBA2", "PIMA",  "PIMA",  "PIRE",  "POTR5", "POTR5")
cover = c(26.893939,  5.681818,  9.469697, 16.287879,  1.893939, 16.287879,  4.166667, 10.984848, 16.666667, 11.363636, 18.181818,
          13.257576)
df_original = data.frame(plotID, species, cover)

Here is the intended output:

# Intended Output
plotID2 = c( "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200046012040",
            "SUF200046012040", "SUF200046012040", "SUF200046012040")
species2 = c("ABBA",  "BEPA",  "PIBA2", "PIMA",  "PIRE",  "PIBA2", "PIMA",  "PIRE",  "POTR5")
cover2 = c(26.893939,  5.681818,  9.469697, 16.287879,  1.893939, 20.454546, 27.651515, 11.363636, 31.439394)
df_intended_output = data.frame(plotID2, species2, cover2)

Exp*_*teR 9

Easy with aggregate:

aggregate(cover~species+plotID, data=df_original, FUN=sum) 
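To make the formula interface concrete, here is a minimal base R sketch that rebuilds the question's sample data and runs the same `aggregate` call. The name `res` and the final column reordering are illustrative additions, not part of the original answer.

```r
# Sample data copied from the question
df_original <- data.frame(
  plotID  = rep(c("SUF200001035014", "SUF200046012040"), times = c(5, 7)),
  species = c("ABBA", "BEPA", "PIBA2", "PIMA", "PIRE",
              "PIBA2", "PIBA2", "PIMA", "PIMA", "PIRE", "POTR5", "POTR5"),
  cover   = c(26.893939, 5.681818, 9.469697, 16.287879, 1.893939, 16.287879,
              4.166667, 10.984848, 16.666667, 11.363636, 18.181818, 13.257576)
)

# Left of ~ is the column to sum; right of ~ are the grouping columns
res <- aggregate(cover ~ species + plotID, data = df_original, FUN = sum)

# Optional: put the columns back in the question's order (res is a made-up name)
res <- res[, c("plotID", "species", "cover")]
print(res)
```

Each plotID/species combination ends up as a single row, so the 12 input rows collapse to 9.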

Even easier with data.table:

library(data.table)  # provides as.data.table()
as.data.table(df_original)[, .(cover = sum(cover)), by = .(plotID, species)]


jal*_*pic 5

You can do this in several ways. Base R, dplyr, and data.table would be the most typical.

Here is the dplyr way:

library(dplyr)

df_original %>% group_by(plotID, species) %>% summarize(cover = sum(cover))

#          plotID species     cover
#1 SUF200001035014    ABBA 26.893939
#2 SUF200001035014    BEPA  5.681818
#3 SUF200001035014   PIBA2  9.469697
#4 SUF200001035014    PIMA 16.287879
#5 SUF200001035014    PIRE  1.893939
#6 SUF200046012040   PIBA2 20.454546
#7 SUF200046012040    PIMA 27.651515
#8 SUF200046012040    PIRE 11.363636
#9 SUF200046012040   POTR5 31.439394

This would be the base R way:

aggregate(df_original$cover, by=list(df_original$plotID, df_original$species), FUN=sum)
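With an unnamed `by` list, the base R call labels its output columns `Group.1`, `Group.2`, and `x`. As a hedged variant (not from the original answer), passing data frame subsets keeps the original column names; `res_named` is a made-up name for illustration.

```r
# Sample data copied from the question
df_original <- data.frame(
  plotID  = rep(c("SUF200001035014", "SUF200046012040"), times = c(5, 7)),
  species = c("ABBA", "BEPA", "PIBA2", "PIMA", "PIRE",
              "PIBA2", "PIBA2", "PIMA", "PIMA", "PIRE", "POTR5", "POTR5"),
  cover   = c(26.893939, 5.681818, 9.469697, 16.287879, 1.893939, 16.287879,
              4.166667, 10.984848, 16.666667, 11.363636, 18.181818, 13.257576)
)

# A data frame is a named list, so its column names carry over to the result
res_named <- aggregate(df_original["cover"],
                       by = df_original[c("plotID", "species")],
                       FUN = sum)

# Sort by plot, then species, to match the layout used in the question
res_named <- res_named[order(res_named$plotID, res_named$species), ]
print(res_named)
```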

And the data.table way:

library(data.table)
DT <- as.data.table(df_original)
DT[, lapply(.SD, sum), by = "plotID,species"]