Calculating the maximum across multiple columns by multiple groups

use*_*490 3 r dplyr data.table

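I have a data file with numeric values in three columns and two grouping variables (ID and Group), from which I need to calculate a single maximum value per ID and Group: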

structure(list(ID = structure(c(1L, 1L, 1L, 2L), .Label = c("a1", 
"a2"), class = "factor"), Group = structure(c(1L, 1L, 2L, 2L), .Label = 
c("abc", 
"def"), class = "factor"), Score1 = c(10L, 0L, 0L, 5L), Score2 = c(0L, 
0L, 5L, 10L), Score3 = c(0L, 11L, 2L, 11L)), class = "data.frame", row.names = 
c(NA, 
-4L))

The result I am trying to get is:

structure(list(ID = structure(c(1L, 1L, 2L), .Label = c("a1", 
"a2"), class = "factor"), Group = structure(c(1L, 2L, 2L), .Label = c("abc", 
"def"), class = "factor"), Max = c(11L, 5L, 11L)), class = "data.frame", 
row.names = c(NA, 
-3L))

I am trying the following in dplyr:

SampTable<-SampDF %>% group_by(ID,Group) %>% 
summarize(max = pmax(SampDF$Score1, SampDF$Score2,SampDF$Score3))

But it produces this error:

Error in summarise_impl(.data, dots) : 
Column `max` must be length 1 (a summary value), not 4

Is there a simple way to achieve this in dplyr or data.table?

PoG*_*bas 5

Using data.table: find the maximum over columns 3:5 (the score columns) by ID and Group.

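library(data.table)
setDT(d)  # convert the data.frame to a data.table by reference
# For each (ID, Group), take the overall maximum across columns 3:5 (Score1..Score3)
d[, .(Max = do.call(max, .SD)), by = .(ID, Group), .SDcols = 3:5]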

   ID Group Max
1: a1   abc  11
2: a1   def   5
3: a2   def  11

Data:

d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L), .Label = c("a1", 
"a2"), class = "factor"), Group = structure(c(1L, 1L, 2L, 2L), .Label = 
c("abc", 
"def"), class = "factor"), Score1 = c(10L, 0L, 0L, 5L), Score2 = c(0L, 
0L, 5L, 10L), Score3 = c(0L, 11L, 2L, 11L)), class = "data.frame", row.names = 
c(NA, 
-4L))
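For the dplyr side of the question, here is a minimal sketch rather than a definitive fix. The attempt in the question fails for two reasons: `pmax()` is element-wise, so it returns one value per row, and `SampDF$Score1` refers to the whole data frame, so it bypasses the grouping. Passing the bare column names to `max()` inside `summarise()` gives one value per group. The data frame below is rebuilt by hand from the `dput()` output in the question, and the name `SampTable` is taken from the question.

```r
library(dplyr)

# Sample data, rebuilt from the dput() output in the question
SampDF <- data.frame(
  ID     = factor(c("a1", "a1", "a1", "a2")),
  Group  = factor(c("abc", "abc", "def", "def")),
  Score1 = c(10L, 0L, 0L, 5L),
  Score2 = c(0L, 0L, 5L, 10L),
  Score3 = c(0L, 11L, 2L, 11L)
)

# max() over several vectors returns one overall maximum,
# so each (ID, Group) group collapses to a single value
SampTable <- SampDF %>%
  group_by(ID, Group) %>%
  summarise(Max = max(Score1, Score2, Score3)) %>%
  ungroup()

print(SampTable)
# Max is 11, 5, 11 for (a1, abc), (a1, def), (a2, def)
```

If any score can be `NA`, adding `na.rm = TRUE` to `max()` is probably what you want, though that depends on how missing scores should be treated.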

  • Or, similar to @www's answer, there is `melt(DT, meas=patterns("Score"))[, .(Max = max(value)), by=.(ID, Group)]` (3 upvotes)
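
The `melt()` one-liner in the comment above can be spelled out as a self-contained sketch. The table name `DT` follows the comment; the data is rebuilt by hand from the question, and the `id.vars` argument and the `^Score` anchor are additions for explicitness, not part of the original comment.

```r
library(data.table)

# Sample data, rebuilt from the dput() output in the question
DT <- data.table(
  ID     = factor(c("a1", "a1", "a1", "a2")),
  Group  = factor(c("abc", "abc", "def", "def")),
  Score1 = c(10L, 0L, 0L, 5L),
  Score2 = c(0L, 0L, 5L, 10L),
  Score3 = c(0L, 11L, 2L, 11L)
)

# Reshape to long format: one row per (ID, Group, score column),
# with the scores stacked into a single `value` column
long <- melt(DT, id.vars = c("ID", "Group"), measure.vars = patterns("^Score"))

# One maximum per (ID, Group) over the stacked values
res <- long[, .(Max = max(value)), by = .(ID, Group)]

print(res)
# Max is 11, 5, 11 for (a1, abc), (a1, def), (a2, def)
```

Reshaping first means the score columns are selected by name pattern instead of by position, which may be more robust if the column order changes.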