use*_*490 3 r dplyr data.table
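I have a data file containing numeric values in three columns and two grouping variables (ID and Group), and I need to compute a single maximum value for each combination of ID and Group: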
structure(list(ID = structure(c(1L, 1L, 1L, 2L), .Label = c("a1",
"a2"), class = "factor"), Group = structure(c(1L, 1L, 2L, 2L), .Label =
c("abc",
"def"), class = "factor"), Score1 = c(10L, 0L, 0L, 5L), Score2 = c(0L,
0L, 5L, 10L), Score3 = c(0L, 11L, 2L, 11L)), class = "data.frame", row.names =
c(NA,
-4L))
The result I am trying to obtain is:
structure(list(ID = structure(c(1L, 1L, 2L), .Label = c("a1",
"a2"), class = "factor"), Group = structure(c(1L, 2L, 2L), .Label = c("abc",
"def"), class = "factor"), Max = c(11L, 5L, 11L)), class = "data.frame",
row.names = c(NA,
-3L))
I am trying the following in dplyr:
SampTable<-SampDF %>% group_by(ID,Group) %>%
summarize(max = pmax(SampDF$Score1, SampDF$Score2,SampDF$Score3))
But it produces this error:
Error in summarise_impl(.data, dots) :
Column `max` must be length 1 (a summary value), not 4
Is there a simple way to achieve this in dplyr or data.table?
Using data.table: find the maximum over columns 3:5 (the score columns), grouped by ID and Group.
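library(data.table)
# Convert the data frame d (defined under "Data" below) to a data.table by reference
setDT(d)
# .SD holds columns 3:5 for each (ID, Group) group; do.call(max, .SD) takes one maximum over all of them
d[, .(Max = do.call(max, .SD)), .SDcols = 3:5, .(ID, Group)]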
   ID Group Max
1: a1   abc  11
2: a1   def   5
3: a2   def  11
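To see why `do.call(max, .SD)` gives one value per group while `pmax` in the question gives one value per row, here is a minimal base R sketch. The data frame `sd_like` is a hypothetical stand-in for what `.SD` would contain for the group ID = a1, Group = abc; it is not part of the original answer.

```r
# Hypothetical stand-in for .SD in the group (ID = "a1", Group = "abc")
sd_like <- data.frame(
  Score1 = c(10L, 0L),
  Score2 = c(0L, 0L),
  Score3 = c(0L, 11L)
)

# A data frame is a list of columns, so do.call() passes each column as a
# separate argument: this is the same as max(Score1, Score2, Score3)
overall_max <- do.call(max, sd_like)

# pmax() is element-wise: it returns one maximum per row, not one per group
row_max <- do.call(pmax, sd_like)

print(overall_max)  # 11
print(row_max)      # 10 11
```

This is also why the `summarize()` call in the question fails: `pmax` on the full `SampDF$...` columns returns a vector of length 4, whereas a summary needs a single value per group.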
Data:
d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L), .Label = c("a1",
"a2"), class = "factor"), Group = structure(c(1L, 1L, 2L, 2L), .Label =
c("abc",
"def"), class = "factor"), Score1 = c(10L, 0L, 0L, 5L), Score2 = c(0L,
0L, 5L, 10L), Score3 = c(0L, 11L, 2L, 11L)), class = "data.frame", row.names =
c(NA,
-4L))
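Since the question also asks about dplyr, here is one possible sketch of the same computation there. It assumes dplyr is installed and that the score columns are named Score1, Score2 and Score3 as in the sample data. The data frame is rebuilt with plain `data.frame()` so the block runs on its own. The idea is to call `max()` on the bare column names inside `summarise()`, so that it sees only the rows of the current group, rather than `pmax()` on `SampDF$...`, which ignores the grouping.

```r
library(dplyr)

# Same values as the sample data, rebuilt so this block is self-contained
d <- data.frame(
  ID     = c("a1", "a1", "a1", "a2"),
  Group  = c("abc", "abc", "def", "def"),
  Score1 = c(10L, 0L, 0L, 5L),
  Score2 = c(0L, 0L, 5L, 10L),
  Score3 = c(0L, 11L, 2L, 11L)
)

res <- d %>%
  group_by(ID, Group) %>%
  # Bare column names refer to the current group's rows only;
  # max() over all three columns returns a single value per group
  summarise(Max = max(Score1, Score2, Score3)) %>%
  ungroup()

print(res)
```

On the sample data this should give the same three rows as the data.table result (Max of 11, 5 and 11).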