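I'm looking for a way to handle the following situation (or to rethink how I'm approaching the task) so that I can stay in dplyr rather than "resorting" to data.table, since the analysis before and after this chunk is done in dplyr.
Situation: given a simulated dataset with multiple replicates, I want to subset (dplyr::filter) it based on a two-column key (ID and REP).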
# load both packages; spell out TRUE, since T can be reassigned
libs <- c("dplyr", "data.table")
lapply(libs, require, character.only = TRUE)
# minimal reproducible example
# dataset
dat <- expand.grid(ID = 1:3, REP = 1:5, TIME = 1:3)
dat <- dat[order(dat$REP, dat$ID, dat$TIME),]
# note: drawn before set.seed() below, so CONC differs between runs
dat$CONC <- runif(nrow(dat), 1, 10)
# key/index
set.seed(1235)
ID_sample <- sample(unique(dat$ID), size = 5, replace = TRUE)
REP_sample <- sample(unique(dat$REP), size = 5, replace = TRUE)
key <- data.frame(ID = ID_sample, REP = REP_sample)
# data table solution
dt <- data.table(dat)
setkey(dt, ID, REP) …
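For comparison, here is a minimal sketch of how the two-column key lookup might be expressed in dplyr alone. It is one possible approach under stated assumptions, not necessarily equivalent to the truncated data.table code above: it assumes that rows for a key pair sampled more than once should be repeated, which is what `inner_join` does. `semi_join` is shown as the alternative that only filters. The names `res_join` and `res_semi` are illustrative, not from the question.

```r
library(dplyr)

# Same toy data and key as in the question
dat <- expand.grid(ID = 1:3, REP = 1:5, TIME = 1:3)
dat <- dat[order(dat$REP, dat$ID, dat$TIME), ]
dat$CONC <- runif(nrow(dat), 1, 10)

set.seed(1235)
ID_sample  <- sample(unique(dat$ID),  size = 5, replace = TRUE)
REP_sample <- sample(unique(dat$REP), size = 5, replace = TRUE)
key <- data.frame(ID = ID_sample, REP = REP_sample)

# inner_join returns the matching rows of dat once per key row,
# so a (ID, REP) pair that was sampled twice contributes its rows twice
res_join <- inner_join(key, dat, by = c("ID", "REP"))

# semi_join only filters dat: each matching row is kept once,
# no matter how often its (ID, REP) pair appears in key
res_semi <- semi_join(dat, key, by = c("ID", "REP"))
```

Which of the two fits depends on whether duplicated key pairs are meant to resample the data (as in a bootstrap) or merely to select it. Recent dplyr versions may warn about a many-to-many relationship in `inner_join` when `key` contains duplicated pairs.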