I have individual-level data, and I'm trying to summarize the outcome dynamically by group.
Example:
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))
I'd like to know the simplest way to plot the group-level summary, the data for which is contained in:
DT[ , mean(outcome), by = .(grp, time)]
What I'd like is something along the lines of:
DT[ , plot(mean(outcome)), by = .(grp, time)]
But this doesn't work at all.
The workable option I've been getting by with (which could easily be looped) is:
plot(DT[grp == "a", mean(outcome), by = time])
lines(DT[grp == "b", mean(outcome), by = time])
lines(DT[grp == "c", mean(outcome), by = time])
lines(DT[grp == "d", mean(outcome), by = time])
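(with parameters for colours and so on added; omitted here for brevity)
This strikes me as not the best way to do it. Given data.table's skill at handling groups, isn't there a more elegant solution?
Other sources keep pointing me to matplot, but I can't see a straightforward way to use it. Do I need to reshape DT, and is there a simple reshape that would get the job done?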
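For reference, the looped version might look something like the sketch below. It is only an illustration: the names DT_mean, m, grps and k are my own, and using the loop index as the colour is an arbitrary choice.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

# Group means, one row per (grp, time) pair; the column name `m` is illustrative.
DT_mean <- DT[ , .(m = mean(outcome)), by = .(grp, time)]
grps <- unique(DT_mean$grp)

# Empty canvas sized to hold every group's line.
plot(NULL, xlim = range(DT_mean$time), ylim = range(DT_mean$m),
     xlab = "time", ylab = "mean(outcome)")

# One lines() call per group; the loop index doubles as the colour.
for (k in seq_along(grps)) {
  DT_mean[grp == grps[k], lines(time, m, col = k)]
}
```

Computing the axis limits from all groups up front avoids lines from later groups being clipped by limits taken from group "a" alone.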
A base R solution using matplot and dcast:
# aggregate to one mean per (grp, time), then reshape to one column per group
dt_agg <- DT[ , .(mean = mean(outcome)), by = .(grp, time)]
dt_cast <- dcast(dt_agg, time ~ grp, value.var = "mean")
dt_cast[ , matplot(time, .SD[ , !"time"], type = "l", ylab = "mean", xlab = "")]
# alternative:
dt_cast[ , matplot(time, .SD, type = "l", ylab = "mean", xlab = ""), .SDcols = !"time"]
Result: a single plot with one line per group.
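To make the reshape step more concrete, here is a small sketch of what dcast hands to matplot. The names DT_agg, DT_wide and y are illustrative, and the legend call and colour choices are my own additions rather than part of the solution above.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

DT_agg  <- DT[ , .(mean = mean(outcome)), by = .(grp, time)]
DT_wide <- dcast(DT_agg, time ~ grp, value.var = "mean")

dim(DT_wide)    # 50 rows (time points) by 5 columns
names(DT_wide)  # "time" "a" "b" "c" "d"

# matplot() draws one line per column of the y matrix.
y <- as.matrix(DT_wide[ , !"time"])
matplot(DT_wide$time, y, type = "l", lty = 1, col = 1:4,
        xlab = "time", ylab = "mean")
legend("topright", legend = colnames(y), col = 1:4, lty = 1)
```

So the long table with one row per (grp, time) becomes a wide table with one column per group, which is the layout matplot expects.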

There is a way to do this with data.table's by argument, as follows:
DT[ , mean(outcome), by = .(grp, time)
    ][ , {plot(NULL, xlim = range(time),
               ylim = range(V1)); .SD}
    ][ , lines(time, V1, col = .GRP), by = grp]
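Note that the intermediate {...; .SD} step is needed in order to continue chaining: plot returns nothing useful, so the braces return .SD instead. If DT[ , mean(outcome), by = .(grp, time)] were already stored as another data.table, DT_m, then we could simply do: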
DT_m[ , plot(NULL, xlim = range(time), ylim = range(V1))]
DT_m[ , lines(time, V1, col = .GRP), by = grp]
Either version draws one line per group on a single set of axes.
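The role of {...; .SD} can be checked on a toy table. The sketch below uses made-up values standing in for DT_m; res_null and res_sd are names I introduce for illustration.

```r
library(data.table)

# Toy stand-in for DT_m (values are made up for illustration).
DT_m <- data.table(grp  = rep(c("a", "b"), each = 3),
                   time = rep(1:3, 2),
                   V1   = c(0.1, 0.2, 0.3, 1.1, 1.2, 1.3))

# plot() returns NULL, so j evaluates to NULL and there is nothing to chain onto.
res_null <- DT_m[ , plot(NULL, xlim = range(time), ylim = range(V1))]
is.null(res_null)

# Ending the braced expression with .SD makes j return the table itself,
# so a further [ ... ] can be chained after the plot has been set up.
res_sd <- DT_m[ , {plot(NULL, xlim = range(time), ylim = range(V1)); .SD}]
is.data.table(res_sd)
```

In other words, the plot call is used only for its side effect, and .SD is what gets passed down the chain.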
Fancier results are possible; for example, if we wanted to specify a particular colour for each group:
grp_col <- c(a = "blue", b = "black",
             c = "darkgreen", d = "red")
DT[ , mean(outcome), by = .(grp, time)
    ][ , {plot(NULL, xlim = range(time),
               ylim = range(V1)); .SD}
    ][ , lines(time, V1, col = grp_col[.BY$grp]), by = grp]
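To see what the two colour arguments evaluate to within each group, the following sketch tabulates .GRP and the grp_col lookup side by side. The toy DT_m values and the column names grp_index and colour are my own, for illustration only.

```r
library(data.table)

# Toy stand-in for the aggregated table (values are made up).
DT_m <- data.table(grp  = rep(c("a", "b", "c", "d"), each = 2),
                   time = rep(1:2, 4),
                   V1   = seq(0.1, 0.8, by = 0.1))

grp_col <- c(a = "blue", b = "black",
             c = "darkgreen", d = "red")

# .GRP is the running group counter; .BY$grp is the current group's value,
# used here to index the named colour vector.
lookup <- DT_m[ , .(grp_index = .GRP,
                    colour = unname(grp_col[.BY$grp])),
                by = grp]
print(lookup)
```

With col = .GRP the colours come from the current palette by position, whereas the named lookup ties each colour to a group regardless of the order in which groups appear.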
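NOTE: There is a bug in RStudio that causes this code to fail if the output is sent to the RStudio graphics device. As such, this approach only works when using R on the command line or when sending the output to an external device (I sent it to png to produce the plots above).
See data.table issue #1524, this RStudio support ticket, and these SO questions (1 and 2).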
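A minimal sketch of the png workaround mentioned above, assuming a build of R with png support. The file name, image size and axis labels are hypothetical choices of mine, not the settings used for the original plots.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

# Hypothetical output path; any writable location would do.
out_file <- file.path(tempdir(), "group_means.png")

png(out_file, width = 800, height = 600)   # open the external device
invisible(
  DT[ , mean(outcome), by = .(grp, time)
      ][ , {plot(NULL, xlim = range(time), ylim = range(V1),
                 xlab = "time", ylab = "mean(outcome)"); .SD}
      ][ , lines(time, V1, col = .GRP), by = grp]
)
dev.off()                                  # close the device to write the file

file.exists(out_file)
```

The invisible() wrapper only suppresses printing of the empty data.table that the final lines call leaves behind.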