Plotting by group in data.table

Mic*_*ico 5 r data.table

I have individual-level data and I'm trying to summarize the outcome dynamically by group.

Example:

library(data.table)
set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

I'd like to know the simplest way to plot the group-level summary, the data for which is contained in:

DT[ , mean(outcome), by = .(grp, time)]

What I'd like is something along the lines of:

DT[ , plot(mean(outcome)), by = .(grp, time)]

But this doesn't work at all.

The workable option I've been getting by with (and which could easily be looped) is:

plot(DT[grp == "a", mean(outcome), by = time])
lines(DT[grp == "b", mean(outcome), by = time])
lines(DT[grp == "c", mean(outcome), by = time])
lines(DT[grp == "d", mean(outcome), by = time])

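(with arguments for color etc. added, excluded here for brevity)

This strikes me as not the best way to go about it. Given data.table's skill at handling groups, isn't there a more elegant solution?

Other sources have pointed me to matplot, but I couldn't see a straightforward way to use it. Do I need to reshape DT, and is there a simple reshape that would get the job done?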

Ren*_*rop 5

Base R solution using matplot and dcast:

dt_agg <- DT[ , .(mean = mean(outcome)), by=.(grp,time)]
dt_cast <- dcast(dt_agg, time~grp, value.var="mean")
dt_cast[ , matplot(time, .SD[ , !"time"], type="l", ylab="mean", xlab="")]
# alternative:
dt_cast[ , matplot(time, .SD, type="l", ylab="mean", xlab=""), .SDcols = !"time"]

Result: [image]
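For readers who want to run this end to end, here is a self-contained sketch of the same dcast + matplot idea, assuming the DT from the question. The names DT_agg, DT_wide and grps, the col/lty settings and the legend are additions for illustration and are not part of the answer above.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

# long format: one row per (grp, time) combination
DT_agg <- DT[ , .(mean = mean(outcome)), by = .(grp, time)]
# wide format: one row per time, one column per group
DT_wide <- dcast(DT_agg, time ~ grp, value.var = "mean")

# every column except time holds one group's series
grps <- setdiff(names(DT_wide), "time")
matplot(DT_wide$time, as.matrix(DT_wide[ , ..grps]), type = "l",
        lty = 1, col = seq_along(grps), xlab = "time", ylab = "mean")
legend("topright", legend = grps, col = seq_along(grps), lty = 1)
```

matplot draws one line per column of its y argument, which is why the long table has to be reshaped to wide first.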


Mic*_*ico 5

There is a way to do this using data.table's by argument, as follows:

DT[ , mean(outcome), by = .(grp, time)
    ][ , {plot(NULL, xlim = range(time),
           ylim = range(V1)); .SD}
       ][ , lines(time, V1, col = .GRP), by = grp]

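Note that the intermediate {...; .SD} part is necessary to continue the chain: plot() returns NULL, so .SD is returned instead to hand the table on to the next step. If DT[ , mean(outcome), by = .(grp, time)] were already stored as another data.table, DT_m, then we could just do: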

DT_m[ , plot(NULL, xlim = range(time), ylim = range(V1))]
DT_m[ , lines(time, V1, col = .GRP), by = grp]

With output:

[image: data.table group by]
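The role of the {...; .SD} step can be checked without drawing anything. The sketch below uses a hypothetical toy table X (not from the post) and message() as a stand-in for plot(), since both are called only for their side effect and return NULL:

```r
library(data.table)

X <- data.table(grp = c("a", "a", "b"), v = 1:3)  # hypothetical toy data

# j is evaluated only for its side effect and returns NULL,
# so the result is not a data.table
res_null <- X[ , message("side effect only")]
is.null(res_null)

# ending j with .SD hands the table back, so another [ ] can follow
res_dt <- X[ , {message("side effect, then .SD"); .SD}]
is.data.table(res_dt)
res_dt[ , .(total = sum(v)), by = grp]
```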

Fancier results are possible; for example, if we wanted to specify particular colors for each group:

grp_col <- c(a = "blue", b = "black",
             c = "darkgreen", d = "red")
DT[ , mean(outcome), by = .(grp, time)
    ][ , {plot(NULL, xlim = range(time),
           ylim = range(V1)); .SD}
       ][ , lines(time, V1, col = grp_col[.BY$grp]), by = grp]

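NOTE

There is a bug in RStudio that causes this code to fail if the output is sent to the RStudio graphics device. As such, this approach only works when using R on the command line or when sending the output to an external device (I sent it to png to produce the above).

See data.table issue #1524, this RStudio support ticket, and these SO Qs (1, 2).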
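As a rough sketch of the external-device route mentioned in the note, the two-step DT_m version can be wrapped between a file device and dev.off(). The output path, the axis labels and the fallback to pdf() when png support is unavailable are assumptions for illustration, not part of the answer.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))
DT_m <- DT[ , mean(outcome), by = .(grp, time)]

# open a file device instead of the on-screen one (hypothetical temp file)
use_png <- capabilities("png")
out_file <- tempfile(fileext = if (use_png) ".png" else ".pdf")
if (use_png) png(out_file) else pdf(out_file)

# empty canvas sized to the data, then one line per group
DT_m[ , plot(NULL, xlim = range(time), ylim = range(V1),
             xlab = "time", ylab = "mean(outcome)")]
DT_m[ , lines(time, V1, col = .GRP), by = grp]

dev.off()  # close the device so the file is written
```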