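Suppose I collect posts from Stack Overflow and classify them into N categories. My goal is to plot the N percentages per day, plus one line with the total number of posts per day.
To experiment, I will use a toy data frame. I can plot the per-day value for each category: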
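library(MASS)     # provides the beav1 dataset
library(dplyr)    # filter()
library(ggplot2)
data(beav1)
beav1$day <- as.factor(beav1$day)
# Replace the clock time with a running index within each day
beav1[beav1$day==346,]$time <- 1:sum(beav1$day==346)
beav1[beav1$day==347,]$time <- 1:sum(beav1$day==347)
# Keep the first 22 observations of each day
beav1 <- filter(beav1, time<23)
ggplot(beav1, aes(x=time, y=temp, group=day, fill=day, color=day)) +
  geom_line()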
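But how can I add a line with the total temperature? Or the mean?
Edit: the difference from the other question is that I want a single line across all groups, not one line per group.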
Dataset
dput(beav1)
structure(list(day = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("346", "347"), class = "factor"),
time = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L), temp = c(36.33, 36.34,
36.35, 36.42, 36.55, 36.69, 36.71, 36.75, 36.81, 36.88, 36.89,
36.91, 36.85, 36.89, 36.89, 36.67, 36.5, 36.74, 36.77, 36.76,
36.78, 36.82, 36.93, 36.83, 36.8, 36.75, 36.71, 36.73, 36.75,
36.72, 36.76, 36.7, 36.82, 36.88, 36.94, 36.79, 36.78, 36.8,
36.82, 36.84, 36.86, 36.88, 36.93, 36.97), activ = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-44L), .Names = c("day", "time", "temp", "activ"))
OK, note that group, fill and color are set in geom_line rather than in ggplot, so you can use stat_summary without redefining the group.
ggplot(beav1, aes(x=time, y=temp)) +
  geom_line(aes(group=day, fill=day, color=day)) +
  # One summary line across all days; fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
  stat_summary(aes(group = 1), fun = mean, na.rm = TRUE, color = 'black', geom = 'line')
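If you want the sum instead, just pass sum as the summary function: fun = sum (fun.y = sum in ggplot2 versions before 3.3.0).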
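As an alternative sketch, not part of the original answer, the summary can also be computed up front and drawn as an ordinary layer. This may be easier to inspect, since the summarised values are available as a data frame. The data frame `df` and its values below are made up for illustration, and `overall` is a hypothetical name.

```r
library(ggplot2)

# Toy data shaped like the question's beav1 subset (values are made up).
df <- data.frame(
  day  = factor(rep(c("346", "347"), each = 3)),
  time = rep(1:3, times = 2),
  temp = c(36.3, 36.4, 36.5, 36.9, 36.8, 36.9)
)

# One row per time point: the mean temperature across all days.
# Use FUN = sum here to get the total instead of the mean.
overall <- aggregate(temp ~ time, data = df, FUN = mean)

p <- ggplot(df, aes(x = time, y = temp)) +
  # One coloured line per day
  geom_line(aes(group = day, color = day)) +
  # A single black line drawn from the precomputed summary
  geom_line(data = overall, color = "black")
```

Printing `p` draws the plot. Because `overall` has no `day` column, the second `geom_line` has nothing to group by and produces one line for all days.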