Suppose you have the following table:
Student<-c("Bob", "Joe", "Sam", "John")
ClassDate<-as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-05"), "%Y-%m-%d")
df<-data.frame(Student=Student, ClassDate=ClassDate)
df
Student ClassDate
1 Bob 2020-01-01
2 Joe 2020-01-01
3 Sam 2020-01-02
4 John 2020-01-05
When you build a cumulative frequency table for ClassDate, you get the following:
data.frame(cumsum(table(df$ClassDate)))
cumsum.table.df.ClassDate..
2020-01-01 2
2020-01-02 3
2020-01-05 4
However, I am looking for the following, which also includes the missing dates:
cumsum.table.df.ClassDate..
2020-01-01 2
2020-01-02 3
2020-01-03 3
2020-01-04 3
2020-01-05 4
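The missing dates disappear because table() only tabulates values that actually occur in the data. A minimal base R sketch of the difference, reusing the dates from the example above: once every day in the range is declared as a factor level, table() reports the gap days with a count of 0, so a cumsum over it carries the previous total forward.

```r
ClassDate <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-05"))

# table() only tabulates values that occur, so the gap days never appear
observed <- table(ClassDate)
names(observed)

# Declaring every day in the range as a factor level forces 0 counts for the gaps
all_days <- seq(min(ClassDate), max(ClassDate), by = "1 day")
full <- table(factor(as.character(ClassDate), levels = as.character(all_days)))
full
```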
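One option is to create a column 'n' of 1s, use complete to expand the data with a seq of days from the min to the max of 'ClassDate' (filling 'n' with 0 for the added dates), then group by 'ClassDate', sum 'n', and take the cumsum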
library(dplyr)
library(tidyr)
df %>%
  mutate(n = 1) %>%                                  # one count per row
  complete(ClassDate = seq(min(ClassDate), max(ClassDate),
                           by = '1 day'),            # add every missing day
           fill = list(n = 0)) %>%                   # added days get n = 0
  group_by(ClassDate) %>%
  summarise(n = sum(n), .groups = 'drop') %>%        # count per day
  mutate(n = cumsum(n))                              # running total
-output
# A tibble: 5 x 2
# ClassDate n
#  <date> <dbl>
#1 2020-01-01 2
#2 2020-01-02 3
#3 2020-01-03 3
#4 2020-01-04 3
#5 2020-01-05 4
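The running total in this output can be cross-checked without any grouping: the value for each day is simply the number of class dates on or before that day. A base R sketch using findInterval(), a different technique from the answer above shown only for illustration; it assumes the dates contain no NA values, since findInterval() requires a sorted vector.

```r
ClassDate <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-05"))

# Every calendar day between the first and last class date
all_days <- seq(min(ClassDate), max(ClassDate), by = "1 day")

# findInterval(x, vec) returns, for each x, the number of elements of the
# sorted vector vec that are <= x, i.e. the cumulative count up to that day
n <- findInterval(all_days, sort(ClassDate))
data.frame(ClassDate = all_days, n = n)
```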
In base R, another option is to specify the levels when converting to factor
# Use every day in the range as a factor level so missing days count as 0
v1 <- with(df, factor(ClassDate, levels =
     as.character(seq(min(ClassDate), max(ClassDate), by = '1 day'))))
data.frame(Cumsum = cumsum(table(v1)))
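If this is needed repeatedly, the factor-levels approach can be wrapped in a small function that also keeps the days as a real Date column instead of row names. The function name cum_counts_by_day is hypothetical, and this is a sketch rather than a tested utility.

```r
# Hypothetical helper (not from the original answer): cumulative counts per
# calendar day, keeping days that have no rows
cum_counts_by_day <- function(dates) {
  all_days <- seq(min(dates), max(dates), by = "1 day")
  # Factor levels cover every day, so table() reports 0 for missing days
  counts <- table(factor(as.character(dates), levels = as.character(all_days)))
  data.frame(ClassDate = all_days, Cumsum = as.vector(cumsum(counts)))
}

df <- data.frame(
  Student = c("Bob", "Joe", "Sam", "John"),
  ClassDate = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-05"))
)
res <- cum_counts_by_day(df$ClassDate)
res
```

Returning a data frame with a Date column makes the result easier to join back or plot than the named vector produced by cumsum(table(...)).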