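When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this does not work when using summarise with dplyr. Is there another way to keep empty categories in the result?
Here is an example with fake data.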
library(dplyr)
df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
# Now add an extra level to df$b that has no corresponding value in df$a
df$b = factor(df$b, levels=1:3)
# Summarise with plyr, keeping categories with a count of zero
plyr::ddply(df, "b", summarise, count_a=length(a), .drop=FALSE)
  b count_a
1 1       6
2 2       6
3 3       0
# Now try it with dplyr
df %>%
  group_by(b) %>%
  summarise(count_a=length(a), .drop=FALSE)
b …
tidyr::complete() adds rows to a data.frame for combinations of column values that are missing from the data. Example:
library(dplyr)
library(tidyr)
df <- data.frame(person = c(1,2,2),
                 observation_id = c(1,1,2),
                 value = c(1,1,1))
df %>%
  tidyr::complete(person,
                  observation_id,
                  fill = list(value=0))
yields
# A tibble: 4 × 3
  person observation_id value
   <dbl>          <dbl> <dbl>
1      1              1     1
2      1              2     0
3      2              1     1
4      2              2     1
where the combination of person == 1 and observation_id == 2, which is missing from df, has been filled in with a value of 0.
What is the equivalent of this in data.table?
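One possible way to get the same result is sketched below, assuming the data.table package is installed. It is not necessarily the only or the preferred idiom: it builds every combination of the key values with CJ(), joins the original table onto that, and then replaces the resulting NA values with 0. The names dt, all_combos and completed are illustrative and do not come from the question.

```r
library(data.table)

# Same toy data as in the question, stored as a data.table
dt <- data.table(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))

# CJ() (cross join) builds every combination of the unique key values
all_combos <- CJ(person = unique(dt$person),
                 observation_id = unique(dt$observation_id))

# Joining dt onto the full grid keeps one row per combination;
# combinations absent from dt get NA in the value column
completed <- dt[all_combos, on = .(person, observation_id)]

# Fill the combinations that had no match with 0
completed[is.na(value), value := 0]

print(completed)
```

The join direction matters here: indexing dt by all_combos keeps every row of the grid, which is what makes the missing combination appear.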
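Close but not a duplicate: Proper idiom for adding zero count rows in tidyr/dplyr - I am trying to fill in rows based on existing values in df, but also based on ids that are absent from it. Similar, but fundamentally different.
For each id, I am trying to make sure it has 3 billing months.
Ideally, for each id I need all three required months to appear in df_complete. If a month is not in the data, I want to add a row with "not found" as the value.
In addition, I want to check all_ids and add any ID that is in all_ids but has no rows in df.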
months <- as.data.frame(as.Date(c("2016/7/1","2016/9/1","2016/7/1", "2016/8/1","2016/9/1", "2016/8/1","2016/9/1")))
id <- as.data.frame(c("a","a","b","b","b","c","c"))
value <- as.data.frame(c(1,2,3,4,5,6,7))
df <- cbind(id,months,value)
colnames(df) <- c("id","billing months","value")
required_months <- as.data.frame(as.Date(c("2016/7/1", "2016/8/1","2016/9/1")))
colnames(required_months)<- "required months"
all_ids <- as.data.frame(c("a","b", "c", "d"))
df ends up looking like:
id  billing months  value
a   7/1/2016        1
a   9/1/2016        2
b   7/1/2016        3
b   8/1/2016        4
b …
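A minimal sketch of one approach with tidyr::complete() follows. It rests on a few assumptions that are not in the question: the data is rebuilt with a syntactic column name (billing_month instead of "billing months"), required_months and all_ids are kept as plain vectors rather than one-column data frames, and value is converted to character so that the text "not found" can sit in the same column.

```r
library(dplyr)
library(tidyr)

# Rebuilt version of the question's data (column name simplified, an assumption)
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c"),
  billing_month = as.Date(c("2016-07-01", "2016-09-01", "2016-07-01", "2016-08-01",
                            "2016-09-01", "2016-08-01", "2016-09-01")),
  value = c(1, 2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)
required_months <- as.Date(c("2016-07-01", "2016-08-01", "2016-09-01"))
all_ids <- c("a", "b", "c", "d")

df_complete <- df %>%
  # value must be character to hold the "not found" marker
  mutate(value = as.character(value)) %>%
  # Supplying the full sets of ids and months adds rows for ids and
  # months that never occur in df, not only for missing combinations
  complete(id = all_ids,
           billing_month = required_months,
           fill = list(value = "not found"))

print(df_complete)
```

Passing named vectors to complete() is what brings in the id "d", since completing on the columns alone would only combine values already present in df.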