I have a data.table that looks like this:
> library(data.table)
> dt <- data.table(
    group1 = c("a", "a", "a", "b", "b", "b", "b"),
    group2 = c("x", "x", "y", "y", "z", "z", "z"),
    data1 = c(NA, rep(T, 3), rep(F, 2), "sometimes"),
    data2 = c("sometimes", rep(F, 3), rep(T, 2), NA))
> dt
   group1 group2     data1     data2
1:      a      x        NA sometimes
2:      a      x      TRUE     FALSE
3:      a      y      TRUE     FALSE
4:      b      y      TRUE     FALSE
5:      b      z     FALSE      TRUE
6:      b      z     FALSE      TRUE
7:      b      z sometimes        NA
My goal is to find the number of non-NA records in each data column, grouped by group1 and group2.
   group1 group2 data1 data2
1:      a      x     1     2
3:      a      y     1     1
4:      b      y     1     1
5:      b      z     3     2
I have this code left over from handling another part of the dataset, where the columns are logical and contain no NAs:
dt[
  ,
  lapply(.SD, sum),
  by = list(group1, group2),
  .SDcols = c("data3", "data4")
]
But it does not work with NA values or non-logical values.
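To see why the `sum`-based code breaks on these columns, here is a minimal sketch in base R. The vectors `with_na` and `as_text` are made up for illustration and are not part of the dataset. Note that mixing logical values with a string inside `c()` coerces everything to character, which is why data1 and data2 are character columns.

```r
# Hypothetical vectors, for illustration only.
with_na <- c(TRUE, NA, TRUE)
na_sum   <- sum(with_na)                 # NA propagates through sum()
true_sum <- sum(with_na, na.rm = TRUE)   # counts TRUE values, not non-NA values

as_text <- c("TRUE", NA, "sometimes")
# sum() is not defined for character vectors, so this call raises an error;
# tryCatch() turns the error into a marker string so the script keeps running.
res <- tryCatch(sum(as_text), error = function(e) "error")

# is.na() works for any atomic type, so this counts the non-NA entries.
n_non_na <- sum(!is.na(as_text))
```

Counting with `!is.na(x)` sidesteps both problems because the result is always a logical vector without NAs, whatever the type of `x`.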
# Count the non-NA values in each column within every (group1, group2) group
dt[, lapply(.SD, function(x) sum(!is.na(x))), by = .(group1, group2)]
#   group1 group2 data1 data2
#1:      a      x     1     2
#2:      a      y     1     1
#3:      b      y     1     1
#4:      b      z     3     2
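If the table holds other columns that should not be counted, the same idea can be combined with `.SDcols`, as in the legacy code from the question. This is only a sketch: it assumes data1 and data2 are the only columns of interest, and the names `cols` and `counts` are introduced here for illustration.

```r
library(data.table)

# Same sample data as in the question (T/F written out as TRUE/FALSE).
dt <- data.table(
  group1 = c("a", "a", "a", "b", "b", "b", "b"),
  group2 = c("x", "x", "y", "y", "z", "z", "z"),
  data1  = c(NA, rep(TRUE, 3), rep(FALSE, 2), "sometimes"),
  data2  = c("sometimes", rep(FALSE, 3), rep(TRUE, 2), NA)
)

# Assumption: only these two columns should be counted.
cols <- c("data1", "data2")

# .SDcols restricts .SD to the listed columns before lapply() runs.
counts <- dt[, lapply(.SD, function(x) sum(!is.na(x))),
             by = .(group1, group2), .SDcols = cols]
```

Without `.SDcols`, `.SD` contains every column that is not used in `by`, which is fine for this table but would also count any extra columns added later.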