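I need to find the smallest value (ignoring NA) within each class in the sample data and mark it as 'min' in a new column, as shown below, using data.table.
Sample data: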
df = structure(list(class = c("apple", "apple", "apple", "banana",
"banana", "berry", "berry", "grape", "grape", "grape", "grape",
"grape", "melon", "melon", "melon"), value = c(108816872, 108851837,
108890411, 108784778, NA, 108784778, 108816872, 108816872, 108850460,
NA, NA, NA, NA, NA, NA)), .Names = c("class", "value"), class = "data.frame", row.names = c(NA,
-15L))
Desired output:
# class value anno
#1 apple 108816872 min
#2 apple 108851837 NA
#3 apple 108890411 NA
#4 banana 108784778 min
#5 banana NA NA
#6 berry 108784778 min
#7 berry 108816872 NA
#8 grape 108816872 min
#9 grape 108850460 NA
#10 grape NA NA
#11 grape NA NA
#12 grape NA NA
#13 melon NA NA
#14 melon NA NA
#15 melon NA NA
I was going to suggest the @eddies method, but here is an alternative:
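# Name the column anno as in the desired output; an all-NA class (melon) stays unmarked
setDT(df)[order(value), anno := c(if (is.na(value[1L])) NA_character_ else "min", rep(NA_character_, .N - 1L)), by = class]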
Edit: if you want the actual value instead of "min", you can change it to
setDT(df)[order(value), min := c(value[1L], rep(NA, .N - 1L)), by = class]
dt = as.data.table(df) # or convert in place using setDT
dt[dt[, .I[which.min(value)], by = class]$V1, anno := 'min']
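To see why the `.I[which.min(value)]` approach leaves an all-NA class unmarked, it may help to look at the intermediate result. The following is only a minimal sketch on a small made-up table (the classes "a", "b", "c" and their values are hypothetical, not the question's data), assuming the data.table package is installed:

```r
library(data.table)

# Hypothetical toy data: class "c" contains only NA
dt <- data.table(
  class = c("a", "a", "b", "b", "c"),
  value = c(3, 1, NA, 2, NA)
)

# Step 1: per class, take the global row number (.I) of the smallest value.
# which.min() discards NA and returns integer(0) when every value is NA,
# so class "c" contributes no row to idx.
idx <- dt[, .I[which.min(value)], by = class]
print(idx)

# Step 2: assign by reference at those row numbers only;
# all other rows get NA in the new column.
dt[idx$V1, anno := "min"]
print(dt)
```

Here `idx$V1` holds row numbers 2 and 4, so only those rows are labelled, and class "c" gets no label because it has no non-NA value.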