Max*_*nis 5 r duplicates data.table
I want to get the first row of each group from a data.table, grouping by multiple columns.
This is straightforward for a single column, for example:
library(data.table)
(dt <- data.table(x = c(1, 1, 1, 2),
                  y = c(1, 1, 2, 2),
                  z = c(1, 2, 1, 2)))
#    x y z
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 1
# 4: 2 2 2
dt[!duplicated(x)] # Remove rows 2-3
#    x y z
# 1: 1 1 1
# 2: 2 2 2
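However, none of these approaches works when removing duplicates based on two columns, i.e. when only row 2 should be removed in this case: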
dt[!duplicated(x, y)] # Returns the original data set unchanged
#    x y z
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 1
# 4: 2 2 2
dt[!duplicated(list(x, y))] # Same as above
dt[!duplicated(c("x", "y"))] # Same as above
dt[!duplicated(list("x", "y"))] # Same as above
dt[!duplicated(c(x, y))] # Only removes duplicates from first column
#    x y z
# 1: 1 1 1
# 2: 2 2 2
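The first attempt fails because of the signature of base R's duplicated(): its second positional argument is incomparables, not a second column to compare. The sketch below reproduces that and shows a row-wise alternative, duplicated() on a two-column matrix. The matrix approach is a base R technique chosen here for illustration; it is not taken from the question or the answer.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# duplicated(x, y) passes y as `incomparables`: values of x that occur in y
# are never flagged. Every x value here (1 or 2) occurs in y, so nothing is
# flagged and all four rows survive.
flag_incomparables <- duplicated(dt$x, dt$y)
print(flag_incomparables)

# duplicated() on a matrix compares whole rows (MARGIN = 1 by default),
# so binding x and y into a matrix flags repeated (x, y) pairs.
flag_rows <- duplicated(cbind(dt$x, dt$y))
print(flag_rows)

# Keep the first occurrence of each (x, y) pair: rows 1, 3 and 4.
dt[!flag_rows]
```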
Apart from that, the following works, but only in some cases:
dt[!duplicated(paste0(x, y))]
#    x y z
# 1: 1 1 1
# 2: 1 2 1
# 3: 2 2 2
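The paste0() approach only works in some cases because concatenating values without a separator can turn different (x, y) pairs into the same string. The small sketch below uses made-up data (dt2 is hypothetical) to show the collision. It also shows one workaround: a separator that cannot occur in the numeric values.

```r
library(data.table)

# Hypothetical data: (1, 11) and (11, 1) are different pairs.
dt2 <- data.table(x = c(1, 11), y = c(11, 1))

# Without a separator both pairs become "111", so the second row is
# wrongly flagged as a duplicate.
keys_collide <- dt2[, paste0(x, y)]
flag_collide <- duplicated(keys_collide)
print(keys_collide)
print(flag_collide)

# With a separator the strings differ ("1_11" vs "11_1"), as long as
# the separator never appears inside the values themselves.
keys_safe <- dt2[, paste(x, y, sep = "_")]
flag_safe <- duplicated(keys_safe)
print(flag_safe)
```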
mne*_*nel 13
data.table provides S3 methods for unique, duplicated and anyDuplicated, so you can deduplicate on a subset of columns with the by argument:
unique(dt, by = c('x','y'))
This gives you what you want.
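The question asks for the first row of each (x, y) group. The sketch below compares the three methods mentioned above with the grouped .SD[1] idiom. The .SD[1] idiom is an alternative added here and is not part of the answer. The sketch assumes a data.table version whose duplicated() and anyDuplicated() methods accept by.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# unique(): keeps the first row of every distinct (x, y) combination
first_unique <- unique(dt, by = c("x", "y"))

# duplicated(): flags later repeats of (x, y); negate to keep first rows
first_dup <- dt[!duplicated(dt, by = c("x", "y"))]

# anyDuplicated(): index of the first repeated row, or 0 if there is none
first_repeat <- anyDuplicated(dt, by = c("x", "y"))

# Grouped alternative: first row of .SD within each (x, y) group;
# `by` keeps the groups in order of first appearance
first_grouped <- dt[, .SD[1], by = .(x, y)]

print(first_unique)
print(first_repeat)
```

All three row-returning approaches yield the same rows here. The unique() call states the intent most directly. The .SD[1] form generalises to "first n rows per group" by changing the index.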
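In the data.table version this answer was written for, duplicated on a data.table works by key. Since data.table 1.9.8 the default is all columns instead. From ?duplicated.data.table: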
‘duplicated’ returns a logical vector indicating which rows of a
‘data.table’ have duplicate rows (by key).
setkey(dt, x, y)
dt[!duplicated(dt, by = key(dt))]  # explicit by = key(dt): from data.table 1.9.8 on, the default is all columns
## x y z
## 1: 1 1 1
## 2: 1 2 1
## 3: 2 2 2