I want to avoid hard-coding an external list such as this one:
list <- c("Google", "Yahoo", "Amazon")
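Instead, within each id I want to keep the row where each name was first recorded (its oldest timestamp). The data frame looks like this: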
dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google",
"Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01",
"2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02",
"2008-11-03")), class = "data.frame", row.names = c(NA, -7L))
The expected output is:
id name   date
1  Google 2008-11-01
1  Yahoo  2008-11-01
1  Amazon 2008-11-04
2  Amazon 2008-11-01
2  Google 2008-11-02
How can I do this?
I tried the following, but it keeps only the first record for each id, not the first record of each individual name within that id:
library(data.table)
setDT(dframe)
date_list_first = dframe[order(date)][!duplicated(id)]
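The attempt deduplicates on id alone, so only one row per id survives. Below is a minimal base R sketch that applies the same sort-then-deduplicate idea to the (id, name) pair instead. It rebuilds the sample data as a plain data.frame and assumes the dates are ISO "YYYY-MM-DD" strings, so that sorting them as text also sorts them chronologically:

```r
# Sample data from the question, as a plain data.frame
dframe <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
           "2008-11-01", "2008-11-02", "2008-11-03"),
  stringsAsFactors = FALSE
)

# Sort oldest first; order() is stable, so ties keep their original order
sorted <- dframe[order(dframe$date), ]

# Keep the first (oldest) row of every (id, name) pair, not just of every id
res <- sorted[!duplicated(sorted[c("id", "name")]), ]

# Group the rows by id again and reset the row names for printing
res <- res[order(res$id), ]
rownames(res) <- NULL
res
```

With data.table, the equivalent change to the attempt would be to deduplicate on both columns, for example unique(dframe[order(date)], by = c("id", "name")).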
Using data.table:
library(data.table)
dframe = data.table(dframe)
dframe[, date := as.Date(date)]                   # parse the strings as real dates
dt = dframe[, .(date = min(date)), .(id, name)]   # earliest date per (id, name) pair
> dt
id name date
1: 1 Google 2008-11-01
2: 1 Yahoo 2008-11-01
3: 1 Amazon 2008-11-04
4: 2 Amazon 2008-11-01
5: 2 Google 2008-11-02
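For comparison, here is a base R sketch of the same per-group minimum using aggregate(), for cases where data.table is not available. It keeps the dates as ISO strings instead of converting them with as.Date (an assumption: min() then compares them as text, which matches chronological order for this format), and it reorders the result explicitly because aggregate() groups its output differently:

```r
# Sample data from the question, as a plain data.frame
dframe <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
           "2008-11-01", "2008-11-02", "2008-11-03"),
  stringsAsFactors = FALSE
)

# Earliest date for every (id, name) combination present in the data;
# min() on "YYYY-MM-DD" strings picks the chronologically first one
first_seen <- aggregate(date ~ id + name, data = dframe, FUN = min)

# Order by id, then date (name breaks ties), to mirror the expected output
first_seen <- first_seen[order(first_seen$id, first_seen$date, first_seen$name), ]
rownames(first_seen) <- NULL
first_seen
```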