Keep values from a list based on the first timestamp record

Nat*_*lie 2 r date filter

I have an external list of values:

list <- c("Google", "Yahoo", "Amazon")

I want to keep the values recorded at their first (oldest) timestamp in a data frame like this:

dframe <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), name = c("Google", 
    "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"), date = c("2008-11-01", 
    "2008-11-02", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02", 
    "2008-11-03")), class = "data.frame", row.names = c(NA, -7L))

The expected output is:

id   name       date
1 Google 2008-11-01
1  Yahoo 2008-11-01
1 Amazon 2008-11-04
2 Amazon 2008-11-01
2 Google 2008-11-02

How can I do this?

The code below keeps only the first record for each id, not the first record of each value in the list for each id:

library(data.table)
setDT(dframe)
date_list_first = dframe[order(date)][!duplicated(id)]

JDG*_*JDG 5

Using data.table:

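library(data.table)

# Convert to a data.table and parse the date strings as Date
dframe = data.table(dframe)
dframe[, date := as.Date(date)]

# Earliest date for each (id, name) pair
dt = dframe[, .(date = min(date)), by = .(id, name)]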

> dt
   id   name       date
1:  1 Google 2008-11-01
2:  1  Yahoo 2008-11-01
3:  1 Amazon 2008-11-04
4:  2 Amazon 2008-11-01
5:  2 Google 2008-11-02
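The grouping above does not consult the external list, which makes no difference here because every name in the sample data is in it. If the data could contain other names, a minimal sketch of filtering first might look like the following. The names `keep_names`, `first_dates`, and `first_rows` are made up for illustration (the question calls the vector `list`, which shadows `base::list()`), and the second variant assumes you want whole rows rather than only the minimum date.

```r
library(data.table)

# Sample data from the question, with dates parsed as Date.
dframe <- data.table(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name = c("Google", "Google", "Yahoo", "Amazon", "Amazon", "Google", "Amazon"),
  date = as.Date(c("2008-11-01", "2008-11-02", "2008-11-01", "2008-11-04",
                   "2008-11-01", "2008-11-02", "2008-11-03"))
)

# Hypothetical name for the external list from the question.
keep_names <- c("Google", "Yahoo", "Amazon")

# Keep only the listed names, then take the earliest date per (id, name).
first_dates <- dframe[name %in% keep_names,
                      .(date = min(date)),
                      by = .(id, name)]

# Alternative: sort by date and keep the first full row per (id, name).
# This preserves any additional columns the data might have.
first_rows <- unique(dframe[name %in% keep_names][order(date)],
                     by = c("id", "name"))
```

The second form is close to the attempt in the question; the difference is that duplicates are judged on both id and name rather than on id alone.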