Rearrange a data.frame to get the order of products

grr*_*bla 5 r date

I have a data frame of the following form:

df <- data.frame(client = c("client1", "client1", "client2", "client3", "client3"),
                 product = c("A", "B", "A", "D", "A"),
                 purchase_Date = c("2010-03-22", "2010-02-02", "2009-03-02", "2011-04-05", "2012-11-01"))
df$purchase_Date <- as.Date(df$purchase_Date, format = "%Y-%m-%d")

It looks like this:

   client product purchase_Date
1 client1       A    2010-03-22
2 client1       B    2010-02-02
3 client2       A    2009-03-02
4 client3       D    2011-04-05
5 client3       A    2012-11-01

I would like to rearrange it like this:

   client purchase1 purchase2
1 client1         B         A
2 client2         A      <NA>
3 client3         D         A

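So I would like to know which product each client bought first, second, third, and so on, ordered by purchase date. Using data.table, I can easily get each one separately: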

library(data.table)
setDT(df)[ , .SD[order(-purchase_Date), product][1], by = client]

for the first one. But I don't know how to get the desired output efficiently.

Dav*_*urg 7

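Here is a possible data.table solution. (If a client has 10 or more purchases, I'd suggest avoiding paste0 and using just indx := seq_len(.N) instead, because the pasted names sort as text, so purchase10 would come before purchase2 and mess up the purchase order.)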

setDT(df)[order(purchase_Date), indx := paste0("purchase", seq_len(.N)), by = client]
dcast(df, client ~ indx, value.var = "product")
#     client purchase1 purchase2
# 1: client1         B         A
# 2: client2         A        NA
# 3: client3         D         A
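For the case with many purchases per client, the numeric-index variant mentioned above could look roughly like the sketch below. It rebuilds the question's data, keeps indx as an integer so that dcast orders the columns numerically, and adds the "purchase" prefix only afterwards. The name wide and the renaming step via setnames are illustrative additions, not part of the original answer.

```r
library(data.table)

# Same data as in the question (dates parsed up front)
df <- data.frame(
  client = c("client1", "client1", "client2", "client3", "client3"),
  product = c("A", "B", "A", "D", "A"),
  purchase_Date = as.Date(c("2010-03-22", "2010-02-02", "2009-03-02",
                            "2011-04-05", "2012-11-01")),
  stringsAsFactors = FALSE
)

# Integer purchase index per client, in order of purchase date
setDT(df)[order(purchase_Date), indx := seq_len(.N), by = client]

# Cast to wide; the columns are named "1", "2", ... in numeric order
wide <- dcast(df, client ~ indx, value.var = "product")

# Add the "purchase" prefix after casting (illustrative step)
idx_cols <- names(wide)[-1L]
setnames(wide, old = idx_cols, new = paste0("purchase", idx_cols))
wide
```

Because the prefix is attached after casting, the column order follows the integer index rather than the alphabetical order of the pasted names.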

Comparison of the frank() and order() approaches for creating the indx column:

require(data.table)
set.seed(45L); 
dt = data.table(client = sample(paste("client", 1:1e4, sep=""), 1e6, TRUE))
dt[, `:=`(product = sample(paste("p", 1:200, sep=""), .N, FALSE), 
          purchase_Date = as.Date(sample(14610:16586, .N, FALSE), 
           origin = "1970-01-01")), by=client]

system.time(dt[order(purchase_Date), indx := seq_len(.N), by = client])
# user  system elapsed 
# 0.19    0.02    0.20 
system.time(dt[, purch_rank := frank(purchase_Date, ties.method = "dense"), by=client])
# user  system elapsed 
# 3.94    0.00    3.98 
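As a small sanity check alongside the timings, the sketch below applies both approaches to a toy table (made-up clients and dates, with no tied dates within a client) and compares the resulting columns. When a client has tied purchase dates the two can differ: frank(..., ties.method = "dense") gives tied dates the same rank, whereas seq_len(.N) always numbers the rows 1, 2, 3, and so on.

```r
library(data.table)

# Toy data (hypothetical), no tied dates within a client
dt <- data.table(
  client = c("c1", "c1", "c1", "c2", "c2"),
  purchase_Date = as.Date(c("2010-03-22", "2010-02-02", "2010-05-01",
                            "2009-03-02", "2009-01-15"))
)

# order() approach: number rows per client in date order
dt[order(purchase_Date), indx := seq_len(.N), by = client]

# frank() approach: dense rank of the date within each client
dt[, purch_rank := frank(purchase_Date, ties.method = "dense"), by = client]

dt
```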

  • Just benchmarked on clients with 10000 rows: `frank` = 2.6s vs `order()` = 0.5s. Filed [FR #1197](https://github.com/Rdatatable/data.table/issues/1197). (4 upvotes)