Zac*_*ite 15 r data.table
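Using the data.table package in R, I am trying to create the cartesian product of two data.tables with the merge method, as one would in base R.
In base R, the following works: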
#assume this order data
orders <- data.frame(date = as.POSIXct(c('2012-08-28','2012-08-29','2012-09-01')),
first.name = as.character(c('John','George','Henry')),
last.name = as.character(c('Doe','Smith','Smith')),
qty = c(10,50,6))
#and these dates
dates <- data.frame(date = seq(from = as.POSIXct('2012-08-28'),
to = as.POSIXct('2012-09-07'), by = 'day'))
#get the unique customers
cust<-unique(orders[,c('first.name','last.name')])
#using merge from base R, get the cartesian product
merge(dates, cust, by = integer(0))
However, the same technique does not work with data.table, and throws this error:
"Error in merge.data.table(dates.dt, cust.dt, by = integer(0)) :
A non-empty vector of column names for `by` is required."
I want the result to show every customer name for every date, just as in base R, but done in a data.table-centric way. Is this possible?
42-*_*42- 12
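If you construct full names from the first and last names in the data frame, you can use CJ (cross join). You cannot pass all three vectors, because CJ would then also cross the first names with the last names and return 99 rows (11 dates × 3 first names × 3 last names) instead of 33.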
> nrow(CJ(dates$date, cust$first.name, cust$last.name ) )
[1] 99
This returns a data.table object:
> CJ(dates$date,paste(cust$first.name, cust$last.name) )
V1 V2
1: 2012-08-28 George Smith
2: 2012-08-28 Henry Smith
3: 2012-08-28 John Doe
4: 2012-08-29 George Smith
5: 2012-08-29 Henry Smith
6: 2012-08-29 John Doe
7: 2012-08-30 George Smith
8: 2012-08-30 Henry Smith
9: 2012-08-30 John Doe
10: 2012-08-31 John Doe
11: 2012-08-31 George Smith
12: 2012-08-31 Henry Smith
13: 2012-09-01 John Doe
14: 2012-09-01 George Smith
15: 2012-09-01 Henry Smith
16: 2012-09-02 George Smith
17: 2012-09-02 Henry Smith
18: 2012-09-02 John Doe
19: 2012-09-03 Henry Smith
20: 2012-09-03 John Doe
21: 2012-09-03 George Smith
22: 2012-09-04 Henry Smith
23: 2012-09-04 John Doe
24: 2012-09-04 George Smith
25: 2012-09-05 George Smith
26: 2012-09-05 Henry Smith
27: 2012-09-05 John Doe
28: 2012-09-06 George Smith
29: 2012-09-06 Henry Smith
30: 2012-09-06 John Doe
31: 2012-09-07 George Smith
32: 2012-09-07 Henry Smith
33: 2012-09-07 John Doe
V1 V2
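If the first and last names need to stay in separate columns, one possible variation on the CJ idea is to cross join the row indices of the two tables and then use them to pick rows. The sketch below is an illustration, not part of the original answer: the names `dates.dt` and `cust.dt` follow the error message in the question, `idx`, `date_row` and `cust_row` are made up for this example, and `tz = "UTC"` is an added assumption to keep the date sequence independent of the local time zone.

```r
library(data.table)

# Rebuild the question's inputs as data.tables
orders <- data.table(
  date       = as.POSIXct(c("2012-08-28", "2012-08-29", "2012-09-01"), tz = "UTC"),
  first.name = c("John", "George", "Henry"),
  last.name  = c("Doe", "Smith", "Smith"),
  qty        = c(10, 50, 6)
)
dates.dt <- data.table(
  date = seq(from = as.POSIXct("2012-08-28", tz = "UTC"),
             to   = as.POSIXct("2012-09-07", tz = "UTC"), by = "day")
)
cust.dt <- unique(orders[, .(first.name, last.name)])

# Cross join the row numbers of both tables (hypothetical column names)
idx <- CJ(date_row = seq_len(nrow(dates.dt)),
          cust_row = seq_len(nrow(cust.dt)))

# Pick the matching rows from each table and bind them side by side
res <- cbind(dates.dt[idx$date_row], cust.dt[idx$cust_row])
```

Because only the row numbers are crossed, each customer stays a single unit and the first names are never combined with the wrong last names.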
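merge.data.table(x, y) is a convenience function that wraps a call to x[y], so the merge needs to be based on columns that are present in both data.tables. (That is what the error message is trying to tell you.)
One workaround is to add a dummy column to both data.tables, whose only purpose is to make the merge possible: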
## Add a column "k", and append it to each data.table's vector of keyed columns.
setkeyv(cust.dt[,k:=1], c(key(cust.dt), "k"))
setkeyv(dates.dt[,k:=1], c(key(dates.dt), "k"))
## Merge and then remove the dummy column
res <- merge(dates.dt, cust.dt, by="k")
head(res[,k:=NULL])
# date first.name last.name
# 1: 2012-08-28 George Smith
# 2: 2012-08-28 Henry Smith
# 3: 2012-08-28 John Doe
# 4: 2012-08-29 George Smith
# 5: 2012-08-29 Henry Smith
# 6: 2012-08-29 John Doe
## Maybe also clean up cust.dt and dates.dt
# cust.dt[,k:=NULL]
# dates.dt[,k:=NULL]
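The same dummy-column idea can be sketched end to end as follows. This is a hedged illustration rather than the answerer's exact code: it rebuilds small stand-ins for `dates.dt` and `cust.dt` (using `Date` instead of `POSIXct` for brevity), works on copies named `x` and `y` so the inputs are not modified, and passes `allow.cartesian = TRUE`, which recent data.table versions may require when a join key is duplicated on both sides.

```r
library(data.table)

# Stand-ins for the question's tables (assumed shapes, Date used for brevity)
dates.dt <- data.table(date = seq(as.Date("2012-08-28"),
                                  as.Date("2012-09-07"), by = "day"))
cust.dt  <- data.table(first.name = c("John", "George", "Henry"),
                       last.name  = c("Doe", "Smith", "Smith"))

# Add the dummy join column "k" to copies, leaving the inputs untouched
x <- copy(dates.dt)[, k := 1L]
y <- copy(cust.dt)[, k := 1L]

# Every row of x matches every row of y on k, giving the cartesian product;
# the dummy column is dropped afterwards
res <- merge(x, y, by = "k", allow.cartesian = TRUE)[, k := NULL]
```

Working on copies avoids the cleanup step for `cust.dt` and `dates.dt`, at the cost of duplicating both tables in memory.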