R: data.table cross join not working

Asked by tch*_*rty (score 5) · tags: join, r, data.table

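I have two data.tables that I want to join to form a Cartesian product. One of them is keyed on a Date vector (I also tried a variant with the dates stored as numeric), and the other is keyed on a numeric vector: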

library(data.table)

# data.table with dates (as numeric)
dtDates2 = data.table(date = 
                       as.numeric(seq(from = as.Date('2014/01/01'), 
                           to = as.Date('2014/07/01'), by = 'weeks')),
                     data1 = rnorm(26))

# data.table with dates
dtDates1 = data.table(date = 
                        seq(from = as.Date('2014/01/01'), 
                            to = as.Date('2014/07/01'), by = 'weeks'),
                      data1 = rnorm(26))


# data.table with customer IDs
dtCustomers = data.table(customerID = seq(1, 100),
                      data2 = rnorm(100))

I set the keys with setkey and tried to cross join the tables with CJ as follows:

# cross join the two datatables
setkey(dtCustomers, customerID)
setkey(dtDates1, date)
setkey(dtDates2, date)

CJ(dtCustomers, dtDates1)
CJ(dtCustomers, dtDates2)

But I get the following error:

Error in FUN(X[[1L]], ...) : 
  Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.

I'm not sure what I'm doing wrong.
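A note on the error: `CJ()` forms the cross product of the *vectors* passed to it, so handing it whole data.tables (which are lists of columns) is not what it expects. The sketch below, assuming a reasonably recent data.table with the `on =` join argument, passes the key columns as vectors instead and then joins the remaining columns back onto the result:

```r
library(data.table)

dtDates1 <- data.table(
  date  = seq(from = as.Date("2014-01-01"), to = as.Date("2014-07-01"), by = "weeks"),
  data1 = rnorm(26)
)
dtCustomers <- data.table(customerID = seq(1, 100), data2 = rnorm(100))

# CJ() takes vectors: pass the key columns, not the tables themselves
keys <- CJ(customerID = dtCustomers$customerID, date = dtDates1$date)
nrow(keys)  # 2600 (100 customers x 26 weekly dates)

# keys holds only customerID and date; join data2 and data1 back on
res <- dtDates1[dtCustomers[keys, on = "customerID"], on = "date"]
```

This only works cleanly when the key columns uniquely identify rows in each table; otherwise the join-back step multiplies rows.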

Answer by jan*_*cki (score 17)

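data.table does not provide a cross join of two data.tables out of the box: `CJ` builds the Cartesian product of vectors, not of data.tables.
However, the optiRum package (available on CRAN) provides a `CJ.dt` function (similar to `CJ`, but designed for data.tables) that computes the Cartesian product (cross join).
You can also define the function yourself: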

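library(data.table)

CJ.dt = function(X, Y) {
  stopifnot(is.data.table(X), is.data.table(Y))
  k = NULL  # avoids R CMD check's "no visible binding" note for k
  # prepend a constant helper column k = 1 to both tables
  X = X[, c(k = 1, .SD)]
  setkey(X, k)
  Y = Y[, c(k = 1, .SD)]
  setkey(Y, NULL)
  # Y's first column (k) is joined to X's key; since k is always 1, every
  # row of Y matches every row of X, which yields the Cartesian product.
  # Then drop the helper column; the trailing [] prints the result.
  X[Y, allow.cartesian = TRUE][, k := NULL][]
}
CJ.dt(dtCustomers, dtDates1)
CJ.dt(dtCustomers, dtDates2)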
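An alternative sketch that avoids the helper key column: use `CJ()` for what it is designed for, cross joining two vectors (here, row indices), then subset each table with those indices and bind the columns side by side. `cross_join_idx` is a hypothetical name, not part of data.table or optiRum, and columns sharing a name in both tables are not disambiguated.

```r
library(data.table)

# Hypothetical helper: cross join two data.tables via their row indices
cross_join_idx <- function(X, Y) {
  stopifnot(is.data.table(X), is.data.table(Y))
  # all (row of X, row of Y) pairs; CJ sorts, so i varies slowest
  idx <- CJ(i = seq_len(nrow(X)), j = seq_len(nrow(Y)))
  # repeat the rows of each table according to the index pairs, then bind columns
  cbind(X[idx$i], Y[idx$j])
}

dtDates1 <- data.table(
  date  = seq(from = as.Date("2014-01-01"), to = as.Date("2014-07-01"), by = "weeks"),
  data1 = rnorm(26)
)
dtCustomers <- data.table(customerID = seq(1, 100), data2 = rnorm(100))

res <- cross_join_idx(dtCustomers, dtDates1)
dim(res)  # 2600 rows, 4 columns: customerID, data2, date, data1
```

Because the indices are expanded before any columns are copied, this keeps the column classes (for example `Date`) of both inputs unchanged.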

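There is also a feature request, data.table#1717, for a convenient way to perform cross joins, so you can check there whether a better cross-join API has become available.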