Using the by argument of an "outer" data.table to filter an "inner" data.table

Fab*_*ing 5 r data.table

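I still have some trouble understanding data.table notation. Can anyone explain why the following does not work?

I am trying to group dates into buckets using cut. The breaks to use are stored in another data.table and depend on the by argument of the outer "data" data.table.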

library(data.table)

data <- data.table(A = c(1, 1, 1, 2, 2, 2),
                   DATE = as.POSIXct(c("01-01-2012", "30-05-2015", "01-01-2020", "30-06-2012", "30-06-2013", "01-01-1999"), format = "%d-%m-%Y"))

breaks <- data.table(B = c(1, 1, 2, 2),
                     BREAKPOINT = as.POSIXct(c("01-01-2015", "01-01-2016", "30-06-2012", "30-06-2013"), format = "%d-%m-%Y"))

data[, bucket := cut(DATE, breaks[B == A, BREAKPOINT], ordered_result = TRUE), by = A]

I can get the desired result by handling each group separately:

# expected
data[A == 1, bucket := cut(DATE, breaks[B == 1, BREAKPOINT], ordered_result = TRUE)]
data[A == 2, bucket := cut(DATE, breaks[B == 2, BREAKPOINT], ordered_result = TRUE)]
data 
#    A       DATE     bucket
# 1: 1 2012-01-01         NA
# 2: 1 2015-05-30 2015-01-01
# 3: 1 2020-01-01         NA
# 4: 2 2012-06-30 2012-06-30
# 5: 2 2013-06-30         NA
# 6: 2 1999-01-01         NA

Thanks, Michael

edd*_*ddi 5

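The problem is that cut produces factors, and those are not handled properly in the data.table by operation (this is a bug and should be reported: the factor levels should be handled the same way they are handled in rbind.data.table or rbindlist). A simple fix to your original expression is to convert to character: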

data[, bucket := as.character(cut(DATE, breaks[B == A, BREAKPOINT], ordered_result = TRUE))
     , by = A]
#   A       DATE     bucket
#1: 1 2012-01-01         NA
#2: 1 2015-05-30 2015-01-01
#3: 1 2020-01-01         NA
#4: 2 2012-06-30 2012-06-30
#5: 2 2013-06-30         NA
#6: 2 1999-01-01         NA
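
To make the factor-level point concrete, here is a small base-R sketch. It does not touch data.table internals, so treat it as an illustration of the likely cause rather than a confirmed diagnosis. It reuses the two groups from the question; the helper `to_time`, the variable names, and `tz = "UTC"` are assumptions added only for this example.

```r
# Base R only; the data mirror the two groups from the question.
# to_time() and tz = "UTC" are illustrative assumptions, not from the original post.
to_time <- function(x) as.POSIXct(x, format = "%d-%m-%Y", tz = "UTC")

dates_1  <- to_time(c("01-01-2012", "30-05-2015", "01-01-2020"))
breaks_1 <- to_time(c("01-01-2015", "01-01-2016"))
dates_2  <- to_time(c("30-06-2012", "30-06-2013", "01-01-1999"))
breaks_2 <- to_time(c("30-06-2012", "30-06-2013"))

f1 <- cut(dates_1, breaks_1, ordered_result = TRUE)
f2 <- cut(dates_2, breaks_2, ordered_result = TRUE)

# Each group gets its own level set ...
identical(levels(f1), levels(f2))   # FALSE
# ... yet the matched rows carry the same integer code in both groups.
as.integer(f1)                      # NA  1 NA
as.integer(f2)                      #  1 NA NA

# Converting to character keeps the labels, so the groups combine safely.
bucket <- c(as.character(f1), as.character(f2))

# Optionally rebuild a single ordered factor with a shared level set.
bucket_f <- factor(bucket, levels = sort(unique(bucket)), ordered = TRUE)
```

Because both groups use code 1 for different labels, any step that stitches the per-group results together by integer code alone would mix up the buckets. Going through character, and re-factoring afterwards if an ordered factor is needed, sidesteps that.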

  • I think this is related to [#967](https://github.com/Rdatatable/data.table/issues/967). (2 upvotes)