Art*_*Sbr 0 merge r data.table
I have the following two tables:
df <- data.table(id = c("01","02","03"), tariff = c("1A","1B","1A"), summer = c(0,0,1), expenditure = c(150,200,90))
   id tariff summer expenditure
1: 01     1A      0         150
2: 02     1B      0         200
3: 03     1A      1          90
catalogue <- data.table(tariff = c("1A","1A","1A","1A","1B","1B","1B","1B"), summer = c(0,0,1,1,0,0,1,1),
                        lb_quant = c(0,50,0,80,0,80,0,100), ub_quant = c(50,Inf,80,Inf,80,Inf,100,Inf), case = letters[1:8])
   tariff summer lb_quant ub_quant case
1:     1A      0        0       50    a
2:     1A      0       50      Inf    b
3:     1A      1        0       80    c
4:     1A      1       80      Inf    d
5:     1B      0        0       80    e
6:     1B      0       80      Inf    f
7:     1B      1        0      100    g
8:     1B      1      100      Inf    h
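I want to merge df and catalogue by tariff, summer and expenditure. However, because expenditure is numeric and has to fall inside a range rather than equal a key, an ordinary merge will not work directly.
I am looking for a vectorized way to merge the two tables when both of these conditions hold: tariff and summer match, and catalogue$lb_quant < df$expenditure <= catalogue$ub_quant.
As an example, I want to match df[id == "01"] with the second row of catalogue, because tariff == "1A", summer == 0, and expenditure (150) falls within (50, Inf]. The result should therefore assign case = "b" to df[id == "01"].
The real df is large, so I want to avoid loops. Is there a vectorized way to do this in R or Python?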
In this case you can also use a non-equi update join.
See the following one-liner (line breaks added for readability):
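# Non-equi update join: for each catalogue row, find the df rows inside its
# bracket and add that bracket's columns to df by reference.
df[ catalogue,
    `:=`( lb_quant = i.lb_quant,   # the i. prefix takes the column from catalogue
          ub_quant = i.ub_quant,
          case     = i.case ),
    on = .( tariff,
            summer,
            expenditure > lb_quant,        # strict lower bound
            expenditure <= ub_quant ) ][]  # inclusive upper bound, as the question asks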
Output:
   id tariff summer expenditure lb_quant ub_quant case
1: 01     1A      0         150       50      Inf    b
2: 02     1B      0         200       80      Inf    f
3: 03     1A      1          90       80      Inf    d
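The example rows above all sit well inside a bracket, so they do not show what happens exactly at a bracket edge. The sketch below is a minimal check of the question's condition lb_quant < expenditure <= ub_quant at those edges. The table boundary_df and its ids "04" to "06" are hypothetical and were made up for this illustration; only catalogue comes from the question.

```r
library(data.table)

# Catalogue from the question.
catalogue <- data.table(
  tariff   = c("1A", "1A", "1A", "1A", "1B", "1B", "1B", "1B"),
  summer   = c(0, 0, 1, 1, 0, 0, 1, 1),
  lb_quant = c(0, 50, 0, 80, 0, 80, 0, 100),
  ub_quant = c(50, Inf, 80, Inf, 80, Inf, 100, Inf),
  case     = letters[1:8]
)

# Hypothetical rows (not from the question) placed on and around the edges
# of the 1A / summer == 0 brackets.
boundary_df <- data.table(
  id          = c("04", "05", "06"),
  tariff      = "1A",
  summer      = 0,
  expenditure = c(50, 50.01, 0)
)

# Same non-equi update join, adding only the case column.
boundary_df[catalogue,
            case := i.case,
            on = .(tariff, summer,
                   expenditure > lb_quant,     # strict lower bound
                   expenditure <= ub_quant)]   # inclusive upper bound

print(boundary_df)
```

Under these assumptions an expenditure of exactly 50 lands in the lower bracket (case a), 50.01 lands in the upper one (case b), and an expenditure of 0 matches no bracket because the lower bound is strict, so its case stays NA. If zero spending should map to the first bracket, the lower-bound condition would have to become >= with the upper bound made strict, or the catalogue would need adjusting.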