Merge two tables when a grouping variable falls within an interval (non-equi join in R)

Art*_*Sbr · score 0 · tags: merge, r, data.table

I have the following two tables:

df <- data.table(id = c("01","02","03"), tariff = c("1A","1B","1A"), summer = c(0,0,1), expenditure = c(150,200,90))
   id tariff summer expenditure
1: 01     1A      0         150
2: 02     1B      0         200
3: 03     1A      1          90

catalogue <- data.table(tariff = c("1A","1A","1A","1A","1B","1B","1B","1B"), summer = c(0,0,1,1,0,0,1,1),
                        lb_quant = c(0,50,0,80,0,80,0,100), ub_quant = c(50,Inf,80,Inf,80,Inf,100,Inf), case = letters[1:8])
   tariff summer lb_quant ub_quant case
1:     1A      0        0       50    a
2:     1A      0       50      Inf    b
3:     1A      1        0       80    c
4:     1A      1       80      Inf    d
5:     1B      0        0       80    e
6:     1B      0       80      Inf    f
7:     1B      1        0      100    g
8:     1B      1      100      Inf    h

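I want to merge `df` and `catalogue` by `tariff`, `summer` and `expenditure`. However, because `expenditure` is numeric and has to fall inside a range rather than equal a key, a direct merge won't work.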

I'm looking for a vectorized way to merge the two tables when:

  1. `tariff` and `summer` match
  2. catalogue$lb_quant < df$expenditure <= catalogue$ub_quant
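One simple vectorized way to express these two conditions is an ordinary equi-join on `tariff` and `summer` followed by a row filter on the interval. This is a swapped-in technique, not a non-equi join: it materializes every (row, interval) pair for a key before filtering, so the intermediate table has roughly `nrow(df)` times the number of intervals per key rows. A minimal sketch, assuming the data.table package and the sample tables from the question:

```r
library(data.table)

# Sample data from the question
df <- data.table(id = c("01","02","03"), tariff = c("1A","1B","1A"),
                 summer = c(0,0,1), expenditure = c(150,200,90))
catalogue <- data.table(tariff = rep(c("1A","1B"), each = 4),
                        summer = rep(c(0,0,1,1), 2),
                        lb_quant = c(0,50,0,80,0,80,0,100),
                        ub_quant = c(50,Inf,80,Inf,80,Inf,100,Inf),
                        case = letters[1:8])

# Condition 1: equi-join on tariff and summer. Each df row is paired with
# every catalogue interval for its key; allow.cartesian = TRUE is needed once
# df is large, because the result then has more rows than both inputs combined.
pairs <- merge(df, catalogue, by = c("tariff", "summer"), allow.cartesian = TRUE)

# Condition 2: keep only the pairs with lb_quant < expenditure <= ub_quant
res <- pairs[lb_quant < expenditure & expenditure <= ub_quant]
setorder(res, id)
res
```

For the sample data this assigns `b`, `f` and `d` to ids `01`, `02` and `03`. It is easy to read, but the non-equi join avoids building the intermediate `pairs` table.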

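As an example, I want to match `df[id == "01"]` with the second row of `catalogue`, because `tariff == "1A"`, `summer == 0` and `expenditure` falls within (50, Inf]. Therefore `case = "b"` should be assigned to `df[id == "01"]`.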

`df` has a large number of rows, so I want to avoid loops. Is there a vectorized way to do this in R or Python?

Answer by Wim*_*pel · score 5

In this case you can also use a non-equi update join.

See the following one-liner (line breaks added for readability):

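# Non-equi update join: for each catalogue row, find the df rows with the
# same tariff/summer whose expenditure lies in (lb_quant, ub_quant], and
# copy the catalogue columns into df by reference; the trailing [] prints df
df[ catalogue, 
    `:=`( lb_quant = i.lb_quant, 
          ub_quant = i.ub_quant, 
          case = i.case ),
    on = .( tariff, 
            summer, 
            expenditure > lb_quant, 
            expenditure <= ub_quant ) ][]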

Output:

   id tariff summer expenditure lb_quant ub_quant case
1: 01     1A      0         150       50      Inf    b
2: 02     1B      0         200       80      Inf    f
3: 03     1A      1          90       80      Inf    d
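If you would rather get a new table than modify `df` in place, the same non-equi join can be run from the other side, as `catalogue[df, ...]`. The sketch below assumes a data.table version that supports non-equi joins and the `x.` / `i.` column prefixes in `j`. The row with `id = "04"` is hypothetical, added only to check the closed upper bound: an expenditure of exactly 50 for tariff `1A`, summer 0 should land in interval `a`, not `b`.

```r
library(data.table)

# Tables from the question, plus one hypothetical row ("04") whose
# expenditure sits exactly on an upper bound (50 for tariff 1A, summer 0)
df <- data.table(id = c("01","02","03","04"), tariff = c("1A","1B","1A","1A"),
                 summer = c(0,0,1,0), expenditure = c(150,200,90,50))
catalogue <- data.table(tariff = rep(c("1A","1B"), each = 4),
                        summer = rep(c(0,0,1,1), 2),
                        lb_quant = c(0,50,0,80,0,80,0,100),
                        ub_quant = c(50,Inf,80,Inf,80,Inf,100,Inf),
                        case = letters[1:8])

# x = catalogue, i = df. In `on`, left-hand names are catalogue columns and
# right-hand names are df columns, so the condition reads
# lb_quant < expenditure <= ub_quant. The intervals are disjoint, so each df
# row matches at most one catalogue row; unmatched df rows would get NA
# (the default nomatch = NA).
res <- catalogue[df,
                 .(id = i.id, tariff = i.tariff, summer = i.summer,
                   expenditure = i.expenditure,
                   lb_quant = x.lb_quant, ub_quant = x.ub_quant, case = x.case),
                 on = .(tariff, summer,
                        lb_quant < expenditure, ub_quant >= expenditure)]
res
# df itself is not modified: no `case` column is added to it
```

The explicit `x.lb_quant` and `x.ub_quant` matter here: in a non-equi join the join columns of `x` in the result are filled with the values from `i`, so selecting all columns without a `j` would show the expenditure values under the `lb_quant` and `ub_quant` names.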

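  • Infix operators (e.g. `%in%`, `<-`, `units<-` and, yes, `:=`) are just functions with special syntax. To use an infix operator as a "regular" function, wrap it in backticks. So `df[, a := 1]` is equivalent to ``df[, `:=`("a", 1)]`` (in data.table the quotes around `a` aren't actually needed, but elsewhere you usually would use them). (3 upvotes)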
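The comment's point can be checked directly. This sketch, assuming only base R and data.table, calls a few infix operators in backtick form and shows that the infix and functional forms of `:=` add the same column; the multi-column form is the one the answer uses.

```r
library(data.table)

# Infix operators are ordinary functions when wrapped in backticks
stopifnot(`+`(1, 2) == 3)
x <- c(1, 5)
stopifnot(identical(`%in%`(5, x), 5 %in% x))

dt1 <- data.table(id = 1:2)
dt2 <- data.table(id = 1:2)
dt1[, a := 1]             # usual infix form
dt2[, `:=`("a", 1)]       # functional form, column name given as a string
stopifnot(identical(dt1$a, dt2$a))

# Functional form with several named columns, as in the answer's j
dt3 <- data.table(id = 1:2)
dt3[, `:=`(b = 2, c = 3)]
stopifnot(identical(names(dt3), c("id", "b", "c")))
```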