How do I replace an IN clause with a JOIN in Oracle SQL?

zep*_*rus 3 sql oracle hive oracle11g hiveql

I am trying to rewrite the query below, replacing the 'IN' clauses with inner joins:

select * from employee_rec er 
  inner join ed_claim_recd ed on er.ssn=ed.insssn and substr(er.group_rec_key,1,10) = substr(er.group_rec_key,1,10) 
    and ed.claim in (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101')
    and ed.insssn in (select er1.ssn from employee_rec er1 where er1.status != 'ACTIV' and trim(ER1.CLAIMNO) is null)
    and er.sysind not in ('ABC', 'BCD')

Below is what I came up with, but it returns different results from the original query:

select * from employee_rec er
  inner join ed_claim_recd ed on er.ssn = ed.insssn and substr(er.group_rec_key, 1, 10) = substr(er.group_rec_key, 1, 10)
  inner join (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno
  inner join (
    select insssn from ed_claim_recd ed2
      inner join (
        select ssn from employee_rec er1
        where
          er1.status != 'ACTIV'
          and trim(ER1.CLAIMNO) is null
      ) er2 on ed2.insssn = er2.ssn
  ) ed3 on ed.insssn = ed3.insssn
  and er.sysind not in ('ABC', 'BCD')

Is this the right way to rewrite the query, or am I overdoing it? Also, is replacing "IN" with "INNER JOIN" an effective way to rewrite a query?

lef*_*oin 5

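An IN subquery and an INNER JOIN work differently. For each row of one table, a join outputs every row of the joined table that has the same join key. So if the join key is not unique in the joined table, the join duplicates rows. An IN subquery never duplicates rows.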

For example, if in your cr join subquery

inner join (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno

claimno is not unique, the rows that join to that claimno will be duplicated. This is normal join behavior.
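The difference can be reproduced on a tiny data set. The sketch below uses hypothetical inline rows (not data from the question) and Oracle syntax (`from dual`); only the names `claim` and `claimno` are borrowed from the question.

```sql
-- Query 1: IN subquery. Hypothetical data: one ed row, two cd rows with the same key.
with ed as (
  select 'C1' as claim from dual
),
cd as (
  select 'C1' as claimno from dual
  union all
  select 'C1' as claimno from dual  -- claimno 'C1' appears twice
)
select ed.claim
from ed
where ed.claim in (select cd.claimno from cd);
-- 1 row: the ed row is kept once, however many matches cd has

-- Query 2: inner join on the same data.
with ed as (
  select 'C1' as claim from dual
),
cd as (
  select 'C1' as claimno from dual
  union all
  select 'C1' as claimno from dual
)
select ed.claim
from ed
  inner join cd on ed.claim = cd.claimno;
-- 2 rows: the ed row is repeated once per matching cd row
```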

To avoid such duplication, make sure the join key is unique, for example by adding DISTINCT, filtering on row_number(), or using GROUP BY:

inner join (select DISTINCT claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno

The same applies to the other joins of this kind.

In that case, the results of IN and of the join should be the same.

By the way, you do not need all of these conditions:

where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101'

'2130101' is greater than '0000000', and cd.closedt cannot be NULL if cd.closedt >= '2130101', so the single condition cd.closedt >= '2130101' is enough.
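A quick way to check this is to run the single predicate against sample values. This is a minimal sketch in Oracle syntax, assuming closedt is a 7-character string column as the literals in the question suggest; the inline rows are hypothetical.

```sql
-- Hypothetical sample values for closedt, including NULL and '0000000'.
with cd as (
  select cast(null as varchar2(7)) as closedt from dual
  union all
  select '0000000' from dual
  union all
  select '2120101' from dual
  union all
  select '2130101' from dual
  union all
  select '2140101' from dual
)
select closedt
from cd
where closedt >= '2130101';
-- Returns only '2130101' and '2140101':
-- NULL fails every comparison, and '0000000' sorts below '2130101',
-- so the "is not null" and "!= '0000000'" checks add nothing.
```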

I found another possible problem:

and trim(ER1.CLAIMNO) is null

In Hive (you tagged your question with hive), an empty string and NULL are two different things.

('' is not NULL) = true in Hive.

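I suggest replacing it with and (ER1.CLAIMNO is null or trim(ER1.CLAIMNO)=''). An empty string is a normal value in Hive, which is why empty strings take part in joins. If you do not want them to join, convert them to NULL or filter them out before joining.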
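As a rough HiveQL sketch of both options, using the table and column names from the question (the second statement is only an illustration of converting blank keys to NULL, not part of the original query):

```sql
-- Option 1: filter so that NULL, '' and all-blank claim numbers are treated alike.
select distinct er1.ssn
from employee_rec er1
where er1.status != 'ACTIV'
  and (er1.claimno is null or trim(er1.claimno) = '');

-- Option 2: turn blank join keys into NULL before joining.
-- NULL never satisfies an equality join condition, so these rows stop matching.
select case when trim(ed.claim) = '' then null else ed.claim end as claim
from ed_claim_recd ed;
```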

The ed3 subquery contains a redundant join; it does not match the original IN subquery.
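Putting these points together, one possible join version looks like the sketch below. It assumes Oracle semantics for `trim(...) is null` (on Hive, use the explicit empty-string check instead) and selects only `er.*, ed.*`, so the output has the same columns as the original `select *`. It has not been run against the real tables.

```sql
select er.*, ed.*
from employee_rec er
  inner join ed_claim_recd ed
    on er.ssn = ed.insssn
   -- kept exactly as in the question; note that it compares er with itself
   and substr(er.group_rec_key, 1, 10) = substr(er.group_rec_key, 1, 10)
  inner join (
    -- DISTINCT keeps each claimno once, so the join cannot multiply rows
    select distinct cd.claimno
    from cd_claim_recd cd
    where cd.closedt >= '2130101'
  ) cr on ed.claim = cr.claimno
  inner join (
    -- replaces ed3: join straight to the distinct ssn list,
    -- without going through ed_claim_recd a second time
    select distinct er1.ssn
    from employee_rec er1
    where er1.status != 'ACTIV'
      and trim(er1.claimno) is null
  ) er2 on ed.insssn = er2.ssn
where er.sysind not in ('ABC', 'BCD');
```

Comparing row counts against the original IN query after adding each join is a simple way to confirm that the two versions still agree.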

There may be other problems as well. Test the joins one by one to find them all.