zep*_*rus 3 sql oracle hive oracle11g hiveql
I am trying to rewrite the query below, replacing the 'IN' clauses with inner joins:
select * from employee_rec er
inner join ed_claim_recd ed on er.ssn=ed.insssn and substr(er.group_rec_key,1,10) = substr(er.group_rec_key,1,10)
and ed.claim in (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101')
and ed.insssn in (select er1.ssn from employee_rec er1 where er1.status != 'ACTIV' and trim(ER1.CLAIMNO) is null)
and er.sysind not in ('ABC', 'BCD')
Below is what I came up with, but its results differ from those of the previous query:
select * from employee_rec er
inner join ed_claim_recd ed on er.ssn = ed.insssn and substr(er.group_rec_key, 1, 10) = substr(er.group_rec_key, 1, 10)
inner join (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno
inner join (
select insssn from ed_claim_recd ed2
inner join (
select ssn from employee_rec er1
where
er1.status != 'ACTIV'
and trim(ER1.CLAIMNO) is null
) er2 on ed2.insssn = er2.ssn
) ed3 on ed.insssn = ed3.insssn
and er.sysind not in ('ABC', 'BCD')
Is this the right way to rewrite the query, or am I overdoing it? Also, is replacing "IN" with "INNER JOIN" an efficient way to rewrite a query?
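An IN subquery and an INNER JOIN work differently. For each join key in one table, a join outputs every row from the joined table that has the same key. So if the join key is not unique in the joined table, the join can duplicate rows. An IN subquery does not duplicate rows.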
For example, if in your cr join subquery
inner join (select claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno
claimno is not unique, then every row that joins on a repeated claimno will be duplicated. This is normal join behavior.
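The duplication can be reproduced on a small scale. The sketch below is only an illustration: it uses Python's built-in sqlite3 module instead of Hive or Oracle, and the rows are invented toy data in which claim 'C1' appears twice in cd_claim_recd.

```python
import sqlite3

# Toy data (not from the question): claimno 'C1' is not unique in cd_claim_recd.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ed_claim_recd (insssn text, claim text);
    create table cd_claim_recd (claimno text, closedt text);
    insert into ed_claim_recd values ('111', 'C1'), ('222', 'C2');
    insert into cd_claim_recd values
        ('C1', '2130105'), ('C1', '2130210'), ('C2', '2130301');
""")

# IN only tests membership, so each ed row appears at most once.
in_rows = conn.execute("""
    select ed.insssn, ed.claim
    from ed_claim_recd ed
    where ed.claim in (select claimno from cd_claim_recd cd
                       where cd.closedt >= '2130101')
""").fetchall()

# The join emits one output row per matching cd row, so 'C1' is doubled.
join_rows = conn.execute("""
    select ed.insssn, ed.claim
    from ed_claim_recd ed
    inner join (select claimno from cd_claim_recd cd
                where cd.closedt >= '2130101') cr
            on ed.claim = cr.claimno
""").fetchall()

print(len(in_rows), len(join_rows))  # 2 3
```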
To avoid this kind of duplication, make sure the join key is unique, for example by adding DISTINCT, filtering with row_number(), or using group by:
inner join (select DISTINCT claimno from cd_claim_recd cd where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101') cr on ed.claim = cr.claimno
The same applies to the other joins of this kind.
Once the join keys are unique, the results of IN and of the join should be the same.
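As a rough check of that claim, the sketch below compares the IN version against joins whose subquery is deduplicated with DISTINCT and with group by. It again uses sqlite3 as a stand-in engine and invented toy rows; row_number() is left out only to keep the example short.

```python
import sqlite3

# Toy data (not from the question): claimno 'C1' is duplicated on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ed_claim_recd (insssn text, claim text);
    create table cd_claim_recd (claimno text, closedt text);
    insert into ed_claim_recd values ('111', 'C1'), ('222', 'C2'), ('333', 'C3');
    insert into cd_claim_recd values
        ('C1', '2130105'), ('C1', '2130210'), ('C2', '2130301');
""")

def run(sql):
    """Run a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())

in_rows = run("""
    select ed.insssn, ed.claim from ed_claim_recd ed
    where ed.claim in (select claimno from cd_claim_recd cd
                       where cd.closedt >= '2130101')
""")

# DISTINCT makes claimno unique before the join.
distinct_rows = run("""
    select ed.insssn, ed.claim from ed_claim_recd ed
    inner join (select distinct claimno from cd_claim_recd cd
                where cd.closedt >= '2130101') cr
            on ed.claim = cr.claimno
""")

# group by on the key has the same deduplicating effect.
group_rows = run("""
    select ed.insssn, ed.claim from ed_claim_recd ed
    inner join (select claimno from cd_claim_recd cd
                where cd.closedt >= '2130101'
                group by claimno) cr
            on ed.claim = cr.claimno
""")

print(in_rows == distinct_rows == group_rows)  # True
```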
By the way, you do not need all of these conditions:
where cd.closedt is not null and cd.closedt != '0000000' and cd.closedt >= '2130101'
'2130101' is greater than '0000000', and a cd.closedt that satisfies cd.closedt >= '2130101' cannot be NULL, so cd.closedt >= '2130101' alone is enough.
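A small sanity check of this simplification, assuming closedt is stored as a string as in the question. The rows are invented and sqlite3 stands in for the real engine; the point is only that a NULL or '0000000' value never passes the >= comparison.

```python
import sqlite3

# Toy data (not from the question) covering NULL, the '0000000' placeholder,
# a value below the cutoff, the cutoff itself, and a value above it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cd_claim_recd (claimno text, closedt text);
    insert into cd_claim_recd values
        ('C1', NULL), ('C2', '0000000'), ('C3', '2121231'),
        ('C4', '2130101'), ('C5', '2140615');
""")

full_filter = [r[0] for r in conn.execute("""
    select claimno from cd_claim_recd cd
    where cd.closedt is not null
      and cd.closedt != '0000000'
      and cd.closedt >= '2130101'
    order by claimno
""")]

# NULL >= '2130101' evaluates to NULL (not true), and '0000000' sorts lower.
short_filter = [r[0] for r in conn.execute("""
    select claimno from cd_claim_recd cd
    where cd.closedt >= '2130101'
    order by claimno
""")]

print(full_filter, short_filter)  # ['C4', 'C5'] ['C4', 'C5']
```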
I found another possible issue:
and trim(ER1.CLAIMNO) is null
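In Hive (you tagged your question with hive), an empty string and NULL are two different things: ('' is not NULL) = true in Hive.
I suggest replacing it with: and (ER1.CLAIMNO is null or trim(ER1.CLAIMNO)='')
An empty string is a normal value in Hive, which is why empty strings take part in joins. If you do not want them joined, convert them to NULL or filter them out before the join.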
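The difference between the two predicates can be seen in the sketch below. It runs on sqlite3, which, like Hive, treats '' as a value distinct from NULL; Oracle, by contrast, treats a zero-length string as NULL, so the original predicate behaves differently there. The rows are invented toy data.

```python
import sqlite3

# Toy data (not from the question): one NULL, one empty, one blank, one real claim.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee_rec (ssn text, status text, claimno text);
    insert into employee_rec values
        ('111', 'TERM', NULL),
        ('222', 'TERM', ''),
        ('333', 'TERM', '   '),
        ('444', 'TERM', 'C9');
""")

# Original predicate: trim() of '' or '   ' yields '', which is not NULL here.
null_only = [r[0] for r in conn.execute("""
    select ssn from employee_rec er1
    where trim(er1.claimno) is null
    order by ssn
""")]

# Suggested predicate: also catches empty and whitespace-only strings.
null_or_blank = [r[0] for r in conn.execute("""
    select ssn from employee_rec er1
    where (er1.claimno is null or trim(er1.claimno) = '')
    order by ssn
""")]

print(null_only)      # ['111']
print(null_or_blank)  # ['111', '222', '333']
```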
The ed3 subquery contains a redundant join, so it is not equivalent to the original IN subquery.
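One possible simplification, sketched under assumptions: join ed directly to a deduplicated list of ssn values from employee_rec instead of going through ed_claim_recd a second time. The example keeps only the tables and columns needed for this one condition, uses invented rows, and runs on sqlite3 rather than Hive or Oracle.

```python
import sqlite3

# Toy data (not from the question): ssn '111' has two claims in ed_claim_recd.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ed_claim_recd (insssn text, claim text);
    create table employee_rec (ssn text, status text, claimno text);
    insert into ed_claim_recd values ('111', 'C1'), ('111', 'C2'), ('222', 'C3');
    insert into employee_rec values ('111', 'TERM', NULL), ('222', 'ACTIV', NULL);
""")

# Original IN condition.
in_rows = conn.execute("""
    select ed.insssn, ed.claim from ed_claim_recd ed
    where ed.insssn in (select er1.ssn from employee_rec er1
                        where er1.status != 'ACTIV'
                          and trim(er1.claimno) is null)
""").fetchall()

# Rewrite from the question: ed3 re-joins ed_claim_recd, so '111' comes back
# twice from ed3 and every matching ed row is doubled.
nested_rows = conn.execute("""
    select ed.insssn, ed.claim from ed_claim_recd ed
    inner join (
        select ed2.insssn from ed_claim_recd ed2
        inner join (select ssn from employee_rec er1
                    where er1.status != 'ACTIV'
                      and trim(er1.claimno) is null) er2
                on ed2.insssn = er2.ssn
    ) ed3 on ed.insssn = ed3.insssn
""").fetchall()

# Simplified join: no second pass over ed_claim_recd, key made unique.
simple_rows = conn.execute("""
    select ed.insssn, ed.claim from ed_claim_recd ed
    inner join (select distinct ssn from employee_rec er1
                where er1.status != 'ACTIV'
                  and trim(er1.claimno) is null) er2
            on ed.insssn = er2.ssn
""").fetchall()

print(len(in_rows), len(nested_rows), len(simple_rows))  # 2 4 2
```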
There may be other issues as well. Test the joins one at a time to find them all.