MINUS query in Hive

Mac*_*are 6 hive

The MINUS query does not seem to work in Hive.

I tried:

select x from abc 
minus 
select x from bcd ; 

Am I doing something wrong, or is MINUS not defined for Hive? If it is not, is there another way to get the same result?

Pat*_*cci 17

HQL does not appear to support the MINUS operator. See this related, though somewhat dated, resource:

http://www.quora.com/Apache-Hive/What-are-the-biggest-feature-gaps-between-HiveQL-and-SQL

What you want can be done with a LEFT JOIN or NOT EXISTS:

SELECT abc.x
FROM abc
LEFT JOIN bcd
ON abc.x = bcd.x
WHERE bcd.x IS NULL
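To check that the LEFT JOIN form really behaves like a set difference, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example runs anywhere), and the sample rows are made up. SQLite's EXCEPT plays the role of MINUS. Note the added DISTINCT: MINUS removes duplicates, while a plain LEFT JOIN does not.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE abc (x INTEGER);
    CREATE TABLE bcd (x INTEGER);
    INSERT INTO abc VALUES (1), (2), (2), (3);
    INSERT INTO bcd VALUES (3), (4);
""")

# Anti-join: keep the rows of abc that found no partner in bcd.
# DISTINCT mimics the duplicate removal that MINUS performs.
anti_join = conn.execute("""
    SELECT DISTINCT abc.x
    FROM abc
    LEFT JOIN bcd ON abc.x = bcd.x
    WHERE bcd.x IS NULL
    ORDER BY abc.x
""").fetchall()

# Reference result using SQLite's EXCEPT (its spelling of MINUS).
set_minus = conn.execute("""
    SELECT x FROM abc
    EXCEPT
    SELECT x FROM bcd
    ORDER BY x
""").fetchall()

print(anti_join)
print(set_minus)
```

Both queries return the same rows for this data. Rows where x is NULL may be treated differently by the two forms, so that case is worth testing on real data.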

Edit: per the comments below, NOT EXISTS is not supported. For reference, the query would be:

SELECT x 
FROM abc
WHERE NOT EXISTS (SELECT x FROM bcd WHERE bcd.x = abc.x)


小智 7

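HQL does not support MINUS, but you can always use Patrick Tucci's solution, which works fine when your select list contains only a few fields. In my case, I wanted to find the differences between an entire table (more than 30 fields) and a backup copy, to locate the records that differ. Here is my solution: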

select <all-my-fields>, count(*)
  from (
        select <all-my-fields> from mytable
        union all
        select <all-the-fields> from mybackuptable
       ) merged_data
group by <all-my-fields>
having count(*) = 1
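A minimal runnable sketch of the UNION ALL / GROUP BY / HAVING pattern, again using Python's sqlite3 as a stand-in for Hive (an assumption). The two-column tables and their rows are hypothetical, chosen so that one row matches, one differs, and one exists on each side only.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, name TEXT);
    CREATE TABLE mybackuptable (id INTEGER, name TEXT);
    INSERT INTO mytable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO mybackuptable VALUES (1, 'a'), (2, 'x'), (4, 'd');
""")

# Stack both tables, then keep only the rows that occur exactly once,
# i.e. rows present in one table but not the other.
diff_rows = conn.execute("""
    SELECT id, name, COUNT(*)
    FROM (
        SELECT id, name FROM mytable
        UNION ALL
        SELECT id, name FROM mybackuptable
    ) merged_data
    GROUP BY id, name
    HAVING COUNT(*) = 1
    ORDER BY id, name
""").fetchall()

print(diff_rows)
```

The row (1, 'a') appears in both tables and drops out. Everything else is reported, including rows that exist only in the backup, so this is a symmetric difference rather than a MINUS.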

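This is not quite a full MINUS, because records that exist only in mybackuptable also show up in the result, which is what I wanted. To make it a complete MINUS equivalent, I added this: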

select <all-my-fields>
  from (
        select max(source) source, <all-my-fields>, count(*)
          from (
                select 1 source, <all-my-fields> from mytable
                union all
                select 2 source, <all-the-fields> from mybackuptable
               ) merged_data
        group by <all-my-fields>
        having count(*) = 1
       ) minus_data
 where source = 1
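The source-tagging variant can be sketched the same way. As before, sqlite3 stands in for Hive and the tables and rows are hypothetical. The result is compared against SQLite's EXCEPT to confirm that it matches a true MINUS for this data.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, name TEXT);
    CREATE TABLE mybackuptable (id INTEGER, name TEXT);
    INSERT INTO mytable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO mybackuptable VALUES (1, 'a'), (2, 'x'), (4, 'd');
""")

# Tag each row with its origin (1 = mytable, 2 = mybackuptable), keep rows
# that occur once overall, then keep only those that came from mytable.
emulated_minus = conn.execute("""
    SELECT id, name
    FROM (
        SELECT MAX(source) AS source, id, name, COUNT(*) AS n
        FROM (
            SELECT 1 AS source, id, name FROM mytable
            UNION ALL
            SELECT 2 AS source, id, name FROM mybackuptable
        ) merged_data
        GROUP BY id, name
        HAVING COUNT(*) = 1
    ) minus_data
    WHERE source = 1
    ORDER BY id
""").fetchall()

# Reference result using SQLite's EXCEPT (its spelling of MINUS).
real_minus = conn.execute("""
    SELECT id, name FROM mytable
    EXCEPT
    SELECT id, name FROM mybackuptable
    ORDER BY id
""").fetchall()

print(emulated_minus)
print(real_minus)
```

One caveat worth checking on real data: a row that appears twice in mytable and never in the backup has count(*) = 2 and is filtered out, whereas MINUS would return it once.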