Hive query: joining two tables on three join conditions with the OR operator

Sud*_*har 11 hive

I am getting the error

" FAILED: Error in semantic analysis: Line 1:101 OR not supported in JOIN currently dob"

when running the query below:

Insert Overwrite Local Directory './Insurance_Risk/Merged_Data' Select f.name,s.age,f.gender,f.loc,f.marital_status,f.habits1,f.habits2,s.employement_status,s.occupation_class,s.occupation_subclass,s.occupation from sample_member_detail s Join fb_member_detail f 
On s.email=f.email or 
s.dob=f.dob 
or (f.name=s.name and f.loc = s.loc and f.occupation=s.occupation)
where s.email is not null and f.email is not null;

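Can anyone tell me whether the OR operator can be used in a Hive join or not? If not, what query would give the same result as the one above? I have two tables and I want to join them whenever any one of the three conditions, combined with OR, is satisfied. Please help.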

www*_*www 9

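Sorry, Hive only supports equi-joins. You can always select from the full Cartesian product of the two tables and move the OR conditions into the WHERE clause. This only works in non-strict mode (e.g. set hive.mapred.mode=nonstrict;), because strict mode rejects Cartesian products: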

Select f.name,s.age,f.gender,f.loc,f.marital_status,f.habits1,f.habits2,s.employement_status,s.occupation_class,s.occupation_subclass,s.occupation 
from sample_member_detail s join fb_member_detail f 
where (s.email=f.email 
or s.dob=f.dob 
or (f.name=s.name and f.loc = s.loc and f.occupation=s.occupation))
and s.email is not null and f.email is not null;
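To sanity-check that moving the OR conditions from ON into WHERE does not change the result, here is a small sketch that runs both forms through Python's sqlite3 module. SQLite, unlike the Hive versions discussed here, accepts OR in an ON clause, so it can act as a reference. The table rows are hypothetical toy data and only a few of the question's columns are modelled; this checks the logic of the rewrite only, not Hive behaviour or performance.

```python
import sqlite3

# Hypothetical toy rows; column names follow the question, values are made up.
MEMBERS_S = [  # (email, dob, name, loc, occupation, age)
    ("a@x.com", "1990-01-01", "Ann", "NY", "dev", 34),
    ("b@x.com", "1985-05-05", "Bob", "LA", "ops", 39),
    ("c@x.com", "1970-07-07", "Cy", "SF", "pm", 54),
]
MEMBERS_F = [  # (email, dob, name, loc, occupation, gender)
    ("a@x.com", "2000-02-02", "Ann", "NY", "dev", "F"),  # email match
    ("z@x.com", "1985-05-05", "Rob", "TX", "ops", "M"),  # dob match only
    ("y@x.com", "1999-09-09", "Cy", "SF", "pm", "M"),    # name+loc+occupation match only
    ("q@x.com", "1960-06-06", "Dee", "CH", "qa", "F"),   # matches nothing
]

# What the question wanted to write (rejected by older Hive, accepted by SQLite).
OR_IN_ON = """
SELECT f.name, s.age, f.gender
FROM sample_member_detail s JOIN fb_member_detail f
  ON s.email = f.email
  OR s.dob = f.dob
  OR (f.name = s.name AND f.loc = s.loc AND f.occupation = s.occupation)
WHERE s.email IS NOT NULL AND f.email IS NOT NULL
"""

# The workaround: no ON clause (Cartesian product), conditions moved to WHERE.
OR_IN_WHERE = """
SELECT f.name, s.age, f.gender
FROM sample_member_detail s JOIN fb_member_detail f
WHERE (s.email = f.email
       OR s.dob = f.dob
       OR (f.name = s.name AND f.loc = s.loc AND f.occupation = s.occupation))
  AND s.email IS NOT NULL AND f.email IS NOT NULL
"""

def run(query):
    # Fresh in-memory database per query so runs cannot affect each other.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample_member_detail (email, dob, name, loc, occupation, age)")
    con.execute("CREATE TABLE fb_member_detail (email, dob, name, loc, occupation, gender)")
    con.executemany("INSERT INTO sample_member_detail VALUES (?, ?, ?, ?, ?, ?)", MEMBERS_S)
    con.executemany("INSERT INTO fb_member_detail VALUES (?, ?, ?, ?, ?, ?)", MEMBERS_F)
    rows = sorted(con.execute(query).fetchall())
    con.close()
    return rows

if __name__ == "__main__":
    print(run(OR_IN_ON) == run(OR_IN_WHERE))
```

Keep in mind that on real tables the Cartesian product is materialised before filtering, so this form can be very expensive even though it is logically equivalent.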


Mat*_*aus 6

You can also get the same result with UNION:

INSERT OVERWRITE LOCAL DIRECTORY './Insurance_Risk/Merged_Data' 
-- You can only UNION on subqueries
SELECT * FROM (
    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON s.email=f.email 
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL

    UNION

    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON s.dob=f.dob
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL

    UNION

    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON f.name=s.name AND f.loc = s.loc AND f.occupation=s.occupation
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL

) subquery;

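  • You need to add DISTINCT in the outer query to get the same result; otherwise rows that satisfy more than one condition will appear more than once. (2 votes)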
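The duplicate problem is easy to reproduce. The sketch below, again using Python's sqlite3 module as a stand-in for Hive with hypothetical toy rows, builds the three equi-join branches, combines them with UNION ALL, and compares the result with and without an outer DISTINCT against a single OR join. The pair for Ann matches both the email condition and the name/loc/occupation condition, so it shows up twice under UNION ALL. The helper names (branch, run, etc.) are illustrative only.

```python
import sqlite3

# Hypothetical toy rows (only a few of the question's columns are modelled).
S_ROWS = [  # (email, dob, name, loc, occupation, age)
    ("a@x.com", "1990-01-01", "Ann", "NY", "dev", 34),
    ("b@x.com", "1985-05-05", "Bob", "LA", "ops", 39),
    ("c@x.com", "1970-07-07", "Cy", "SF", "pm", 54),
]
F_ROWS = [  # (email, dob, name, loc, occupation, gender)
    ("a@x.com", "2000-02-02", "Ann", "NY", "dev", "F"),  # Ann: email AND name/loc/occupation
    ("z@x.com", "1985-05-05", "Rob", "TX", "ops", "M"),  # Bob: dob only
    ("y@x.com", "1999-09-09", "Cy", "SF", "pm", "M"),    # Cy: name/loc/occupation only
]

COLS = "f.name, s.age, f.gender"
FILTER = "s.email IS NOT NULL AND f.email IS NOT NULL"
CONDITIONS = [
    "s.email = f.email",
    "s.dob = f.dob",
    "f.name = s.name AND f.loc = s.loc AND f.occupation = s.occupation",
]

def branch(on_clause):
    # One equi-join branch, mirroring each SELECT in the UNION answer.
    return (f"SELECT {COLS} FROM sample_member_detail s "
            f"JOIN fb_member_detail f ON {on_clause} WHERE {FILTER}")

UNION_ALL = " UNION ALL ".join(branch(c) for c in CONDITIONS)
DISTINCT_UNION = f"SELECT DISTINCT * FROM ({UNION_ALL}) subquery"
# Reference result: a single join with OR in ON (valid in SQLite).
OR_JOIN = branch(" OR ".join(f"({c})" for c in CONDITIONS))

def run(query):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample_member_detail (email, dob, name, loc, occupation, age)")
    con.execute("CREATE TABLE fb_member_detail (email, dob, name, loc, occupation, gender)")
    con.executemany("INSERT INTO sample_member_detail VALUES (?, ?, ?, ?, ?, ?)", S_ROWS)
    con.executemany("INSERT INTO fb_member_detail VALUES (?, ?, ?, ?, ?, ?)", F_ROWS)
    rows = sorted(con.execute(query).fetchall())
    con.close()
    return rows

if __name__ == "__main__":
    print("UNION ALL:     ", run(UNION_ALL))  # Ann appears twice
    print("DISTINCT union:", run(DISTINCT_UNION))
    print("OR join:       ", run(OR_JOIN))
```

One caveat: DISTINCT deduplicates on the selected columns, so if two different (s, f) pairs happen to produce identical output rows, the OR join keeps both while the DISTINCT version keeps one. According to the Hive language manual, only UNION ALL is available before Hive 1.2.0; from 1.2.0 a plain UNION removes duplicates the same way an outer DISTINCT does.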