Hive - LIKE operator

Question

I can't figure out how to handle this problem:

Here is my data:

Table1:         Table2:
BRAND           PRODUCT           SOLD
Sony            Sony ABCD         1233
Apple           Sony adv          1233
Google          Sony aaaa         1233
IBM             Apple 123         1233
etc.            Apple 345         1233
                IBM 13123         1233

Is it possible to write a query that gives me a table of brands with their total sales? My idea was:

Select table1.brand, sum(table2.sold) from table1
join table2
on (table1.brand LIKE '%table2.product%')
group by table.1.brand

That was my idea, but I always get an error.

Is the LIKE operator the main problem, or is there another solution?

Answer 1

Bra*_*zie 12

I see two issues. First, JOINs in Hive only work with equality conditions, so LIKE isn't going to work there.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

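Hive supports only equality joins, outer joins, and left semi joins. Hive does not support join conditions that are not equality conditions, because it is very difficult to express such conditions as a map/reduce job.

Instead, the condition needs to go into the WHERE clause.

Second, I also see a problem with the LIKE expression itself: '%table2.product%' is in quotes, so it is interpreted literally as the string '%table2.product%' and not as a column reference. In addition, even if it did what was intended, it would look for table2.product inside brand, when you seem to want it the other way around. To get the evaluation you intended, the wildcards have to be added to the contents of table1.brand; to do that, concatenate the wildcards into the expression: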

table2.product LIKE concat('%', table1.brand, '%')

This way, your LIKE evaluates against the strings '%Sony%', '%Apple%', and so on, instead of '%table2.product%'.
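
The difference between the quoted literal and the concatenated pattern can be tried out without a Hive cluster. The following is only a sketch, assuming SQLite (via Python's standard-library sqlite3 module) as a stand-in for Hive: SQLite spells concatenation as || rather than concat(), and its LIKE is case-insensitive for ASCII by default, so this illustrates the pattern logic and is not Hive's actual behavior. The two sample rows are taken from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption, see above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (brand TEXT)")
conn.execute("CREATE TABLE table2 (product TEXT, sold INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("Sony",), ("Apple",)])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("Sony ABCD", 1233), ("Apple 123", 1233)],
)

# Quoted pattern: every brand is compared against the fixed string
# '%table2.product%', so no row matches.
literal_matches = conn.execute(
    "SELECT COUNT(*) FROM table1, table2 "
    "WHERE table1.brand LIKE '%table2.product%'"
).fetchone()[0]

# Concatenated pattern: built per row from table1.brand
# ('%Sony%', '%Apple%') and matched against table2.product.
built_matches = conn.execute(
    "SELECT table1.brand, table2.product FROM table1, table2 "
    "WHERE table2.product LIKE '%' || table1.brand || '%' "
    "ORDER BY table1.brand"
).fetchall()

print(literal_matches)
print(built_matches)
```

In Hive itself the second condition would be written with concat('%', table1.brand, '%'), as in the answer.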

What you want is Brandon Bell's query, which I have merged into this answer:

SELECT table1.brand, SUM(table2.sold) 
FROM table1, table2
WHERE table2.product LIKE concat('%', table1.brand, '%') 
GROUP BY table1.brand;

Answer 2

bra*_*ell 6

You should be able to accomplish this without a JOIN. See the following query:

SELECT table1.brand, sum(table2.sold) 
FROM table1, table2 
WHERE table2.product LIKE concat('%', table1.brand, '%') 
GROUP BY table1.brand;

This returns:

Apple   2466
IBM     1233
Sony    3699

My input files are as follows:

Sony
Apple
Google
IBM

and

Sony ABCD       1233
Sony adv        1233
Sony aaaa       1233
Apple 123       1233
Apple 345       1233
IBM 13123       1233
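
The totals can be checked against these input files with a short script. This is a sketch under the assumption that SQLite (Python's standard-library sqlite3 module) is an acceptable stand-in for Hive for this query; SQLite uses || where Hive uses concat(), and the ORDER BY is added here only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database loaded with the two input files shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (brand TEXT)")
conn.execute("CREATE TABLE table2 (product TEXT, sold INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [("Sony",), ("Apple",), ("Google",), ("IBM",)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [
        ("Sony ABCD", 1233), ("Sony adv", 1233), ("Sony aaaa", 1233),
        ("Apple 123", 1233), ("Apple 345", 1233), ("IBM 13123", 1233),
    ],
)

# Same shape as the Hive query: cross product of both tables,
# filtered by the per-row LIKE pattern, then summed per brand.
totals = conn.execute(
    "SELECT table1.brand, SUM(table2.sold) "
    "FROM table1, table2 "
    "WHERE table2.product LIKE '%' || table1.brand || '%' "
    "GROUP BY table1.brand "
    "ORDER BY table1.brand"
).fetchall()

for brand, sold in totals:
    print(brand, sold)
```

Google does not appear in the result because no product name contains it, so the WHERE filter leaves no rows for that brand.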

Just to clarify: an implicit join is still a join. Performance-wise, the two forms should be the same. "FROM a,b WHERE a.ID = b.ID" is syntactic sugar for "FROM a JOIN b ON a.ID = b.ID". :) (3 upvotes)
