CREATE EXTERNAL TABLE old_events
(day STRING, foo STRING, count STRING, internal_id STRING)
PARTITIONED BY (ds string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '${INPUT}';
CREATE EXTERNAL TABLE events
(internal_id STRING, foo STRING, count STRING)
PARTITIONED BY (ds string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '${OUTPUT}';
INSERT OVERWRITE TABLE events
SELECT e2.internal_id, e2.foo, count(e1.foo)
FROM old_events e2
LEFT OUTER JOIN old_events e1
ON e1.foo = e2.foo
WHERE e1.event = 'event1'
AND e2.event = 'event2'
AND ds = date_sub('${DAY}',1)
GROUP BY e2.internal_id, e2.foo;
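FAILED: Error in semantic analysis: Column ds Found in more than One Tables/Subqueries
I get this error after adding the ds predicate that selects the previous day's partition. How can I implement date partitioning with the script above?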
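You need to qualify ds with a table alias in the WHERE clause. For example, change ds = date_sub('${DAY}',1) to e2.ds = date_sub('${DAY}',1).
To clarify the problem, here is a smaller example that shows the same behavior: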
CREATE EXTERNAL TABLE example
(a INT, b INT)
LOCATION '${OUTPUT}';
SELECT *
FROM example e1
JOIN example e2
ON e1.a = e2.a
WHERE b = 5;
This produces the same error:
FAILED: SemanticException Column b Found in more than One Tables/Subqueries
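The problem is that column b exists in both instances of example, aliased as e1 and e2. You and I might reason that, because example is joined to itself on column a, e1.b is the same as e2.b and so no alias should be needed. Hive does not know this (and it only holds when a is unique), so you need to pick one alias to remove the ambiguity. Whether b is a partition column does not matter here.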
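As a minimal sketch of the fix for the small example, qualifying b with either alias should resolve the ambiguity. Choosing e1 rather than e2 here is an arbitrary assumption; pick whichever side the filter is meant to apply to.

```sql
-- Same self-join as above, but b is now qualified with an alias,
-- so Hive no longer has to guess which instance of example is meant.
SELECT *
FROM example e1
JOIN example e2
ON e1.a = e2.a
WHERE e1.b = 5;
```

The same change applies to the original script: the unqualified ds in its WHERE clause becomes e2.ds (or e1.ds).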