Can I cluster by/bucket a table created via "CREATE TABLE AS SELECT ....." in Hive?

And*_*rew 8 hadoop hive bucket hiveql hadoop-partitioning

I want to create a table in Hive:

CREATE TABLE BUCKET_TABLE AS 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll
CLUSTERED BY (key) INTO 1000 BUCKETS;

This syntax fails, but I'm not sure whether this combined statement is possible at all. Any ideas?

Neb*_*tic 15

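I came across this question and saw that no answer had been provided. I looked further and found the answer in the Hive documentation.

This will never work, because CTAS has the following restrictions:

  1. The target table cannot be a partitioned table.
  2. The target table cannot be an external table.
  3. The target table cannot be a list bucketing table.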

Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect%28CTAS

Also, from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
...
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
...
[AS select_statement];

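CLUSTERED BY requires the columns to be defined, whereas with CTAS the grammar goes straight to AS select_statement, so this combination is not possible at this time.

Optionally, you can alter the table and add buckets, but this does not change existing data.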

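-- Create an empty ORC table with the schema of the query result (LIMIT 0 copies no rows).
CREATE TABLE BUCKET_TABLE
STORED AS ORC AS
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NULL LIMIT 0;
-- Add the bucketing definition to the table metadata.
ALTER TABLE BUCKET_TABLE CLUSTERED BY (key) INTO 1000 BUCKETS;
ALTER TABLE BUCKET_TABLE SET TBLPROPERTIES ('transactional'='true');
-- Load the data once the bucketing definition is in place.
INSERT INTO BUCKET_TABLE
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NULL;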
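Since CLUSTERED BY needs the columns to be declared, another way to reach the same result is to declare the bucketed table explicitly and then fill it with INSERT ... SELECT. The following is only a minimal sketch: the question does not give the schema of TABLE1, so the two columns key and value (and their STRING types) are hypothetical placeholders.

```sql
-- Hypothetical schema: replace key/value and their types with the real columns of TABLE1.
CREATE TABLE BUCKET_TABLE (
  key   STRING,
  value STRING
)
CLUSTERED BY (key) INTO 1000 BUCKETS
STORED AS ORC;

-- Only for Hive versions before 2.0, where bucketing is not enforced on insert by default:
-- SET hive.enforce.bucketing = true;

-- Load the rows of TABLE1 that have no match in TABLE2.
INSERT INTO TABLE BUCKET_TABLE
SELECT a.key, a.value
FROM TABLE1 a
LEFT JOIN TABLE2 b ON (a.key = b.key)
WHERE b.key IS NULL;
```

The column list in the SELECT has to match the declared columns in order and type, which is the price of not letting CTAS infer the schema.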