Dropping multiple partitions in Impala/Hive

k_m*_*hap 3 sql hive partitioning hdfs impala

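1 - I am trying to drop multiple partitions at once, but I am struggling to do it with either Impala or Hive. I tried the following query, with and without the quotes ('):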

ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS PARTITION (pr_load_time='20170701000317') PARTITION (pr_load_time='20170701000831')

The error I get is:

AnalysisException: Syntax error in line 3:
PARTITION (pr_load_time='20170701000831')
^
Encountered: PARTITION
Expected: CACHED, LOCATION, PURGE, SET, UNCACHED
CAUSED BY: Exception: Syntax error

The partition column is of type bigint, and a query that drops only one partition works as expected:

ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
PARTITION   (pr_load_time='20170701000317')

2 - Is it good practice to delete the HDFS data first and then drop the partitions in Impala/Hive, or should it be done the other way around?

Dav*_*itz 9

1.

Your syntax is wrong.
In the DROP command, the partitions should be separated by commas.

Demo

hive> create table t (i int) partitioned by (p int);
OK

hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK

hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5

hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK

hive> show partitions t;
OK
partition
p=4
p=5
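Applied to the table from the question, the comma-separated form would look roughly like the sketch below. It assumes Hive, since the demo above was run in Hive; whether Impala accepts a comma-separated list of partition specs is not confirmed here. The values are written unquoted because the partition column is bigint.

```sql
-- Sketch for Hive: drop both partitions from the question in one statement.
-- Partition specs are separated by commas; values are unquoted bigint literals.
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
  PARTITION (pr_load_time=20170701000317),
  PARTITION (pr_load_time=20170701000831);
```

Running `SHOW PARTITIONS cz_prd_corrti_st.s1mme_transstats_info;` before and after is a simple way to confirm that only the intended partitions were dropped.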

2.

You can drop a range.

Demo

hive> create table t (i int) partitioned by (p int);
OK

hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK

hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5

hive> alter table t drop if exists partition (p<=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK

hive> show partitions t;
OK
partition
p=4
p=5
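For the table in the question, a range drop might be sketched as follows. This again assumes Hive syntax; the two bounds are simply the two partition values from the question, and the comma between the conditions is assumed to require both of them to hold. Note that this drops every partition inside the range, not only the two that were named.

```sql
-- Sketch for Hive: drop all partitions whose pr_load_time lies in a closed range.
-- Both conditions inside the single PARTITION spec must match for a partition to be dropped.
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
  PARTITION (pr_load_time>=20170701000317, pr_load_time<=20170701000831);
```

Because a range can match more partitions than expected, it is worth listing the partitions with `SHOW PARTITIONS` first.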

  • Unfortunately, comparators in the partition predicate (`partition (p<=3)`) do not work in Spark SQL; see https://issues.apache.org/jira/browse/SPARK-14922 (2 upvotes)