Apache Hive: MSCK REPAIR TABLE does not add new partitions

Gre*_*een 6 hadoop hive mapreduce apache-hive

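I am new to Apache Hive. While working with external table partitions, if I add a new partition directly in HDFS, the new partition is not added after running MSCK REPAIR TABLE. Below is the code I tried.

- Create the external table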

hive> create external table factory(name string, empid int, age int) partitioned by(region string)  
    > row format delimited fields terminated by ','; 

- Detailed table information

Location:  hdfs://localhost.localdomain:8020/user/hive/warehouse/factory     
Table Type:             EXTERNAL_TABLE           
Table Parameters:        
    EXTERNAL                TRUE                
    transient_lastDdlTime   1438579844  

- Create directories in HDFS to hold the data for the table factory

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

- Table data

cat factory1.txt
emp1,500,40
emp2,501,45
emp3,502,50

cat factory2.txt
EMP10,200,25
EMP11,201,27
EMP12,202,30

- Copy from the local filesystem to HDFS

[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory1.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory2.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

- Alter the table to update the metastore

hive> alter table factory add partition(region='southregion') location '/user/hive/testing/testing1/factory2';
hive> alter table factory add partition(region='northregion') location '/user/hive/testing/testing1/factory1';            
hive> select * from factory;                                                                      
OK
emp1    500 40  northregion
emp2    501 45  northregion
emp3    502 50  northregion
EMP10   200 25  southregion
EMP11   201 27  southregion
EMP12   202 30  southregion

Now I created a new file, factory3.txt, to add as a new partition of the table factory.

cat factory3.txt
user1,100,25
user2,101,27
user3,102,30

- Create the path and copy the table data

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'

Now I ran the following query to update the metastore with the newly added partition:

MSCK REPAIR TABLE factory;

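After this, the table still does not return the contents of the factory3 file as a new partition. Where am I going wrong in adding the partition to the table factory?

However, if I run the alter command, the new partition data does show up.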

hive> alter table factory add partition(region='eastregion') location '/user/hive/testing/testing1/factory3';

Why does the MSCK REPAIR TABLE command not work here?

Hak*_*giz 11

For MSCK to work, the partition directories should follow the naming convention /partition_name=partition_value/.
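
To make the convention concrete, here is a minimal sketch that builds the directory path MSCK REPAIR TABLE would be expected to discover for one partition. The helper name `partition_dir` is hypothetical; the table location is taken from the table details in the question, and the sketch assumes the partition directory sits directly under that location.

```shell
# Hypothetical helper: print <table location>/<partition column>=<partition value>
partition_dir() {
  printf '%s/%s=%s\n' "$1" "$2" "$3"
}

# Table location as reported in the question's table details
table_location='/user/hive/warehouse/factory'

# Path expected for the partition region='eastregion'
partition_dir "$table_location" region eastregion
# prints /user/hive/warehouse/factory/region=eastregion
```

A directory such as /user/hive/testing/testing1/factory3 does not match this pattern, which is consistent with it only being picked up through an explicit ALTER TABLE ... ADD PARTITION ... LOCATION statement.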


Rus*_*huk 0

You have to put the data in a directory named "region=eastregion" inside the table's location directory:

$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregion'
$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregion'
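
Once the file is in place, the repair can be re-run and the result checked. The following is a sketch only: it assumes the `hive` CLI is on the PATH and uses its `-e` option to run a quoted query string; `SHOW PARTITIONS` is added here as a way to list what the metastore knows and is not part of the original question.

```shell
# HiveQL to run after the data has been copied under the table location
hql="MSCK REPAIR TABLE factory; SHOW PARTITIONS factory; SELECT * FROM factory;"

# Run it only if the hive CLI is available; otherwise just print what would run
if command -v hive >/dev/null 2>&1; then
  hive -e "$hql"
else
  echo "hive CLI not found; would run: $hql"
fi
```

If the repair picked up the directory, the new region value should appear in the `SHOW PARTITIONS` output alongside northregion and southregion.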