Gre*_*een 6 · hadoop, hive, mapreduce, apache-hive
I am new to Apache Hive. While working with external table partitions, when I add a new partition directly in HDFS, the new partition is not picked up after running MSCK REPAIR TABLE. Below is what I tried:
- Creating the external table
hive> create external table factory(name string, empid int, age int) partitioned by(region string)
> row format delimited fields terminated by ',';
- Detailed table information
Location: hdfs://localhost.localdomain:8020/user/hive/warehouse/factory
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
transient_lastDdlTime 1438579844
- Creating directories in HDFS to load data for the table factory
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'
- Table data
cat factory1.txt
emp1,500,40
emp2,501,45
emp3,502,50
cat factory2.txt
EMP10,200,25
EMP11,201,27
EMP12,202,30
- Copying from local to HDFS
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory1.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory2.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'
- Altering the table to register the partitions in the metastore
hive> alter table factory add partition(region='southregion') location '/user/hive/testing/testing1/factory2';
hive> alter table factory add partition(region='northregion') location '/user/hive/testing/testing1/factory1';
hive> select * from factory;
OK
emp1 500 40 northregion
emp2 501 45 northregion
emp3 502 50 northregion
EMP10 200 25 southregion
EMP11 201 27 southregion
EMP12 202 30 southregion
Now I created a new file, factory3.txt, to add as a new partition to the table factory:
cat factory3.txt
user1,100,25
user2,101,27
user3,102,30
- Creating the path and copying the table data
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'
Then I executed the following query to update the metastore with the newly added partition:
MSCK REPAIR TABLE factory;
But the table does not return the new partition's contents from the factory3 file. Could someone tell me where I am going wrong in adding the partition to the table factory?
However, if I run the alter command below, it does show the new partition data:
hive> alter table factory add partition(region='eastregion') location '/user/hive/testing/testing1/factory3';
Can anyone tell me why the MSCK REPAIR TABLE command is not working?
You have to put the data in a directory named "region=eastregio" inside the table's location directory. MSCK REPAIR TABLE only discovers partition directories that follow the column=value naming convention directly under the table's location; data placed anywhere else (such as /user/hive/testing/testing1/factory3) has to be registered explicitly with ALTER TABLE ... ADD PARTITION ... LOCATION:
$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregio'
$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregio'
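To make the rule concrete, here is a minimal sketch (in Python, over a local temporary directory rather than HDFS, with a hypothetical `discover_partitions` helper) of the discovery step MSCK REPAIR TABLE performs: it scans only the table's own location for subdirectories named `column=value`. A directory outside that location, like the factory3 path in the question, is never found no matter what data it holds.

```python
import os
import tempfile

def discover_partitions(table_location, partition_column):
    """Mimic MSCK REPAIR TABLE's discovery step: list subdirectories
    of the table's location whose names follow the
    <partition_column>=<value> convention, and return the values."""
    prefix = partition_column + "="
    found = []
    for entry in sorted(os.listdir(table_location)):
        path = os.path.join(table_location, entry)
        if os.path.isdir(path) and entry.startswith(prefix):
            found.append(entry[len(prefix):])
    return found

# Recreate the situation from the question on a local filesystem.
warehouse = tempfile.mkdtemp()
table_location = os.path.join(warehouse, "factory")
# Data placed under the table location with the expected naming: found.
os.makedirs(os.path.join(table_location, "region=eastregio"))
# Data placed outside the table location (like /user/hive/testing/...):
# invisible to the scan, so MSCK never adds it to the metastore.
os.makedirs(os.path.join(warehouse, "testing", "factory3"))

print(discover_partitions(table_location, "region"))  # ['eastregio']
```

This is why ALTER TABLE ... ADD PARTITION still works for the factory3 directory: it writes the partition-to-location mapping into the metastore directly, without any directory scan.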