Pav*_*thy 1 hadoop hive hiveql
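I have an .xlsx file containing data like the one in the image below, and I am trying to create a table for it using the following CREATE query: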
CREATE TABLE aus_aboriginal(
code int,
area_name string,
male_0_4 STRUCT<num:double, total:double, perc:double>,
male_5_9 STRUCT<num:double, total:double, perc:double>,
male_10_14 STRUCT<num:double, total:double, perc:double>,
male_15_19 STRUCT<num:double, total:double, perc:double>,
male_20_24 STRUCT<num:double, total:double, perc:double>,
male_25_29 STRUCT<num:double, total:double, perc:double>,
male_30_34 STRUCT<num:double, total:double, perc:double>,
male_35_39 STRUCT<num:double, total:double, perc:double>,
male_40_44 STRUCT<num:double, total:double, perc:double>,
male_45_49 STRUCT<num:double, total:double, perc:double>,
male_50_54 STRUCT<num:double, total:double, perc:double>,
male_55_59 STRUCT<num:double, total:double, perc:double>,
male_60_64 STRUCT<num:double, total:double, perc:double>,
male_above_65 STRUCT<num:double, total:double, perc:double>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
When I load data into it, I get NULLs.
What am I missing in the CREATE TABLE?
小智 7
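When using complex types such as STRUCT, it is advisable to use a delimiter for collection items that is different from the one used for fields (columns). Consider a CSV file in the following format, where a comma (",") is the only delimiter.
Input.csv: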
Code,area_name,num,total,perc,num,total,perc,num,total,perc
1100,Albury,90,444,17.4,73,546,13.4,86,546,15.8
1111,armid,40,404,14.4,97,701,13.8,76,701,10.8
The expected result is to build complex types from the fields (num, total and perc):
1100,Albury,struct<90,444,17.4>,struct<73,546,13.4>,struct<86,546,15.8>
1111,armid,struct<40,404,14.4>,struct<97,701,13.8>,struct<76,701,10.8>
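If we try to build the complex types from the fields (num, total and perc) with the following Hive query, the table ends up with many NULL values. The same comma delimiter is used for both fields and collection items, so Hive cannot separate the data the way we intend.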
hive> create table aus_aboriginal( code int, area_name string, male_0_4 STRUCT<num:double, total:double, perc:double>, male_5_9 STRUCT<num:double, total:double, perc:double>, male_10_14 STRUCT<num:double, total:double, perc:double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ',' LOCATION '/csv';
Output:
1100 Albury {"num":90.0,"total":null,"perc":null} {"num":444.0,"total":null,"perc":null} {"num":17.4,"total":null,"perc":null}
1111 armid {"num":40.0,"total":null,"perc":null} {"num":404.0,"total":null,"perc":null} {"num":14.4,"total":null,"perc":null}
Time taken: 0.15 seconds, Fetched: 2 row(s)
I suspect this is the problem you are facing.
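To make the failure mode concrete, here is a simplified Python model of the parsing step. Each line is first split into columns on the field delimiter, and only then is each STRUCT column's own text split on the collection-item delimiter, with absent members becoming NULL (None). It is a sketch of the behaviour seen in the output above, not Hive's actual SerDe code; the names `parse_row`, `to_num` and `STRUCT_MEMBERS` are hypothetical.

```python
from typing import Callable, List, Optional

# Member names of every STRUCT column in this table.
STRUCT_MEMBERS = ("num", "total", "perc")


def to_num(text: Optional[str], cast: Callable) -> Optional[float]:
    """Cast text with int/float; return None (Hive NULL) if missing or invalid."""
    if text is None:
        return None
    try:
        return cast(text)
    except ValueError:
        return None


def parse_row(line: str, n_structs: int, field_delim: str, item_delim: str) -> List:
    """Simplified model of delimited-text parsing for the hypothetical schema
    (code INT, area_name STRING, n_structs x STRUCT<num, total, perc>)."""
    n_cols = 2 + n_structs
    # Step 1: split the whole line into columns on the field delimiter.
    fields = line.rstrip("\r\n").split(field_delim)
    # Pad missing trailing columns with None and drop surplus fields.
    fields = (fields + [None] * n_cols)[:n_cols]
    row = [to_num(fields[0], int), fields[1]]
    for raw in fields[2:]:
        if raw is None:
            row.append(None)  # the whole STRUCT column is NULL
            continue
        # Step 2: split only this column's text on the collection delimiter.
        items = raw.split(item_delim)
        row.append({
            name: to_num(items[i], float) if i < len(items) else None
            for i, name in enumerate(STRUCT_MEMBERS)
        })
    return row
```

With "," as both delimiters, `parse_row("1100,Albury,90,444,17.4,73,546,13.4,86,546,15.8", 3, ",", ",")` puts 90.0, 444.0 and 17.4 into the three `num` members and leaves every `total` and `perc` as None, the same pattern as the Hive output above. The question's DDL declares no COLLECTION ITEMS delimiter, so Hive falls back to its default, '\002' (Ctrl-B); assuming the exported file keeps num, total and perc in separate tab-separated columns, `parse_row(line, 14, "\t", "\x02")` shows the same effect for that table.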
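Using STRUCT
Now consider an input file with data in the following format, where a comma (",") separates the fields and "#" separates the items of each collection:
1100,Albury,90#444#17.4,73#546#13.4,86#546#15.8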
1111,armid,40#404#14.4,97#701#13.8,76#701#10.8
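In this case we can successfully create a table with complex types by specifying "," as the field delimiter and "#" as the collection-item delimiter. See the Hive query below.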
hive> create table aus_aboriginal( code int, area_name string, male_0_4 STRUCT<num:double, total:double, perc:double>, male_5_9 STRUCT<num:double, total:double, perc:double>, male_10_14 STRUCT<num:double, total:double, perc:double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '#' LOCATION '/csv';
Output:
hive> select * from aus_aboriginal;
1100 Albury {"num":90.0,"total":444.0,"perc":17.4} {"num":73.0,"total":546.0,"perc":13.4} {"num":86.0,"total":546.0,"perc":15.8}
1111 armid {"num":40.0,"total":404.0,"perc":14.4} {"num":97.0,"total":701.0,"perc":13.8} {"num":76.0,"total":701.0,"perc":10.8}
Time taken: 0.146 seconds, Fetched: 2 row(s)
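If the data is still in the comma-only layout of Input.csv, one way to get it into the "#" layout is to rewrite the file before loading it. The Python below is a minimal sketch under that assumption: `regroup_line` and `regroup_file` are hypothetical helpers, and they assume every line has exactly two leading columns (code, area_name) followed by complete num/total/perc triples.

```python
def regroup_line(line: str, field_delim: str = ",", item_delim: str = "#",
                 n_leading: int = 2, group_size: int = 3) -> str:
    """Join every group_size values after the first n_leading columns with
    item_delim, so that each group loads into a single STRUCT column."""
    values = line.rstrip("\r\n").split(field_delim)
    leading, rest = values[:n_leading], values[n_leading:]
    if len(rest) % group_size != 0:
        raise ValueError(f"expected a multiple of {group_size} values, got {len(rest)}")
    groups = [item_delim.join(rest[i:i + group_size])
              for i in range(0, len(rest), group_size)]
    return field_delim.join(leading + groups)


def regroup_file(src_path: str, dst_path: str, skip_header: bool = True, **kwargs) -> None:
    """Rewrite a whole file line by line; the first line is dropped if skip_header."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for lineno, line in enumerate(src):
            if skip_header and lineno == 0:
                continue
            if line.strip():  # ignore blank lines
                dst.write(regroup_line(line, **kwargs) + "\n")
```

For example, `regroup_line("1100,Albury,90,444,17.4,73,546,13.4,86,546,15.8")` returns `"1100,Albury,90#444#17.4,73#546#13.4,86#546#15.8"`, the row format the second table definition expects. `regroup_file` drops the first line by default because Input.csv starts with a header row.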
A similar approach applies to the other complex types; see the link below for more information.
Reference: http://edu-kinect.com/blog/2014/06/16/hive-complex-data-types-with-examples/
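The same idea carries over to ARRAY and MAP columns: COLLECTION ITEMS TERMINATED BY separates array elements and map entries, and MAP KEYS TERMINATED BY separates each key from its value, so both should differ from the field delimiter. Assuming a hypothetical table declared with `COLLECTION ITEMS TERMINATED BY '#' MAP KEYS TERMINATED BY ':'`, the Python below is a simplified model of how one such column's text would be split; it is not Hive code, and the helper name and the `num:90#...` sample are illustrative only.

```python
from typing import Dict, List, Optional, Union


def parse_collection_column(raw: str, kind: str, item_delim: str = "#",
                            key_delim: str = ":") -> Union[List[str], Dict[str, Optional[str]]]:
    """Split one ARRAY or MAP column's text (hypothetical helper)."""
    if kind == "array":
        # Elements are separated by the collection-item delimiter.
        return raw.split(item_delim)
    if kind == "map":
        entries: Dict[str, Optional[str]] = {}
        # Entries are separated by the collection-item delimiter ...
        for entry in raw.split(item_delim):
            # ... and each key is separated from its value by the map-key delimiter.
            key, sep, value = entry.partition(key_delim)
            # In this model, an entry without a key delimiter gets a None value.
            entries[key] = value if sep else None
        return entries
    raise ValueError(f"unsupported kind: {kind}")
```

Here `parse_collection_column("90#444#17.4", "array")` yields `["90", "444", "17.4"]`, and `parse_collection_column("num:90#total:444#perc:17.4", "map")` yields a dict keyed by `num`, `total` and `perc`.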