duz*_*vik 5 storage inventory amazon-s3 amazon-web-services
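I'm trying to understand how to use S3 Inventory, and I'm following this tutorial.
After loading the inventory lists into my table, I tried to query it and ran into two problems.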
1) SELECT key, size FROM table;
The size column shows the same magic number for every record: 4923069104295859283
2) SELECT * FROM table;
Query Id: cf07c309-c685-4bf4-9705-8bca69b00b3c.
The query fails with this error:
HIVE_BAD_DATA: Field size's type LONG in ORC is incompatible with type varchar defined in table schema
Here is my table schema:
CREATE EXTERNAL TABLE `table`(
`bucket` string,
`key` string,
`version_id` string,
`is_latest` boolean,
`is_delete_marker` boolean,
`size` bigint,
`last_modified_date` timestamp,
`e_tag` string,
`storage_class` string)
PARTITIONED BY (
`dt` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://......../hive'
TBLPROPERTIES (
'transient_lastDdlTime'='1516093603')
Running the following command on any ORC file from the inventory that AWS S3 generated will show you the actual structure of your inventory:
$> hive --orcfiledump ~/Downloads/017c2014-1205-4431-a30d-2d9ae15492d6.orc
...
Processing data file /tmp/017c2014-1205-4431-a30d-2d9ae15492d6.orc [length: 4741786]
Structure for /tmp/017c2014-1205-4431-a30d-2d9ae15492d6.orc
File Version: 0.12 with ORC_135
Rows: 223473
Compression: ZLIB
Compression size: 262144
Type: struct<bucket:string,key:string,size:bigint,last_modified_date:timestamp,e_tag:string,storage_class:string,is_multipart_uploaded:boolean,replication_status:string,encryption_status:string>
...
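It looks like the example AWS provides here expects your inventory to cover all versions of the objects in your bucket, not just the current version. The dump above has no version_id, is_latest, or is_delete_marker columns, so the columns in your table do not line up with the columns in the files.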
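To make the mismatch concrete, here is a rough Python sketch that lines up the table columns from the question against the ORC struct from the dump, position by position. It assumes the ORC columns are matched to the table by position rather than by name, which I believe is the default for ORC in Athena but have not confirmed for this setup. The helper names (parse_orc_struct, positional_mismatches) are made up for illustration, and the parser only handles flat structs like this one.

```python
import re
from itertools import zip_longest

# "Type:" line copied from the orcfiledump output above.
ORC_TYPE = (
    "struct<bucket:string,key:string,size:bigint,last_modified_date:timestamp,"
    "e_tag:string,storage_class:string,is_multipart_uploaded:boolean,"
    "replication_status:string,encryption_status:string>"
)

# Non-partition columns of the table from the question, in DDL order.
TABLE_COLUMNS = [
    ("bucket", "string"), ("key", "string"), ("version_id", "string"),
    ("is_latest", "boolean"), ("is_delete_marker", "boolean"),
    ("size", "bigint"), ("last_modified_date", "timestamp"),
    ("e_tag", "string"), ("storage_class", "string"),
]


def parse_orc_struct(type_line):
    """Hypothetical helper: split a flat 'struct<name:type,...>' into pairs."""
    match = re.fullmatch(r"\s*(?:Type:\s*)?struct<(.*)>\s*", type_line)
    if match is None:
        raise ValueError("not a struct<...> type line")
    fields = []
    for field in match.group(1).split(","):
        name, _, type_name = field.partition(":")
        fields.append((name.strip(), type_name.strip()))
    return fields


def positional_mismatches(table_columns, orc_fields):
    """Hypothetical helper: list the slots where table and file disagree."""
    return [
        (index, table_col, orc_col)
        for index, (table_col, orc_col)
        in enumerate(zip_longest(table_columns, orc_fields))
        if table_col != orc_col
    ]


orc_fields = parse_orc_struct(ORC_TYPE)
mismatches = positional_mismatches(TABLE_COLUMNS, orc_fields)
for index, table_col, orc_col in mismatches:
    print(f"slot {index}: table has {table_col}, file has {orc_col}")

# The bytes of "STANDARD" read as a little-endian 64-bit integer.
magic = int.from_bytes(b"STANDARD", "little")
print(magic)
```

The first slot that disagrees is slot 2, where the table declares version_id string and the file holds size bigint, which matches the HIVE_BAD_DATA message. The table's size column sits in slot 5, where the file holds storage_class. The bytes of "STANDARD" read as a little-endian 64-bit integer come out to exactly 4923069104295859283, so the magic number is consistent with the storage class string being read as a bigint, though that is my reading of the symptom and not something the error output states.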
The correct table structure for Athena, with an encrypted bucket, is:
CREATE EXTERNAL TABLE inventory(
bucket string,
key string,
version_id string,
is_latest boolean,
is_delete_marker boolean,
size bigint,
last_modified_date timestamp,
e_tag string,
storage_class string,
is_multipart_uploaded boolean,
replication_status string,
encryption_status string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://............/hive'
TBLPROPERTIES ('has_encrypted_data'='true');