Bar*_*Alp 5 hadoop hive amazon-emr elastic-map-reduce amazon-dynamodb
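I am trying to load a DynamoDB export file into Hive. The file was produced by the "Import/Export" tool in the Amazon DynamoDB web console. I cannot map the fields correctly, because the export tool uses the ETX and STX control characters as delimiters.
Below is a sample line, terminated by [LF]: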
Elapsed[ETX]{"n":"1477"}[STX]Device[ETX]{"n":"3"}[STX]Date[ETX]{"s":"2014-03-05T12:13:00.852Z"}[STX]Duration[ETX]{"n":"8075"}[LF]
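To make the layout of the sample line concrete, here is a minimal Python sketch of how such a record could be split. It assumes that [STX] and [ETX] stand for the single bytes 0x02 and 0x03, that STX separates attributes, and that ETX separates an attribute name from its JSON value. The helper name `parse_record` is hypothetical and not part of any DynamoDB or Hive tooling.

```python
import json

STX = "\x02"  # assumed: separates attributes within one record
ETX = "\x03"  # assumed: separates an attribute name from its JSON value


def parse_record(line):
    """Parse one export line into {attribute name: decoded JSON value}."""
    record = {}
    for field in line.rstrip("\n").split(STX):
        name, _, raw = field.partition(ETX)
        record[name] = json.loads(raw)
    return record


# The sample line from the question, rebuilt with real control characters.
sample = STX.join([
    "Elapsed" + ETX + '{"n":"1477"}',
    "Device" + ETX + '{"n":"3"}',
    "Date" + ETX + '{"s":"2014-03-05T12:13:00.852Z"}',
    "Duration" + ETX + '{"n":"8075"}',
]) + "\n"

parsed = parse_record(sample)
for name, value in parsed.items():
    print(name, value)
```

Each value keeps its DynamoDB type wrapper ("n" for number, "s" for string), so a second step is still needed to turn it into a plain column value.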
What should the query be?
CREATE EXTERNAL TABLE IF NOT EXISTS TableNameHere (creationDate string, device bigint, duration bigint, elapsed bigint)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ???This is where i got stuck???
LOCATION 's3://abcdefg/ino/2015-05-28_12.22';
UPDATE
I have updated the query, but it still does not work.
It uses '\012' for LF and '\002' for STX:
CREATE EXTERNAL TABLE IF NOT EXISTS TableNameHere (creationDate string, device bigint, duration bigint, elapsed bigint)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
LINES TERMINATED BY '\012'
LOCATION 's3://abcdefg/ino/2015-05-28_12.22';
Query result:
Elapsed{"n":"0"} Device{"n":"3"} Duration{"n":"1073876"} Date{"s":"2014-01-27T00:52:25.491Z"}
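Note that the attribute order in this result (Elapsed, Device, Duration, Date) differs from the order in the sample line (Elapsed, Device, Date, Duration), so a purely positional split on STX would likely put values in the wrong columns. The sketch below shows, in Python rather than Hive, the kind of name-based mapping a custom SerDe or preprocessing step would need to perform. It assumes an ETX byte still sits between each name and its JSON value (invisible in the printed result), that numeric attributes are integers, and that the `COLUMNS` mapping and the helpers `to_row` and `make_line` are hypothetical names introduced only for illustration.

```python
import json

STX, ETX = "\x02", "\x03"  # assumed attribute and name/value separators

# Hypothetical mapping: Hive column name from the question -> DynamoDB attribute name.
COLUMNS = {
    "creationDate": "Date",
    "device": "Device",
    "duration": "Duration",
    "elapsed": "Elapsed",
}


def to_row(line):
    """Map one export line to Hive-style columns by attribute name, not position."""
    attrs = {}
    for field in line.rstrip("\n").split(STX):
        name, _, raw = field.partition(ETX)
        attrs[name] = json.loads(raw)
    row = {}
    for column, attribute in COLUMNS.items():
        typed = attrs.get(attribute)
        if typed is None:
            row[column] = None            # attribute missing from this item
        elif "n" in typed:
            row[column] = int(typed["n"])  # assumes integer values (bigint columns)
        else:
            row[column] = typed.get("s")
    return row


def make_line(pairs):
    """Rebuild an export line from (name, json_text) pairs for the demo."""
    return STX.join(name + ETX + value for name, value in pairs) + "\n"


# Two records taken from the question, with their differing attribute orders.
lines = [
    make_line([("Elapsed", '{"n":"1477"}'), ("Device", '{"n":"3"}'),
               ("Date", '{"s":"2014-03-05T12:13:00.852Z"}'), ("Duration", '{"n":"8075"}')]),
    make_line([("Elapsed", '{"n":"0"}'), ("Device", '{"n":"3"}'),
               ("Duration", '{"n":"1073876"}'), ("Date", '{"s":"2014-01-27T00:52:25.491Z"}')]),
]
rows = [to_row(line) for line in lines]
for row in rows:
    print(row)
```

Both rows end up with the same column layout even though the attributes arrive in a different order.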
So how do I parse this data now? I need to map the fields. Should I use a custom SerDe?