bri*_*isk 13 json hadoop hive amazon-s3
我已将蜂巢版本从0.20更新为0.13.1.
我正在使用下表和查询从S3中提取json.
表:
    > CREATE EXTERNAL TABLE in_app_logs (
    > event string,
    > app_id string,
    > idfa string,
    > idfv string
    > )ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > LOCATION 's3://test/in_app_logs/ds=2015-04-20/'; 
我的查询看起来像版本0.20的下面,它与旧版本一起工作正常.
    SELECT
       get_json_object(in_app_logs.event, '$.ev') as event_type,
       get_json_object(in_app_logs.event, '$.global.app_id') as app_id,
       get_json_object(in_app_logs.event, '$.global.ios.idfa') as idfa,
       get_json_object(in_app_logs.event, '$.global.ios.idfv') as idfv
    FROM in_app_logs;
在新版本中,它已更改为json_tuple.我在更新版本中尝试过此查询.得到了错误.
SELECT b.event_type, c.app_id, d.idfa, d.idfv
FROM in_app_logs a
LATERAL VIEW json_tuple(a.event, 'ev') b as event_type,
LATERAL VIEW json_tuple(a.event.global, 'app_id') c as app_id,
LATERAL VIEW json_tuple(a.event.global.ios, 'idfa', 'idfv') d as idfa, idfv
S3日志:
   {
      "installed_at": "2015-04-17T12:10:24Z",
      "ev": "event_install",
      "global": {
        "ios": {
          "idfv": "887DF776-C1FC-4567-DESF-741AC72197D1",
          "time_zone": "EDT",
          "model": "iPhone7,2",
          "screen_size": "320x568",
          "carrier": "AT&T",
          "language": "en",
          "idfa": "CD04291C-0D80-4377-6CS9-B46089A05F15",
          "os_version": "8.2.0",
          "country": "US"
        }
任何人都可以帮我提取json数据吗?
这 '。' 仅结构或结构列表支持运算符。您正在尝试将其应用于 STRING 类型。
你可能需要这样的东西:
SELECT x.event_type, x.app_id, x.idfa, x.idfv
FROM in_app_logs a
LATERAL VIEW JSON_TUPLE(
    a.event,
   'ev',
   'global.app_id',
   'global.ios.idfa',
   'global.ios.idfv'
) x AS event_type, app_id, idfa, idfv
| 归档时间: | 
 | 
| 查看次数: | 667 次 | 
| 最近记录: |