Un-nesting a nested JSON structure in Apache Drill

Ian · score 4 · tags: json, apache-drill

I have the following JSON (roughly), and I'd like to extract the information from the header and defects fields separately:

{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890"
    },
    "defects": [
      {
        "info": {
          "systemId": "DEFCHK123",
          "numDefects": "3",
          "defectParts": [
            "003", "006", "008"
          ]
        }
      }
    ]
  }
}

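I've tried accessing individual elements using file.header.timeStamp etc., but that returns null. I've also tried flatten(file), but that gives me:

org.apache.drill.exec.vector.complex.MapVector cannot be cast to org.apache.drill.exec.vector.complex.RepeatedValueVector

I've looked into kvgen(), but I can't see how it applies to my case. I tried kvgen(file.header), but that gives me:

kvgen function only supports simple maps as input

which is what I had expected anyway.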

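Does anyone know how I can get at header and defects so that I can process the information they contain? Ideally, I would just select the information out of header, since it contains no arrays or maps, and get a single record as-is. For defects, I would just use FLATTEN(defectParts) to get a table of the defective parts.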

Any help would be appreciated.

Answer (anonymous user) · score 6

Which version of Drill are you using? I tried querying the following file on the latest master (1.7.0-SNAPSHOT):

{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890"
    },
    "defects": [
      {
        "info": {
          "systemId": "DEFCHK123",
          "numDefects": "3",
          "defectParts": [
            "003", "006", "008"
          ]
        }
      }
    ]
  }
}
{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890"
    },
    "defects": [
      {
        "info": {
          "systemId": "DEFCHK123",
          "numDefects": "3",
          "defectParts": [
            "003", "006", "008"
          ]
        }
      }
    ]
  }
}

The following queries work fine:

1.

select t.file.header.serialno as serialno from `parts.json` t;
+-----------+
| serialno  |
+-----------+
| 3456      |
| 3456      |
+-----------+
2 rows selected (0.098 seconds)
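Each query here reaches the nested fields through the table alias t. A bare file.header.timeStamp is probably parsed with file as a table name rather than as a column. That may be why the attempt in the question returned null (hedged: not verified against that Drill version). Note also that the query asks for serialno while the JSON key is serialNo, because Drill matches column names case-insensitively. The following is a toy Python model, not Drill code, of resolving such an aliased, case-insensitive path. The get_path helper is hypothetical, and None stands in for Drill's null.

```python
# Toy model (not Drill code): two identical records, as in the answer's parts.json.
RECORD = {
    "file": {
        "header": {
            "timeStamp": "2016-03-14T00:20:15.005+04:00",
            "serialNo": "3456",
            "sensorId": "1234567890",
        },
        "defects": [
            {"info": {"systemId": "DEFCHK123", "numDefects": "3",
                      "defectParts": ["003", "006", "008"]}}
        ],
    }
}
TABLE = [RECORD, RECORD]


def get_path(row, *path):
    """Hypothetical helper: resolve a path like t.file.header.serialno.

    Keys are matched case-insensitively; a missing step yields None,
    standing in for the null Drill returns for an unknown field.
    """
    node = row
    for step in path:
        if not isinstance(node, dict):
            return None
        match = next((k for k in node if k.lower() == step.lower()), None)
        if match is None:
            return None
        node = node[match]
    return node


# Roughly: select t.file.header.serialno as serialno from `parts.json` t;
serialnos = [get_path(t, "file", "header", "serialno") for t in TABLE]
print(serialnos)  # ['3456', '3456']
```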

2.

select flatten(t.file.defects) defects from `parts.json` t;
+---------------------------------------------------------------------------------------+
|                                        defects                                        |
+---------------------------------------------------------------------------------------+
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}}  |
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}}  |
+---------------------------------------------------------------------------------------+
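Query 2 works where flatten(file) in the question failed because FLATTEN needs a repeated (array) value. t.file.defects is a JSON array, while file is a map (a JSON object), hence the MapVector to RepeatedValueVector cast error. Here is a toy Python model of that distinction. The flatten function is a hypothetical stand-in, not Drill's implementation: an array yields one row per element, and a map is rejected.

```python
# Toy model of Drill's FLATTEN (hypothetical, not the real implementation).
RECORD = {
    "file": {
        "header": {
            "timeStamp": "2016-03-14T00:20:15.005+04:00",
            "serialNo": "3456",
            "sensorId": "1234567890",
        },
        "defects": [
            {"info": {"systemId": "DEFCHK123", "numDefects": "3",
                      "defectParts": ["003", "006", "008"]}}
        ],
    }
}
TABLE = [RECORD, RECORD]


def flatten(value):
    """Return one output value per array element; reject maps (dicts)."""
    if isinstance(value, dict):
        # Loosely mirrors: MapVector cannot be cast to RepeatedValueVector
        raise TypeError("FLATTEN needs an array, got a map")
    return list(value)


# Roughly: select flatten(t.file.defects) defects from `parts.json` t;
defects = [d for t in TABLE for d in flatten(t["file"]["defects"])]
print(len(defects))  # 2: one defect per record, two records

# Roughly: flatten(file) -> rejected, because file is a map
try:
    flatten(RECORD["file"])
except TypeError as err:
    print("error:", err)
```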

3.

select q.h.serialno as serialno, q.d.info.defectParts as defectParts from (select t.file.header h, flatten(t.file.defects) d from `parts.json` t) q;
+-----------+----------------------+
| serialno  |     defectParts      |
+-----------+----------------------+
| 3456      | ["003","006","008"]  |
| 3456      | ["003","006","008"]  |
+-----------+----------------------+
2 rows selected (0.126 seconds)
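Query 3 keeps the header next to each flattened defect by doing the FLATTEN in a subquery. The question's remaining goal is one row per defective part. That would presumably need one more FLATTEN over defectParts in an outer query, for example: select s.serialno, flatten(s.defectParts) as part from (query 3) s. This is untested. The toy Python model here shows the rows such a two-level flatten should produce. The flatten_rows helper is hypothetical and is not Drill code.

```python
# Toy model (not Drill code) of query 3 plus a second FLATTEN on defectParts.
RECORD = {
    "file": {
        "header": {
            "timeStamp": "2016-03-14T00:20:15.005+04:00",
            "serialNo": "3456",
            "sensorId": "1234567890",
        },
        "defects": [
            {"info": {"systemId": "DEFCHK123", "numDefects": "3",
                      "defectParts": ["003", "006", "008"]}}
        ],
    }
}
TABLE = [RECORD, RECORD]


def flatten_rows(rows, column):
    """Hypothetical FLATTEN model: one output row per element of row[column],
    with the row's other columns copied onto each new row."""
    out = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)
            new_row[column] = item
            out.append(new_row)
    return out


# Inner query: select t.file.header h, flatten(t.file.defects) d from ... t
inner = flatten_rows(
    [{"h": t["file"]["header"], "d": t["file"]["defects"]} for t in TABLE], "d")

# Query 3: select q.h.serialno, q.d.info.defectParts from (inner) q
query3 = [{"serialno": q["h"]["serialNo"],
           "defectParts": q["d"]["info"]["defectParts"]} for q in inner]

# Extra level the question asked for: flatten defectParts as well
parts = flatten_rows(query3, "defectParts")
for p in parts:
    print(p["serialno"], p["defectParts"])
```

Each FLATTEN multiplies the row count by the array length: 2 records × 1 defect × 3 parts gives 6 rows, each still carrying the header's serial number.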

PS: This should have been a comment, but I don't have enough reputation yet!