Ily*_*nko 3 json netflow clickhouse
I have some raw JSON data in a ClickHouse table (it is actually NetFlow v9 from a netflow collector), and it looks like this:
{"AgentID":"10.1.8.1",
"Header":{"Version":9,"Count":2},
"DataSets":[
[{"I":2,"V":"231"},{"I":3,"V":"151"},{"I":8,"V":"109.195.122.130"}],
[{"I":2,"V":"341"},{"I":3,"V":"221"},{"I":8,"V":"109.195.122.233"}]
]}
My task is to convert the DataSets array into another ClickHouse table, laid out like this:
I2    I3    I8
-----------------------------
231   151   109.195.122.130
341   221   109.195.122.233
...
To parse the JSON, consider using the specialized JSON functions:
SELECT
    toInt32(column_values[1]) AS I2,
    toInt32(column_values[2]) AS I3,
    column_values[3] AS I8
FROM
(
    SELECT
        arrayJoin(JSONExtract(json, 'DataSets', 'Array(Array(Tuple(Int32, String)))')) AS row,
        arraySort(x -> (x.1), row) AS row_with_sorted_columns,
        arrayMap(x -> (x.2), row_with_sorted_columns) AS column_values
    FROM
    (
        SELECT '{"AgentID":"10.1.8.1", "Header":{"Version":9,"Count":2}, "DataSets":[\n [{"I":3,"V":"151"},{"I":8,"V":"109.195.122.130"},{"I":2,"V":"231"}],\n [{"I":2,"V":"341"},{"I":3,"V":"221"},{"I":8,"V":"109.195.122.233"}]]}' AS json
    )
)

/*
┌─I2──┬─I3──┬─I8──────────────┐
│ 231 │ 151 │ 109.195.122.130 │
│ 341 │ 221 │ 109.195.122.233 │
└─────┴─────┴─────────────────┘
*/

(To learn more about JSON parsing, see "How to extract json from json in clickhouse?")
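Since the goal is to land these rows in another table, the query above can be wrapped in an INSERT ... SELECT. This is only a minimal sketch: the source table `netflow_raw` with a String column `json` and the target table `netflow_flows` are hypothetical names that do not appear in the question, and the column types and sort key are guesses.

```sql
-- Hypothetical target table; name, column types, and ORDER BY are assumptions.
CREATE TABLE netflow_flows
(
    I2 Int32,
    I3 Int32,
    I8 String
)
ENGINE = MergeTree
ORDER BY tuple();

-- Same parsing logic as the query above, reading from a hypothetical
-- raw table netflow_raw(json String) instead of an inline literal.
INSERT INTO netflow_flows
SELECT
    toInt32(column_values[1]) AS I2,
    toInt32(column_values[2]) AS I3,
    column_values[3] AS I8
FROM
(
    SELECT
        -- One output row per inner array of DataSets.
        arrayJoin(JSONExtract(json, 'DataSets', 'Array(Array(Tuple(Int32, String)))')) AS row,
        -- Sort fields by their ID so that positions 1..3 map to I = 2, 3, 8.
        arraySort(x -> (x.1), row) AS row_with_sorted_columns,
        arrayMap(x -> (x.2), row_with_sorted_columns) AS column_values
    FROM netflow_raw
);
```

Like the original query, this assumes every data set carries exactly the fields 2, 3, and 8; a data set with a different set of field IDs would put values in the wrong columns.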
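The implementation above relies on the DataSets array having a fixed structure. As far as I understand, in the real world the structure follows an arbitrary schema (https://www.iana.org/assignments/ipfix/ipfix.xhtml), for example: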
{
    "AgentID":"192.168.21.15",
    "Header":{},
    "DataSets":[
        [
            {"I":8, "V":"192.16.28.217"},
            {"I":12, "V":"180.10.210.240"},
            {"I":5, "V":2},
            {"I":4, "V":6},
            {"I":7, "V":443},
            {"I":6, "V":"0x10"}
        ]
    ]
}

This raises the question of a table with an arbitrary number of columns. ClickHouse does not support that; to see how tables are represented in such cases, browse the pivot questions at https://stackoverflow.com/search?q=%5Bclickhouse%5D+pivot.
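One way to sidestep the arbitrary-column problem is to store one row per field (a long, key-value layout) rather than one column per field. The following is a sketch under that assumption, not the method from the linked answers; the aliases `dataset_index`, `field_id`, and `raw_value` are illustrative names.

```sql
SELECT
    agent_id,
    dataset_index,                              -- which inner array the field came from
    JSONExtractUInt(field, 'I') AS field_id,    -- the field (information element) ID
    JSONExtractRaw(field, 'V') AS raw_value     -- value kept as raw JSON text
FROM
(
    SELECT
        agent_id,
        dataset_index,
        -- One row per {"I": ..., "V": ...} object inside the data set.
        arrayJoin(JSONExtractArrayRaw(dataset)) AS field
    FROM
    (
        SELECT
            JSONExtractString(json, 'AgentID') AS agent_id,
            JSONExtractArrayRaw(json, 'DataSets') AS datasets
        FROM
        (
            SELECT '{"AgentID":"192.168.21.15","Header":{},"DataSets":[[{"I":8,"V":"192.16.28.217"},{"I":12,"V":"180.10.210.240"},{"I":5,"V":2},{"I":4,"V":6},{"I":7,"V":443},{"I":6,"V":"0x10"}]]}' AS json
        )
    )
    -- Unfold the outer array and keep each data set's position alongside it.
    ARRAY JOIN datasets AS dataset, arrayEnumerate(datasets) AS dataset_index
)
```

JSONExtractRaw is used for the value because "V" mixes strings and numbers in this sample. It returns the raw JSON fragment, so string values keep their surrounding quotes and numeric values come back as plain digits. Casting to a concrete type then has to happen per field ID.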