Importing JSON into ClickHouse

Ver*_*_us 7 database clickhouse

I created the table with this statement:

CREATE TABLE event(
    date Date,
    src UInt8,
    channel UInt8,
    deviceTypeId UInt8,
    projectId UInt64,
    shows UInt32,
    clicks UInt32,
    spent Float64
) ENGINE = MergeTree(date, (date, src, channel, projectId), 8192);

The raw data looks like this:

{ "date":"2016-03-07T10:00:00+0300","src":2,"channel":18,"deviceTypeId ":101, "projectId":2363610,"shows":1232,"clicks":7,"spent":34.72,"location":"Unknown", ...}
...

I load the data files with the following command:

cat *.data|sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]+0300//'| clickhouse-client --query="INSERT INTO event FORMAT JSONEachRow"

clickhouse-client throws an exception:

Code: 117. DB::Exception: Unknown field found while parsing JSONEachRow format: location: (at row 1)

Is it possible to skip fields in the JSON objects that are not present in the table definition?

小智 13

The latest ClickHouse release (v1.1.54023) supports the input_format_skip_unknown_fields user setting, which skips unknown fields when parsing the JSONEachRow and TSKV formats.

Try:

clickhouse-client -n --query="SET input_format_skip_unknown_fields=1; INSERT INTO event FORMAT JSONEachRow;"

See the documentation for more details.
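As a rough sketch of how this fits together with the pipeline from the question, the date-stripping step and the setting can be combined into one load command. The function names `strip_time` and `load_events` are hypothetical helpers introduced here for illustration, and the setting is passed as a clickhouse-client command-line option instead of a `SET` statement. This assumes a server and client at v1.1.54023 or later and `*.data` files in the current directory.

```shell
#!/bin/sh
# Sketch only: combines the question's sed step with the
# input_format_skip_unknown_fields setting from the answer.

# Strip the time and the +0300 offset so that "date" parses as a Date.
# This is the same sed expression used in the question.
strip_time() {
    sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]+0300//'
}

# Hypothetical helper (not part of ClickHouse): load every *.data file in
# the current directory into the `event` table, ignoring JSON fields such
# as "location" that the table does not define.
load_events() {
    cat ./*.data | strip_time | clickhouse-client \
        --input_format_skip_unknown_fields=1 \
        --query="INSERT INTO event FORMAT JSONEachRow"
}
```

After sourcing the script, running `load_events` from the directory that holds the data files performs the insert. Running `strip_time` on a single line first is a quick way to check that the date is rewritten as expected before anything is sent to the server.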

  • There is a more convenient way to specify settings in clickhouse-client: clickhouse-client --input_format_skip_unknown_fields=1 --query="INSERT INTO event FORMAT JSONEachRow;" (3 upvotes)