Pie*_*rre 16 schema json generator avro
Is there any tool that can create an AVRO schema from a "typical" JSON document?
For example:
{
"records":[{"name":"X1","age":2},{"name":"X2","age":4}]
}
I found http://jsonschema.net/reboot/#/, which generates a 'json-schema':
{
"$schema": "http://json-schema.org/draft-04/schema#",
"id": "http://jsonschema.net#",
"type": "object",
"required": false,
"properties": {
"records": {
"id": "#records",
"type": "array",
"required": false,
"items": {
"id": "#1",
"type": "object",
"required": false,
"properties": {
"name": {
"id": "#name",
"type": "string",
"required": false
},
"age": {
"id": "#age",
"type": "integer",
"required": false
}
}
}
}
}
}
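But I would like an AVRO version.
You can do this easily with Apache Spark and Python. First download a Spark distribution from http://spark.apache.org/downloads.html, then install the avro Python package using pip. Then run pyspark with the avro package: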
./bin/pyspark --packages com.databricks:spark-avro_2.11:3.1.0
Then use the following code (assuming the input.json file contains one or more JSON documents, each on a separate line):
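import os
import avro.datafile
import avro.io
# Let Spark infer the schema from the JSON lines and write a single Avro part file
spark.read.json('input.json').coalesce(1).write.format("com.databricks.spark.avro").save("output.avro")
# Find the part file Spark wrote (the exact file name prefix can vary between Spark versions)
avrofile = [name for name in os.listdir('output.avro') if name.startswith('part-r-00000')][0]
with open(os.path.join('output.avro', avrofile), 'rb') as f:
    reader = avro.datafile.DataFileReader(f, avro.io.DatumReader())
    print(reader.datum_reader.writers_schema)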
For example, for an input file with the following content:
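The code above expects one JSON document per line. If your document is pretty-printed across several lines, as in the question, a minimal sketch for producing such a file with the standard library could look like this. The helper name `write_json_lines` is hypothetical, and the sample document is simply the one from the question.

```python
import json

def write_json_lines(documents, path):
    """Write each document as compact JSON on its own line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as out:
        for doc in documents:
            # json.dumps without indent emits a single line per document
            out.write(json.dumps(doc) + "\n")

# The sample document from the question, used here as illustrative input.
sample = {"records": [{"name": "X1", "age": 2}, {"name": "X2", "age": 4}]}
write_json_lines([sample], "input.json")
```

The resulting input.json can then be passed to `spark.read.json` as shown above.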
{'string': 'somestring', 'number': 3.14, 'structure': {'integer': 13}}
{'string': 'somestring2', 'structure': {'integer': 14}}
the script will output:
{"fields": [{"type": ["double", "null"], "name": "number"}, {"type": ["string", "null"], "name": "string"}, {"type": [{"type": "record", "namespace": "", "name": "structure", "fields": [{"type": ["long", "null"], "name": "integer"}]}, "null"], "name": "structure"}], "type": "record", "name": "topLevelRecord"}
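If starting Spark is too heavy for a small document, the same idea can be sketched in plain Python. This is only a rough illustration, not what Spark does internally: the function `infer_avro_type` is a hypothetical helper, it looks at a single document, it assumes arrays are homogeneous, and it does not produce the nullable unions shown in the Spark output. It also reuses key names as record names, so two nested objects under the same key name would clash in a real Avro schema.

```python
import json

def infer_avro_type(value, name):
    """Map one parsed JSON value to an Avro type (hypothetical helper, not a library API)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element decides the item type.
        items = infer_avro_type(value[0], name + "_item") if value else "null"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        fields = [{"name": key, "type": infer_avro_type(val, key)}
                  for key, val in value.items()]
        return {"type": "record", "name": name, "fields": fields}
    raise TypeError(f"unsupported JSON value: {value!r}")

# The sample document from the question.
doc = json.loads('{"records":[{"name":"X1","age":2},{"name":"X2","age":4}]}')
schema = infer_avro_type(doc, "topLevelRecord")
print(json.dumps(schema, indent=2))
```

For anything beyond a quick draft, the Spark approach has the advantage of merging the fields seen across all input lines, which is why optional fields such as `number` come out as unions with `null`.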