Emr*_*inç 18 validation json avro
I am trying to validate a JSON file against an Avro schema and write the corresponding Avro file. First, I defined the following Avro schema, named user.avsc:
{"namespace": "example.avro",
"type": "record",
"name": "user",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
Then I created a user.json file:
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
Then I tried to run:
java -jar ~/bin/avro-tools-1.7.7.jar fromjson --schema-file user.avsc user.json > user.avro
But I get the following exception:
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)
Am I missing something? Why do I get "Expected start-union. Got VALUE_NUMBER_INT"?
Emr*_*inç 32
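Avro's JSON encoding requires that non-null union values be tagged with their intended type. This is because unions like ["bytes","string"] and ["int","long"] are ambiguous in JSON: both branches of the first are encoded as JSON strings, and both branches of the second are encoded as JSON numbers.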
http://avro.apache.org/docs/current/spec.html#json_encoding
Therefore, your record must be encoded as:
{"name": "Alyssa", "favorite_number": {"int": 7}, "favorite_color": null}
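If you already have plain JSON like the user.json in the question, the tagging rule above can be applied mechanically before handing the file to avro-tools. Below is a minimal sketch in plain Python (standard library only, not the Avro library). The helper names `pick_branch` and `tag_unions` are made up for illustration, and the sketch assumes a flat record whose unions contain only primitive types; it picks the first matching branch and does not check the 32-bit range of `int`.

```python
import json

# Schema from the question (user.avsc), inlined so the sketch is self-contained.
SCHEMA = json.loads("""
{"namespace": "example.avro",
 "type": "record",
 "name": "user",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
""")


def pick_branch(union, value):
    """Return the first primitive union branch that fits value.

    Hypothetical helper: an ambiguous union such as ["int", "long"] is
    resolved by taking the first match, which is an assumption of this sketch.
    """
    for branch in union:
        if branch == "null" and value is None:
            return branch
        if branch == "boolean" and isinstance(value, bool):
            return branch
        # bool is a subclass of int in Python, so exclude it explicitly
        if branch in ("int", "long") and isinstance(value, int) and not isinstance(value, bool):
            return branch
        if branch in ("float", "double") and isinstance(value, float):
            return branch
        if branch == "string" and isinstance(value, str):
            return branch
    raise ValueError(f"no union branch in {union} matches {value!r}")


def tag_unions(schema, record):
    """Rewrite a plain JSON record so its union fields follow Avro's JSON encoding."""
    tagged = {}
    for field in schema["fields"]:
        value = record[field["name"]]
        if isinstance(field["type"], list):  # a JSON array in a schema denotes a union
            branch = pick_branch(field["type"], value)
            # null stays a bare null; any other branch becomes {"<type>": value}
            value = None if branch == "null" else {branch: value}
        tagged[field["name"]] = value
    return tagged


plain = json.loads('{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}')
avro_json = json.dumps(tag_unions(SCHEMA, plain))
print(avro_json)
# prints {"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
```

The printed line is the form that the fromjson command in the question should accept for this schema.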
ppe*_*rcy 10
A new JSON encoder is in the works that should address this common problem:
https://issues.apache.org/jira/browse/AVRO-1582
https://github.com/zolyfarkas/avro
As @Emre-Sevinc pointed out, the problem lies in how the Avro record is encoded.
To be more specific:
Don't do this:
jsonRecord = avroGenericRecord.toString
Instead, do this:
val writer = new GenericDatumWriter[GenericRecord](avroSchema)
val baos = new ByteArrayOutputStream
val jsonEncoder = EncoderFactory.get.jsonEncoder(avroSchema, baos)
writer.write(avroGenericRecord, jsonEncoder)
jsonEncoder.flush
val jsonRecord = baos.toString("UTF-8")
You will also need the following imports:
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
After doing this, you will get a jsonRecord whose non-null union values are tagged with their intended type.
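To sanity-check a JSON string before feeding it to a JSON decoder, you can test whether its union fields are tagged. This is only a rough sketch in plain Python, not part of the Avro API: the function name `untagged_union_fields` is hypothetical, and the field table is hard-coded from the user schema in the question.

```python
import json

# Union-typed fields of the user schema from the question, with their branches.
UNION_FIELDS = {
    "favorite_number": ["int", "null"],
    "favorite_color": ["string", "null"],
}


def untagged_union_fields(json_record):
    """Return the names of union fields whose non-null value lacks a type tag."""
    record = json.loads(json_record)
    problems = []
    for name, branches in UNION_FIELDS.items():
        value = record.get(name)
        if value is None:
            continue  # null is written bare in Avro's JSON encoding
        non_null = [b for b in branches if b != "null"]
        # A tagged value is a single-key object whose key names a union branch.
        is_tagged = (
            isinstance(value, dict)
            and len(value) == 1
            and next(iter(value)) in non_null
        )
        if not is_tagged:
            problems.append(name)
    return problems


# The plain record from the question fails the check ...
print(untagged_union_fields(
    '{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}'))
# prints ['favorite_number']

# ... while the tagged form passes.
print(untagged_union_fields(
    '{"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}'))
# prints []
```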
Hope this helps!