How to fix "Expected start-union. Got VALUE_NUMBER_INT" when converting JSON to Avro on the command line?

Emr*_*inç 18 validation json avro

I'm trying to validate a JSON file against an Avro schema and write the corresponding Avro file. First, I defined the following Avro schema, named user.avsc:

{"namespace": "example.avro",
 "type": "record",
 "name": "user",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

Then I created a user.json file:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

Then I tried to run:

java -jar ~/bin/avro-tools-1.7.7.jar fromjson --schema-file user.avsc user.json > user.avro

But I get the following exception:

Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)

Am I missing something? Why do I get "Expected start-union. Got VALUE_NUMBER_INT"?

Emr*_*inç 32

According to Doug Cutting's explanation:

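Avro's JSON encoding requires that non-null union values be tagged with their intended type. This is because unions like ["bytes","string"] and ["int","long"] are ambiguous in JSON: the first are both encoded as JSON strings, while the second are both encoded as JSON numbers.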

http://avro.apache.org/docs/current/spec.html#json_encoding

Thus your record must be encoded as:

{"name": "Alyssa", "favorite_number": {"int": 7}, "favorite_color": null}
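If you have many plain JSON records, the tagging rule above can be applied mechanically before handing the file to avro-tools. The following is a minimal Python sketch using only the standard library, not anything shipped with Avro. The helper names `pick_branch` and `tag_unions` are hypothetical, and the sketch assumes a flat record whose unions contain only primitive types; it picks the first branch that fits, so a union such as ["int","long"] would always resolve to "int".

```python
import json

# Hypothetical helper, not part of Avro: returns the first union branch
# that the given Python value can satisfy. Only primitive types are covered.
def pick_branch(value, branches):
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    for branch in branches:
        if branch == "null" and value is None:
            return branch
        if branch == "boolean" and isinstance(value, bool):
            return branch
        if branch in ("int", "long") and is_number and isinstance(value, int):
            return branch
        if branch in ("float", "double") and is_number:
            return branch
        if branch in ("string", "bytes") and isinstance(value, str):
            return branch
    raise ValueError(f"no union branch in {branches} matches {value!r}")


def tag_unions(schema, record):
    """Wrap non-null union values as {"<type>": value}, as Avro's JSON encoding expects."""
    tagged = {}
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        value = record[name]
        if isinstance(ftype, list):  # a JSON array in the schema denotes a union
            branch = pick_branch(value, ftype)
            # null stays a bare JSON null; every other branch gets a type tag
            tagged[name] = None if branch == "null" else {branch: value}
        else:
            tagged[name] = value
    return tagged


# Schema and record copied from the question
schema = json.loads("""
{"namespace": "example.avro", "type": "record", "name": "user",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]}
""")
record = json.loads('{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}')

print(json.dumps(tag_unions(schema, record)))
# prints {"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
```

Saving that output as user.json and rerunning the fromjson command from the question should then get past the union check. Nested records, arrays, maps and named types are not handled here.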

  • Thanks for the insight. Looking forward to AVRO-1582, which @ppearcy mentioned (2 upvotes)

ppe*_*rcy 10

A new JSON encoder is in the works that should address this common issue:

https://issues.apache.org/jira/browse/AVRO-1582

https://github.com/zolyfarkas/avro


Abh*_*bey 5

As @Emre-Sevinc pointed out, the problem lies in how your Avro record is encoded.

To be more specific:

Don't do this:

   jsonRecord = avroGenericRecord.toString

Instead, do this:

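    // Serialize through Avro's JSON encoder so union values carry their type tag
    val writer = new GenericDatumWriter[GenericRecord](avroSchema)
    val baos = new ByteArrayOutputStream
    val jsonEncoder = EncoderFactory.get.jsonEncoder(avroSchema, baos)
    writer.write(avroGenericRecord, jsonEncoder)
    jsonEncoder.flush() // push any buffered output into baos before reading it

    val jsonRecord = baos.toString("UTF-8")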

You will also need the following imports:

import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

After doing this, jsonRecord will contain non-null union values tagged with their intended type.

Hope this helps!