I have two similar schemas in which only one nested field differs (it is called onefield in schema1 and anotherfield in schema2).
schema1
{
  "type": "record",
  "name": "event",
  "namespace": "foo",
  "fields": [
    {
      "name": "metadata",
      "type": {
        "type": "record",
        "name": "event",
        "namespace": "foo.metadata",
        "fields": [
          {
            "name": "onefield",
            "type": [
              "null",
              "string"
            ],
            "default": null
          }
        ]
      },
      "default": null
    }
  ]
}
schema2
{
  "type": "record",
  "name": "event",
  "namespace": "foo",
  "fields": [
    {
      "name": "metadata",
      "type": {
        "type": "record",
        "name": "event",
        "namespace": "foo.metadata",
        "fields": [
          {
            "name": "anotherfield",
            "type": [
              "null",
              "string"
            ],
            "default": null
          }
        ]
      },
      "default": null
    }
  ]
}
I was able to merge the two schemas programmatically using Avro 1.8.0:
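// Parse each schema with its own parser, since both define a record named foo.event.
Schema s1 = new Schema.Parser().parse(schema1);
Schema s2 = new Schema.Parser().parse(schema2);
Schema[] schemas = {s1, s2};
// Fold the schemas pairwise with Pig's AvroStorageUtils helper.
Schema mergedSchema = null;
for (Schema schema : schemas) {
    mergedSchema = AvroStorageUtils.mergeSchema(mergedSchema, schema);
}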
and to use it to convert input JSON to its Avro or JSON representation:
import java.nio.charset.StandardCharsets;

JsonAvroConverter converter = new JsonAvroConverter();
try {
    // An empty JSON object, so every field falls back to its default.
    byte[] example = "{}".getBytes(StandardCharsets.UTF_8);
    byte[] avro = converter.convertToAvro(example, mergedSchema);
    byte[] json = converter.convertToJson(avro, mergedSchema);
    System.out.println(new String(json, StandardCharsets.UTF_8));
} catch (AvroConversionException e) {
    e.printStackTrace();
}
This code prints the expected output: {"metadata":{"onefield":null,"anotherfield":null}}. The problem is that I cannot look at the merged schema. A simple System.out.println(mergedSchema) throws the following exception:
Exception in thread "main" org.apache.avro.SchemaParseException: Can't redefine: merged schema (generated by AvroStorage).merged
    at org.apache.avro.Schema$Names.put(Schema.java:1127)
    at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:561)
    at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:689)
    at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:715)
    at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:700)
    at org.apache.avro.Schema.toString(Schema.java:323)
    at org.apache.avro.Schema.toString(Schema.java:313)
    at java.lang.String.valueOf(String.java:2982)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
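I call it the Avro uncertainty principle :). It looks like Avro can work with the merged schema but fails when it tries to serialize that schema to JSON. The merge works with simpler schemas, so this looks like a bug in Avro 1.8.0.
Do you know what is going on or how to fix it? Any workaround (for example, an alternative Schema serializer) is welcome.
I ran into the same problem with the Pig util class... there are actually 2 bugs here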
Schema mergedSchema = SchemaUtil.merge(s1, s2);
With your example, I get the following output:
{
  "type": "record",
  "name": "event",
  "namespace": "foo",
  "fields": [
    {
      "name": "metadata",
      "type": {
        "type": "record",
        "name": "event",
        "namespace": "foo.metadata",
        "fields": [
          {
            "name": "onefield",
            "type": [
              "null",
              "string"
            ],
            "default": null
          },
          {
            "name": "anotherfield",
            "type": [
              "null",
              "string"
            ],
            "default": null
          }
        ]
      },
      "default": null
    }
  ]
}
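Note that this output keeps the original record names (foo.event and foo.metadata.event). That may be what makes it printable: the stack trace in the question shows Schema.toString failing in Schema$Names.put with "Can't redefine", and the name it complains about is the generated one, merged schema (generated by AvroStorage).merged. A plausible reading, which I have not verified against the Pig source, is that the outer and the nested merged record both ended up with that same name. The sketch below is a library-free toy model of that idea. MergeNameSketch, Rec, merge, checkNames and serializable are hypothetical names invented for this illustration; they are not the Avro, Pig or SchemaUtil API, and the real Avro check compares schemas with equals rather than by identity.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of named record schemas; NOT the Avro, Pig, or SchemaUtil API. */
public class MergeNameSketch {

    /** A record with a full name and ordered fields; a field type is a nested Rec or a plain string. */
    static final class Rec {
        final String fullName;
        final Map<String, Object> fields = new LinkedHashMap<>();
        Rec(String fullName) { this.fullName = fullName; }
        Rec field(String name, Object type) { fields.put(name, type); return this; }
    }

    /** Field-wise union of two records. The flag picks between the original name and one shared generated name. */
    static Rec merge(Rec a, Rec b, boolean keepOriginalNames) {
        Rec out = new Rec(keepOriginalNames ? a.fullName : "merged");
        for (Rec src : new Rec[] {a, b}) {
            for (Map.Entry<String, Object> e : src.fields.entrySet()) {
                Object existing = out.fields.get(e.getKey());
                Object incoming = e.getValue();
                if (existing instanceof Rec && incoming instanceof Rec) {
                    // Both sides have a nested record under this field: merge recursively.
                    out.fields.put(e.getKey(), merge((Rec) existing, (Rec) incoming, keepOriginalNames));
                } else if (existing == null) {
                    out.fields.put(e.getKey(), incoming);
                }
            }
        }
        return out;
    }

    /** Walks the tree as a serializer might, rejecting a second, different definition of a full name. */
    static void checkNames(Rec rec, Map<String, Rec> seen) {
        Rec previous = seen.get(rec.fullName);
        if (previous == rec) {
            return; // same definition seen again: a name reference would be enough
        }
        if (previous != null) {
            throw new IllegalStateException("Can't redefine: " + rec.fullName);
        }
        seen.put(rec.fullName, rec);
        for (Object type : rec.fields.values()) {
            if (type instanceof Rec) {
                checkNames((Rec) type, seen);
            }
        }
    }

    /** True when every full name in the tree maps to a single definition. */
    static boolean serializable(Rec rec) {
        try {
            checkNames(rec, new HashMap<String, Rec>());
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Builds the shape of the schemas in the question, with one nested field. */
    static Rec example(String nestedField) {
        return new Rec("foo.event")
            .field("metadata", new Rec("foo.metadata.event").field(nestedField, "[null,string]"));
    }

    public static void main(String[] args) {
        Rec s1 = example("onefield");
        Rec s2 = example("anotherfield");
        Rec kept = merge(s1, s2, true);
        Rec renamed = merge(s1, s2, false);
        System.out.println(((Rec) kept.fields.get("metadata")).fields.keySet()); // [onefield, anotherfield]
        System.out.println(serializable(kept));    // true
        System.out.println(serializable(renamed)); // false
    }
}
```

In this toy model both merges hold the same fields, so data conversion would work either way; only the walk over the names fails for the variant that reuses one generated name. That would match the behaviour in the question, where conversion succeeds and toString does not.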
Hope this helps others.