I would like to use the same record type several times in an Avro schema. Consider this schema definition:
{
  "type": "record",
  "name": "OrderBook",
  "namespace": "my.types",
  "doc": "Test order update",
  "fields": [
    {
      "name": "bids",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderBookVolume",
          "namespace": "my.types",
          "fields": [
            {
              "name": "price",
              "type": "double"
            },
            {
              "name": "volume",
              "type": "double"
            }
          ]
        }
      }
    },
    {
      "name": "asks",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderBookVolume",
          "namespace": "my.types",
          "fields": [
            {
              "name": "price",
              "type": "double"
            },
            {
              "name": "volume",
              "type": "double"
            }
          ]
        }
      }
    }
  ]
}
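This is not a valid Avro schema, and the Avro schema parser fails with
org.apache.avro.SchemaParseException: Can't redefine: my.types.OrderBookVolume
I can work around this by moving OrderBookVolume into two different namespaces so that each type is unique: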
{
  "type": "record",
  "name": "OrderBook",
  "namespace": "my.types",
  "doc": "Test order update",
  "fields": [
    {
      "name": "bids",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderBookVolume",
          "namespace": "my.types.bid",
          "fields": [
            {
              "name": "price",
              "type": "double"
            },
            {
              "name": "volume",
              "type": "double"
            }
          ]
        }
      }
    },
    {
      "name": "asks",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderBookVolume",
          "namespace": "my.types.ask",
          "fields": [
            {
              "name": "price",
              "type": "double"
            },
            {
              "name": "volume",
              "type": "double"
            }
          ]
        }
      }
    }
  ]
}
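This is not a good solution, because Avro code generation will produce two different classes. That is very annoying if I also want to use the type for other purposes and not only for serialization and deserialization.
This problem is related to the issue here: Avro Spark issue #73
That issue added differentiation between nested records with the same name by prepending the outer record name to the namespace. Their use case is probably purely storage-related, so it may work for them, but it does not work for us.
Does anyone know a better solution? Is this a hard limitation of Avro?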
Answer from 小智 (score 5):
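It is not well documented, but Avro lets you refer to a previously defined name by its fully qualified name. In your case, the following schema results in only one class being generated, which both arrays reference. It also DRYs up the schema nicely.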
{
  "type": "record",
  "name": "OrderBook",
  "namespace": "my.types",
  "doc": "Test order update",
  "fields": [
    {
      "name": "bids",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderBookVolume",
          "namespace": "my.types.bid",
          "fields": [
            {
              "name": "price",
              "type": "double"
            },
            {
              "name": "volume",
              "type": "double"
            }
          ]
        }
      }
    },
    {
      "name": "asks",
      "type": {
        "type": "array",
        "items": "my.types.bid.OrderBookVolume"
      }
    }
  ]
}
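To illustrate why defining the record once and referring to it by name parses, while the duplicated definition from the question does not, here is a rough Python sketch of the naming rule. It is not the Avro library's implementation: `check_names`, `volume` and `order_book` are hypothetical helpers written for this illustration, the error text simply mirrors the message quoted in the question, and only records, enums, fixed types, arrays, maps and unions are handled.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_TYPES = {"record", "enum", "fixed"}


def check_names(schema, enclosing=None, defined=None):
    """Walk a schema depth-first, left to right, and return the defined fullnames.

    Raises ValueError if a fullname is defined twice or used before its definition.
    """
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema not in PRIMITIVES:
            # A bare name is looked up in the enclosing namespace.
            ref = schema if "." in schema or not enclosing else f"{enclosing}.{schema}"
            if ref not in defined:
                raise ValueError(f"Undefined name: {ref}")
    elif isinstance(schema, list):  # a union: check every branch in order
        for branch in schema:
            check_names(branch, enclosing, defined)
    else:
        kind = schema["type"]
        if not isinstance(kind, str):  # nested form such as {"type": {...}}
            check_names(kind, enclosing, defined)
        elif kind in NAMED_TYPES:
            name = schema["name"]
            namespace = schema.get("namespace", enclosing)
            if "." in name:  # a dotted name is already a fullname
                namespace = name.rpartition(".")[0]
                full = name
            else:
                full = f"{namespace}.{name}" if namespace else name
            if full in defined:
                raise ValueError(f"Can't redefine: {full}")
            defined.add(full)
            for field in schema.get("fields", []):
                check_names(field["type"], namespace, defined)
        elif kind == "array":
            check_names(schema["items"], enclosing, defined)
        elif kind == "map":
            check_names(schema["values"], enclosing, defined)
        else:  # primitive or name reference written as {"type": "..."}
            check_names(kind, enclosing, defined)
    return defined


def volume(namespace=None):
    """Build the OrderBookVolume record definition (hypothetical helper)."""
    record = {
        "type": "record",
        "name": "OrderBookVolume",
        "fields": [
            {"name": "price", "type": "double"},
            {"name": "volume", "type": "double"},
        ],
    }
    if namespace:
        record["namespace"] = namespace
    return record


def order_book(bid_items, ask_items):
    """Build the OrderBook record with the given array item schemas."""
    return {
        "type": "record",
        "name": "OrderBook",
        "namespace": "my.types",
        "fields": [
            {"name": "bids", "type": {"type": "array", "items": bid_items}},
            {"name": "asks", "type": {"type": "array", "items": ask_items}},
        ],
    }


# One definition plus a reference by fullname: accepted.
by_reference = order_book(volume("my.types.bid"), "my.types.bid.OrderBookVolume")
print(sorted(check_names(by_reference)))

# The same fullname defined twice: rejected.
try:
    check_names(order_book(volume(), volume()))
except ValueError as error:
    print(error)
```

Under these assumptions the first call reports two fullnames (`my.types.OrderBook` and `my.types.bid.OrderBookVolume`), and the second raises an error for `my.types.OrderBookVolume`, which is the situation the question ran into.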
The Avro specification states:
A schema or protocol may not contain multiple definitions of a fullname.
Further, a name must be defined before it is used ("before" in the
depth-first, left-to-right traversal of the JSON parse tree, where the
types attribute of a protocol is always deemed to come "before" the
messages attribute.)
For example:
{
  "type": "record",
  "namespace": "my.types",
  "name": "OrderBook",
  "fields": [
    {
      "name": "bids",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderBookVolume",
          "fields": [
            {"name": "price", "type": "double"},
            {"name": "volume", "type": "double"}
          ]
        }
      }
    },
    {
      "name": "asks",
      "type": {
        "type": "array",
        "items": "my.types.OrderBookVolume"
      }
    }
  ]
}
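The first occurrence is the full schema of OrderBookVolume. After that, you can refer to it by its fullname: my.types.OrderBookVolume.
It is also worth noting that you do not need a namespace on every record. A record inherits the namespace of its parent; specifying one explicitly overrides the inherited namespace.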
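The inheritance rule can be sketched as a small function. This is a simplified illustration of how a fullname is derived, not code from the Avro library; `fullname` and its parameters are hypothetical names chosen for this example.

```python
def fullname(name, namespace=None, enclosing=None):
    """Derive a fullname from a name, an optional explicit namespace,
    and the namespace of the enclosing (parent) definition."""
    if "." in name:
        # A dotted name is already a fullname; any namespace is ignored.
        return name
    # An explicit namespace overrides the one inherited from the parent.
    effective = enclosing if namespace is None else namespace
    return f"{effective}.{name}" if effective else name


# Inherited from the parent record OrderBook (namespace "my.types"):
print(fullname("OrderBookVolume", enclosing="my.types"))
# Explicit namespace overrides the inherited one:
print(fullname("OrderBookVolume", namespace="my.types.bid", enclosing="my.types"))
```

The first call yields `my.types.OrderBookVolume` and the second `my.types.bid.OrderBookVolume`, which is why the two-namespace workaround in the question produces two distinct types and therefore two generated classes.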