How to get a protobuf extension field in ProtobufAnnotationSerializer

Question

tsc*_*hel 1 protocol-buffers stanford-nlp

I am new to Protocol Buffers and am trying to figure out how to extend the message types in the Stanford CoreNLP library, as described here: https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/pipeline/ProtobufAnnotationSerializer.html

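Problem: I can set the extension field, but I cannot get it. I have boiled the problem down to the code below. The field name [edu.stanford.nlp.pipeline.myNewField] in the original message is replaced by the field number 101 in the deserialized message.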

How do I get the value of myNewField?

P.S.: This post /sf/ask/2017065011/ suggests that it should be as simple as calling getExtension(MyAppProtos.myNewField).

custom.proto

syntax = "proto2";

package edu.stanford.nlp.pipeline;

option java_package = "com.example.my.awesome.nlp.app";
option java_outer_classname = "MyAppProtos";

import "CoreNLP.proto";

extend Sentence {
    optional uint32 myNewField = 101;
}

ProtoTest.java

import com.example.my.awesome.nlp.app.MyAppProtos;
import com.google.protobuf.ExtensionRegistry;
import com.google.protobuf.InvalidProtocolBufferException;

import edu.stanford.nlp.pipeline.CoreNLPProtos;
import edu.stanford.nlp.pipeline.CoreNLPProtos.Sentence;

public class ProtoTest {

    static {
        ExtensionRegistry registry = ExtensionRegistry.newInstance();
        registry.add(MyAppProtos.myNewField);
        CoreNLPProtos.registerAllExtensions(registry);
    }

    public static void main(String[] args) throws InvalidProtocolBufferException {

        Sentence originalSentence = Sentence.newBuilder()
                .setText("Hello world!")
                .setTokenOffsetBegin(0)
                .setTokenOffsetEnd(12)
                .setExtension(MyAppProtos.myNewField, 13)
                .build();

        System.out.println("Original:\n" + originalSentence);

        byte[] serialized = originalSentence.toByteArray();

        Sentence deserializedSentence = Sentence.parseFrom(serialized);
        System.out.println("Deserialized:\n" + deserializedSentence);

        Integer myNewField = deserializedSentence.getExtension(MyAppProtos.myNewField);
        System.out.println("MyNewField: " + myNewField);
    }
}

Output:

Original:
tokenOffsetBegin: 0
tokenOffsetEnd: 12
text: "Hello world!"
[edu.stanford.nlp.pipeline.myNewField]: 13

Deserialized:
tokenOffsetBegin: 0
tokenOffsetEnd: 12
text: "Hello world!"
101: 13

MyNewField: 0

Update: Since this question is about extending CoreNLP message types and using them with ProtobufAnnotationSerializer, my extended serializer looks like this:

import java.io.IOException;
import java.io.InputStream;
import java.util.Set;

import com.example.my.awesome.nlp.app.MyAppProtos;
import com.google.protobuf.ExtensionRegistry;

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.CoreNLPProtos;
import edu.stanford.nlp.pipeline.CoreNLPProtos.Sentence;
import edu.stanford.nlp.pipeline.CoreNLPProtos.Sentence.Builder;
import edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.Pair;

public class MySerializer extends ProtobufAnnotationSerializer {

    private static ExtensionRegistry registry;

    static {
        registry = ExtensionRegistry.newInstance();
        registry.add(MyAppProtos.myNewField);
        CoreNLPProtos.registerAllExtensions(registry);
    }

    @Override
    protected Builder toProtoBuilder(CoreMap sentence, Set<Class<?>> keysToSerialize) {

        keysToSerialize.remove(MyAnnotation.class);
        Builder builder = super.toProtoBuilder(sentence, keysToSerialize);
        builder.setExtension(MyAppProtos.myNewField, 13);

        return builder;
    }

    @Override
    public Pair<Annotation, InputStream> read(InputStream is)
            throws IOException, ClassNotFoundException, ClassCastException {

        CoreNLPProtos.Document doc = CoreNLPProtos.Document.parseDelimitedFrom(is, registry);
        return Pair.makePair(fromProto(doc), is);
    }

    @Override
    protected CoreMap fromProtoNoTokens(Sentence proto) {

        CoreMap result = super.fromProtoNoTokens(proto);
        result.set(MyAnnotation.class, proto.getExtension(MyAppProtos.myNewField));

        return result;
    }
}

Answer 1

tsc*_*hel 5

The mistake was that I did not pass the extension registry to the parseFrom call.

Changing Sentence deserializedSentence = Sentence.parseFrom(serialized); to Sentence deserializedSentence = Sentence.parseFrom(serialized, registry); does the job!
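
To illustrate why the registry matters, here is a toy model in plain Java. It is not the protobuf API: the class ExtensionRegistryToy, the nested ParsedMessage, and the parse method are all hypothetical stand-ins, and the "wire data" is just a map from field number to value. The sketch only mimics the behaviour seen in the question: a parser that is not told about field 101 keeps it as an unnamed field (printed as 101: 13), and a lookup by extension then falls back to the default value 0.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only (hypothetical names); this is NOT the protobuf API. */
public class ExtensionRegistryToy {

    /** Stand-in for a parsed message: recognized extensions by name, the rest by field number. */
    static final class ParsedMessage {
        final Map<String, Integer> extensions = new LinkedHashMap<>();
        final Map<Integer, Integer> unknownFields = new LinkedHashMap<>();

        /** Returns the extension value, or the default 0 if the parser never recognized it. */
        int getExtension(String name) {
            return extensions.getOrDefault(name, 0);
        }
    }

    /** "Parses" wire data (field number -> value) using only what the registry knows. */
    static ParsedMessage parse(Map<Integer, Integer> wire, Map<Integer, String> registry) {
        ParsedMessage message = new ParsedMessage();
        for (Map.Entry<Integer, Integer> field : wire.entrySet()) {
            String name = registry.get(field.getKey());
            if (name != null) {
                // The registry maps the field number to an extension: store it by name.
                message.extensions.put(name, field.getValue());
            } else {
                // No mapping: the value survives, but only under its field number.
                message.unknownFields.put(field.getKey(), field.getValue());
            }
        }
        return message;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> wire = Collections.singletonMap(101, 13);
        Map<Integer, String> registry = Collections.singletonMap(101, "myNewField");

        ParsedMessage withoutRegistry = parse(wire, Collections.emptyMap());
        ParsedMessage withRegistry = parse(wire, registry);

        System.out.println(withoutRegistry.getExtension("myNewField")); // prints 0
        System.out.println(withRegistry.getExtension("myNewField"));    // prints 13
    }
}
```

One practical detail when applying the real fix to ProtoTest from the question: there, registry is a local variable inside the static initializer, so it has to be kept in a static field (as MySerializer already does) before it can be passed to Sentence.parseFrom(serialized, registry).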
