tsc*_*hel 1 protocol-buffers stanford-nlp
我是协议缓冲区的新手,并尝试弄清楚如何在斯坦福 CoreNLP 库中扩展消息类型,如下所述: https: //nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/管道/ProtobufAnnotationSerializer.html
问题:我可以设置扩展字段,但无法获取它。我将问题归结为下面的代码。在原始消息中,字段名称被反序列化消息中的[edu.stanford.nlp.pipeline.myNewField]字段编号替换。101
我如何获取 myNewField 的值?
PS:这篇文章/sf/ask/2017065011/建议它应该像调用一样简单getExtension(MyAppProtos.myNewField)
定制原型
syntax = "proto2";
package edu.stanford.nlp.pipeline;
option java_package = "com.example.my.awesome.nlp.app";
option java_outer_classname = "MyAppProtos";
import "CoreNLP.proto";
extend Sentence {
optional uint32 myNewField = 101;
}
Run Code Online (Sandbox Code Playgroud)
原型测试.java
import com.example.my.awesome.nlp.app.MyAppProtos;
import com.google.protobuf.ExtensionRegistry;
import com.google.protobuf.InvalidProtocolBufferException;
import edu.stanford.nlp.pipeline.CoreNLPProtos;
import edu.stanford.nlp.pipeline.CoreNLPProtos.Sentence;
public class ProtoTest {
static {
ExtensionRegistry registry = ExtensionRegistry.newInstance();
registry.add(MyAppProtos.myNewField);
CoreNLPProtos.registerAllExtensions(registry);
}
public static void main(String[] args) throws InvalidProtocolBufferException {
Sentence originalSentence = Sentence.newBuilder()
.setText("Hello world!")
.setTokenOffsetBegin(0)
.setTokenOffsetEnd(12)
.setExtension(MyAppProtos.myNewField, 13)
.build();
System.out.println("Original:\n" + originalSentence);
byte[] serialized = originalSentence.toByteArray();
Sentence deserializedSentence = Sentence.parseFrom(serialized);
System.out.println("Deserialized:\n" + deserializedSentence);
Integer myNewField = deserializedSentence.getExtension(MyAppProtos.myNewField);
System.out.println("MyNewField: " + myNewField);
}
}
Run Code Online (Sandbox Code Playgroud)
输出:
Original:
tokenOffsetBegin: 0
tokenOffsetEnd: 12
text: "Hello world!"
[edu.stanford.nlp.pipeline.myNewField]: 13
Deserialized:
tokenOffsetBegin: 0
tokenOffsetEnd: 12
text: "Hello world!"
101: 13
MyNewField: 0
Run Code Online (Sandbox Code Playgroud)
更新
因为这个问题是关于扩展 CoreNLP 消息类型并将它们与 一起使用ProtobufAnnotationSerializer,所以我的扩展序列化器如下所示:
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;
import com.example.my.awesome.nlp.app.MyAppProtos;
import com.google.protobuf.ExtensionRegistry;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.CoreNLPProtos;
import edu.stanford.nlp.pipeline.CoreNLPProtos.Sentence;
import edu.stanford.nlp.pipeline.CoreNLPProtos.Sentence.Builder;
import edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.Pair;
public class MySerializer extends ProtobufAnnotationSerializer {
private static ExtensionRegistry registry;
static {
registry = ExtensionRegistry.newInstance();
registry.add(MyAppProtos.myNewField);
CoreNLPProtos.registerAllExtensions(registry);
}
@Override
protected Builder toProtoBuilder(CoreMap sentence, Set<Class<?>> keysToSerialize) {
keysToSerialize.remove(MyAnnotation.class);
Builder builder = super.toProtoBuilder(sentence, keysToSerialize);
builder.setExtension(MyAppProtos.myNewField, 13);
return builder;
}
@Override
public Pair<Annotation, InputStream> read(InputStream is)
throws IOException, ClassNotFoundException, ClassCastException {
CoreNLPProtos.Document doc = CoreNLPProtos.Document.parseDelimitedFrom(is, registry);
return Pair.makePair(fromProto(doc), is);
}
@Override
protected CoreMap fromProtoNoTokens(Sentence proto) {
CoreMap result = super.fromProtoNoTokens(proto);
result.set(MyAnnotation.class, proto.getExtension(MyAppProtos.myNewField));
return result;
}
}
Run Code Online (Sandbox Code Playgroud)
错误是我没有parseFrom向扩展注册表提供调用。
更改Sentence deserializedSentence = Sentence.parseFrom(serialized);为Sentence deserializedSentence = Sentence.parseFrom(serialized, registry);完成工作!
| 归档时间: |
|
| 查看次数: |
2356 次 |
| 最近记录: |