Stanford NER customization for classifying software programming keywords

Question


Tec*_*ech 1 java nlp classification stanford-nlp

I'm new to NLP. I'm using the Stanford NER tool to classify some random text in order to extract special keywords used in software programming.

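The problem is that I don't know how to change the classifier and text annotators in Stanford NER so that they recognize software programming keywords. For example: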

```
today Java used in different operating systems (Windows, Linux, ..)
```


The classification result should be:

```
Java "Programming_Language"
Windows "Operating_System"
Linux "Operating_System"
```


How can I customize the Stanford NER classifier to meet my needs?

Answer 1

Rah*_*hul 6

I think this is documented in quite some detail in the Stanford NER FAQ: http://nlp.stanford.edu/software/crf-faq.shtml#a

Here are the steps:

In the properties file, change the `map` to specify how your training data is annotated (structured):

```
map = word=0,myfeature=1,answer=2
```
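To make the `map` line concrete: with `word=0,myfeature=1,answer=2`, each line of the tab-separated training file holds a token, its value for the new feature, and the gold label. The following is a minimal sketch, not Stanford NER's own reader, that parses a `map` spec and one training line into named columns. The class name `ColumnMap` and the feature values `KW`/`O` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating how a "map" property assigns
// tab-separated columns of a training file to annotation keys.
// This is NOT Stanford NER's reader, only a sketch of the idea.
class ColumnMap {

  // Parses "word=0,myfeature=1,answer=2" into {0=word, 1=myfeature, 2=answer}.
  static Map<Integer, String> parseMap(String spec) {
    Map<Integer, String> columns = new LinkedHashMap<>();
    for (String entry : spec.split(",")) {
      String[] kv = entry.trim().split("=");
      columns.put(Integer.parseInt(kv[1].trim()), kv[0].trim());
    }
    return columns;
  }

  // Splits one training line on tabs and names each column via the map.
  static Map<String, String> parseLine(Map<Integer, String> columns, String line) {
    String[] fields = line.split("\t");
    Map<String, String> token = new LinkedHashMap<>();
    for (Map.Entry<Integer, String> e : columns.entrySet()) {
      token.put(e.getValue(), fields[e.getKey()]);
    }
    return token;
  }

  public static void main(String[] args) {
    Map<Integer, String> columns = parseMap("word=0,myfeature=1,answer=2");
    // Sample rows for the sentence from the question; "KW" is a made-up feature value.
    String[] training = {
      "today\tO\tO",
      "Java\tKW\tProgramming_Language",
      "Windows\tKW\tOperating_System",
    };
    for (String line : training) {
      System.out.println(parseLine(columns, line));
    }
  }
}
```

Running `main` prints one line per token, e.g. `{word=Java, myfeature=KW, answer=Programming_Language}`.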

In src\edu\stanford\nlp\sequences\SeqClassifierFlags.java, add a flag indicating that you want to use the new feature; let's call it `useMyFeature`. Below `public boolean useLabelSource = false`, add `public boolean useMyFeature = true;`

In the same file, in the method `setProperties(Properties props, boolean printProps)`, after `else if (key.equalsIgnoreCase("useTrainLexicon")) { ..}`, add a branch that tells the tool whether this flag is on or off for you:
```
else if (key.equalsIgnoreCase("useMyFeature")) {
  useMyFeature = Boolean.parseBoolean(val);
}
```
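The two edits to `SeqClassifierFlags` amount to a boolean field plus one more branch in the key-matching chain of `setProperties`. The stand-alone class below mimics that pattern so it can be run in isolation; `MiniFlags` is a hypothetical stand-in, not the real class, which has far more flags and a different method signature.

```java
import java.util.Properties;

// Hypothetical, stripped-down stand-in for SeqClassifierFlags, showing how a
// new boolean flag is read from a Properties object.
class MiniFlags {
  public boolean useLabelSource = false;
  // New flag. The answer initializes it to true; false here makes the
  // feature opt-in via the properties file.
  public boolean useMyFeature = false;

  public void setProperties(Properties props) {
    for (String key : props.stringPropertyNames()) {
      String val = props.getProperty(key).trim();
      if (key.equalsIgnoreCase("useLabelSource")) {
        useLabelSource = Boolean.parseBoolean(val);
      } else if (key.equalsIgnoreCase("useMyFeature")) {
        // Same pattern as the branch added after "useTrainLexicon".
        useMyFeature = Boolean.parseBoolean(val);
      }
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("useMyFeature", "true");
    MiniFlags flags = new MiniFlags();
    flags.setProperties(props);
    System.out.println("useMyFeature = " + flags.useMyFeature); // prints "useMyFeature = true"
  }
}
```

Because the match uses `equalsIgnoreCase`, `useMyFeature=true`, `usemyfeature=true`, and so on in the properties file all switch the flag on.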

In src/edu/stanford/nlp/ling/CoreAnnotations.java, add the following class:

```
public static class myfeature implements CoreAnnotation<String> {
  public Class<String> getType() {
    return String.class;
  }
}
```

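The annotation class is only a typed key: CoreNLP tokens (`CoreLabel`) store their values in a map keyed by such classes, which is why `c.get(CoreAnnotations.myfeature.class)` returns a `String` without a cast. The sketch below re-creates that pattern in a few lines; `Key`, `Token`, and `MyFeature` are hypothetical names, not CoreNLP classes, and `getType()` simply mirrors the method the real interface requires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical re-creation of the "class as typed key" pattern behind
// CoreAnnotation/CoreLabel; not the CoreNLP implementation.
interface Key<V> {
  Class<V> getType();
}

// Plays the role of CoreAnnotations.myfeature.
class MyFeature implements Key<String> {
  public Class<String> getType() {
    return String.class;
  }
}

class Token {
  private final Map<Class<?>, Object> values = new HashMap<>();

  <V> void set(Class<? extends Key<V>> key, V value) {
    values.put(key, value);
  }

  // The key class fixes the value type, so callers need no cast.
  <V> V get(Class<? extends Key<V>> key) {
    @SuppressWarnings("unchecked")
    V value = (V) values.get(key);
    return value;
  }

  public static void main(String[] args) {
    Token tok = new Token();
    tok.set(MyFeature.class, "KW");
    String feature = tok.get(MyFeature.class); // inferred as String
    System.out.println(feature); // prints "KW"
  }
}
```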

In src/edu/stanford/nlp/ling/AnnotationLookup.java, add the following at the bottom of `public enum KeyLookup {..}`:

```
MY_TAG(CoreAnnotations.myfeature.class, "myfeature")
```
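The `KeyLookup` entry is what connects the column name used in `map` to the annotation class, so its string should match the name in the properties file exactly (here `myfeature`). A minimal sketch of such a lookup follows; the enum `MiniKeyLookup`, its fields, and the nested stand-in key classes are hypothetical and only written in the spirit of `AnnotationLookup.KeyLookup`.

```java
// Hypothetical sketch of an enum that maps the column names used in the
// "map" property to annotation key classes, in the spirit of
// AnnotationLookup.KeyLookup. Not the real CoreNLP enum.
enum MiniKeyLookup {
  WORD(WordKey.class, "word"),
  ANSWER(AnswerKey.class, "answer"),
  MY_TAG(MyFeatureKey.class, "myfeature"); // must equal the name in "map"

  // Stand-ins for CoreAnnotations classes.
  static class WordKey {}
  static class AnswerKey {}
  static class MyFeatureKey {}

  final Class<?> coreKey;
  final String oldKey;

  MiniKeyLookup(Class<?> coreKey, String oldKey) {
    this.coreKey = coreKey;
    this.oldKey = oldKey;
  }

  // Returns the key class for a column name, or null if the name is unknown.
  static Class<?> getCoreKey(String name) {
    for (MiniKeyLookup k : values()) {
      if (k.oldKey.equals(name)) {
        return k.coreKey;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(getCoreKey("myfeature").getSimpleName()); // prints "MyFeatureKey"
    System.out.println(getCoreKey("my_feature"));                // prints "null"
  }
}
```

The second lookup fails because `my_feature` differs from `myfeature` by one character, which is the kind of mismatch to look for if a custom column is silently ignored.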

In src\edu\stanford\nlp\ie\NERFeatureFactory.java, depending on the "type" of your feature, add the following to `protected Collection<String> featuresC(PaddedList<IN> cInfo, int loc)`:

```
if (flags.useMyFeature) {
  featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag");
}
```

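The `featuresC` branch appends one extra string feature per token, built from that token's `myfeature` value plus a suffix; the CRF then learns weights for those strings like any other feature. Below is a self-contained sketch of that step, using a hypothetical `Tok` class and a plain `List` in place of `CoreLabel` and `PaddedList`, with a boolean parameter standing in for the flag added earlier. The real method emits many other features, and the null check is an addition of this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of the featuresC step: when the flag is on, the current
// token's custom annotation becomes one more string feature.
class MyFeatureSketch {

  // Minimal stand-in for a CoreLabel carrying a word and its myfeature value.
  static final class Tok {
    final String word;
    final String myFeature;

    Tok(String word, String myFeature) {
      this.word = word;
      this.myFeature = myFeature;
    }
  }

  static Collection<String> featuresC(List<Tok> cInfo, int loc, boolean useMyFeature) {
    Collection<String> featuresC = new ArrayList<>();
    Tok c = cInfo.get(loc);
    featuresC.add(c.word + "-WORD"); // an ordinary word-identity feature, for contrast
    if (useMyFeature && c.myFeature != null) {
      // Mirrors featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag")
      featuresC.add(c.myFeature + "-my_tag");
    }
    return featuresC;
  }

  public static void main(String[] args) {
    List<Tok> sentence = List.of(new Tok("today", "O"), new Tok("Java", "KW"));
    System.out.println(featuresC(sentence, 1, true));  // prints "[Java-WORD, KW-my_tag]"
    System.out.println(featuresC(sentence, 1, false)); // prints "[Java-WORD]"
  }
}
```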

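Debugging: beyond this, there are ways to dump the features to a file; use them to see how things are done. Also, I think you will need to spend some time with a debugger as well :P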

Archived: 11 years, 6 months ago
Views: 1358
Last activity: 7 years, 7 months ago