Relation extraction using Stanford CoreNLP

Question

dab*_*abe 5 nlp text-mining stanford-nlp

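I'm trying to extract information from natural-language content using the Stanford CoreNLP library.

My goal is to extract (simplified) "subject-action-object" triples from sentences.

As an example, consider the following text:

John Smith only eats an apple and a banana for lunch. He's on a diet and his mother told him that it would be very healthy to eat less for lunch. John doesn't like it at all, but since he's very serious about his diet, he doesn't want to stop.

From this text I would like to get results like the following: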

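John Smith - eats - only an apple and a banana for lunch
He - is - on a diet
His mother - told - him that eating less for lunch would be very healthy
John - doesn't like - it (at all)
He - is - serious about his diet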

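How would one do this?

Or, to be more specific: how can I parse a dependency tree (or a better-suited tree?) to obtain the results shown above?

Any hints, resources, or code snippets for this task would be highly appreciated.

Side note: I have already managed to replace coreferences with their representative mentions, which changes "he" and "his" to the corresponding entity (John Smith in this case).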

Answer 1

Sta*_*elp 5

The Stanford CoreNLP toolkit comes with a dependency parser.

First of all, here is a link that describes the types of edges in the tree:

http://universaldependencies.github.io/docs/

There are several ways to generate a dependency tree with the toolkit.

Here is some sample code to get you started:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;

public class DependencyTreeExample {

    public static void main (String[] args) throws IOException {

        // set up properties
        Properties props = new Properties();
        props.setProperty("ssplit.eolonly","true");
        props.setProperty("annotators",
                "tokenize, ssplit, pos, depparse");
        // set up pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
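        // read the whole input file into one string (one sentence per line)
        String content;
        try (Scanner scanner = new Scanner(new File(args[0]))) {
            content = scanner.useDelimiter("\\Z").next();
        }
        System.out.println(content);
        // annotate the text; with ssplit.eolonly set, each line is treated as one sentence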
        Annotation annotation = new Annotation(content);
        pipeline.annotate(annotation);

        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println("---");
            System.out.println("sentence: "+sentence);
            SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
        }


    }

}

Instructions:

Cut and paste the code into DependencyTreeExample.java
Put that file in the directory stanford-corenlp-full-2015-04-20
javac -cp "*:." DependencyTreeExample.java
Add your sentences, one per line, to a file called dependency_sentences.txt
java -cp "*:." DependencyTreeExample dependency_sentences.txt

Sample output:

sentence: John doesn't like it at all.
dep                 reln                gov                 
---                 ----                ---                 
like-4              root                root                
John-1              nsubj               like-4              
does-2              aux                 like-4              
n't-3               neg                 like-4              
it-5                dobj                like-4              
at-6                case                all-7               
all-7               nmod:at             like-4              
.-8                 punct               like-4

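This prints out the dependency parse. By working with the SemanticGraph object, you can write code that looks for the kinds of patterns you want.

Notice that in this example "like" points to "John" via an "nsubj" edge and to "it" via a "dobj" edge, so the verb is the governor of both its subject and its object.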
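To make the pattern idea concrete, here is a minimal sketch of pairing "nsubj" and "dobj" edges that share a governor. It deliberately does not use CoreNLP: it works on the text of the dep/reln/gov table shown above so that it runs on its own. The class TripleSketch and its methods are hypothetical names made up for this illustration; in real code you would presumably walk the edges of the SemanticGraph instead of parsing printed output. It only handles simple transitive clauses, so copular sentences such as "He is on a diet" need additional rules.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of CoreNLP): builds subject - verb - object
// triples from nsubj/dobj edges that share the same governor.
class TripleSketch {

    /** One dependency edge, e.g. dep="John-1", reln="nsubj", gov="like-4". */
    static final class Edge {
        final String dep, reln, gov;
        Edge(String dep, String reln, String gov) {
            this.dep = dep; this.reln = reln; this.gov = gov;
        }
    }

    /** Parses the rows of the printed table, skipping the header, rule, and any non-table lines. */
    static List<Edge> parseReadable(String table) {
        List<Edge> edges = new ArrayList<>();
        for (String line : table.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length != 3 || cols[0].equals("dep") || cols[0].equals("---")) continue;
            edges.add(new Edge(cols[0], cols[1], cols[2]));
        }
        return edges;
    }

    /** Strips the "-<index>" suffix: "like-4" becomes "like". */
    static String word(String node) {
        int i = node.lastIndexOf('-');
        return i > 0 ? node.substring(0, i) : node;
    }

    /** Emits one triple for every governor that has both an nsubj and a dobj dependent. */
    static List<String> extractTriples(List<Edge> edges) {
        Map<String, String> subjects = new LinkedHashMap<>(); // governor -> subject
        Map<String, String> objects = new LinkedHashMap<>();  // governor -> direct object
        Set<String> negated = new HashSet<>();                // governors with a neg edge
        for (Edge e : edges) {
            if (e.reln.equals("nsubj")) subjects.put(e.gov, e.dep);
            else if (e.reln.equals("dobj")) objects.put(e.gov, e.dep);
            else if (e.reln.equals("neg")) negated.add(e.gov);
        }
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, String> s : subjects.entrySet()) {
            String verb = s.getKey();
            String obj = objects.get(verb);
            if (obj == null) continue; // no direct object: not covered by this sketch
            String action = (negated.contains(verb) ? "not " : "") + word(verb);
            triples.add(word(s.getValue()) + " - " + action + " - " + word(obj));
        }
        return triples;
    }

    public static void main(String[] args) {
        String table = String.join("\n",
                "dep                 reln                gov",
                "---                 ----                ---",
                "like-4              root                root",
                "John-1              nsubj               like-4",
                "does-2              aux                 like-4",
                "n't-3               neg                 like-4",
                "it-5                dobj                like-4",
                "at-6                case                all-7",
                "all-7               nmod:at             like-4",
                ".-8                 punct               like-4");
        for (String triple : extractTriples(parseReadable(table))) {
            System.out.println(triple); // prints: John - not like - it
        }
    }
}
```

The same grouping-by-governor idea should carry over to other relations in the output, for example attaching the "nmod:at" dependent to the triple to recover "at all".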

For reference, take a look at the Javadoc for edu.stanford.nlp.semgraph.SemanticGraph:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/SemanticGraph.html
