How to find the grammatical relations of a noun phrase using Stanford Parser or Stanford CoreNLP

azp*_*lic 3 nlp stanford-nlp

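I am using Stanford CoreNLP to try to find the grammatical relations of noun phrases.

Here is an example:

Given the sentence "The fitness room was dirty."

I managed to identify "The fitness room" as my target noun phrase. I am now looking for a way to find that the adjective "dirty" has a relationship to "the fitness room", and not only to "room".

Example code: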

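import java.util.Collection;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import edu.stanford.nlp.util.CoreMap;

private static void doSentenceTest(){
    Properties props = new Properties();
    // setProperty is the type-safe way to set String-valued properties
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");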
    StanfordCoreNLP stanford = new StanfordCoreNLP(props);

    TregexPattern npPattern = TregexPattern.compile("@NP");

    String text = "The fitness room was dirty.";


    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);
    // run all Annotators on this text
    stanford.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {

        Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        TregexMatcher matcher = npPattern.matcher(sentenceTree);

        while (matcher.find()) {
            //this tree should contain "The fitness room" 
            Tree nounPhraseTree = matcher.getMatch();
            //Question : how do I find that "dirty" has a relationship to the nounPhraseTree


        }

        // Output dependency tree
        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(sentenceTree);
        Collection<TypedDependency> tdl = gs.typedDependenciesCollapsed();

        System.out.println("typedDependencies: "+tdl); 

    }

}

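I used Stanford CoreNLP to extract the root Tree object of the sentence. On this tree object, I managed to extract the noun phrases using TregexPattern and TregexMatcher. This gives me a child tree that contains the actual noun phrase. What I would like to do now is find the modifiers of that noun phrase in the original sentence.

The typedDependencies output gives me the following: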

typedDependencies: [det(room-3, The-1), nn(room-3, fitness-2), nsubj(dirty-5, room-3), cop(dirty-5, was-4), root(ROOT-0, dirty-5)]

Here I can see nsubj(dirty-5, room-3), but I don't have the full noun phrase as the governor.

I hope I am clear enough. Any help is appreciated.

Cht*_*ect 5

The typed dependencies do indicate that the adjective 'dirty' applies to 'the fitness room':

det(room-3, The-1)
nn(room-3, fitness-2)
nsubj(dirty-5, room-3)
cop(dirty-5, was-4)
root(ROOT-0, dirty-5)

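The 'nn' tag is the noun compound modifier, indicating that 'fitness' is a modifier of 'room'. Combined with nsubj(dirty-5, room-3), this means 'dirty' is predicated of 'room' together with its modifiers, that is, of the whole phrase 'The fitness room'.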

You can find detailed information about the dependency tags in the Stanford typed dependencies manual.
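To get the full noun phrase back from the dependencies, one option is to start from the dependent of the nsubj relation and collect every word it governs (det, nn, and so on). The following is only a minimal sketch of that idea: it does not call CoreNLP at all. The `Dep` class is a hypothetical stand-in for `TypedDependency`, and `sample()` contains the dependencies from the question typed in by hand. Note that newer CoreNLP releases emit Universal Dependencies, where `nn` is named `compound`; the traversal does not depend on the label, so it would apply there too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NounPhraseFromDeps {

    /** Hypothetical stand-in for one typed dependency such as nn(room-3, fitness-2). */
    public static final class Dep {
        final String rel;
        final String govWord;
        final int govIdx;
        final String depWord;
        final int depIdx;

        public Dep(String rel, String govWord, int govIdx, String depWord, int depIdx) {
            this.rel = rel;
            this.govWord = govWord;
            this.govIdx = govIdx;
            this.depWord = depWord;
            this.depIdx = depIdx;
        }
    }

    /** The dependencies printed in the question, entered by hand. */
    public static List<Dep> sample() {
        return Arrays.asList(
            new Dep("det", "room", 3, "The", 1),
            new Dep("nn", "room", 3, "fitness", 2),
            new Dep("nsubj", "dirty", 5, "room", 3),
            new Dep("cop", "dirty", 5, "was", 4),
            new Dep("root", "ROOT", 0, "dirty", 5));
    }

    /** Adds every word governed, directly or indirectly, by govIdx, keyed by token position. */
    private static void collect(List<Dep> deps, int govIdx, Map<Integer, String> words) {
        for (Dep d : deps) {
            // The containsKey check also guards against cycles in collapsed dependencies.
            if (d.govIdx == govIdx && !words.containsKey(d.depIdx)) {
                words.put(d.depIdx, d.depWord);
                collect(deps, d.depIdx, words);
            }
        }
    }

    /** Returns the head word plus all of its dependents, in sentence order. */
    public static String expand(List<Dep> deps, String headWord, int headIdx) {
        Map<Integer, String> words = new TreeMap<>(); // sorted by token index
        words.put(headIdx, headWord);
        collect(deps, headIdx, words);
        return String.join(" ", words.values());
    }

    /** For each nsubj relation, pairs the predicate with the full subject phrase. */
    public static List<String> describeSubjects(List<Dep> deps) {
        List<String> out = new ArrayList<>();
        for (Dep d : deps) {
            if (d.rel.equals("nsubj")) {
                out.add(d.govWord + " -> " + expand(deps, d.depWord, d.depIdx));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeSubjects(sample())) {
            System.out.println(line); // prints "dirty -> The fitness room"
        }
    }
}
```

Collecting all transitive dependents is a deliberate simplification. For a subject that carries a relative clause or a prepositional phrase, it would return more than the bare NP, so in that case you may prefer to compare the token indices against the leaves of the NP subtree that Tregex matched.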