Spacy中的依赖解析树

Nir*_*lam 8 nlp spacy dependency-parsing

我有句子约翰在商店看到一个华丽的帽子
如何将其表示为依赖树,如下所示?

(S
      (NP (NNP John))
      (VP
        (VBD saw)
        (NP (DT a) (JJ flashy) (NN hat))
        (PP (IN at) (NP (DT the) (NN store)))))
Run Code Online (Sandbox Code Playgroud)

我从这里得到了这个脚本

import spacy
from nltk import Tree
en_nlp = spacy.load('en')

doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]
Run Code Online (Sandbox Code Playgroud)

我得到以下,但我正在寻找树(NLTK)格式.

     saw                 
  ____|_______________    
 |        |           at 
 |        |           |   
 |       hat        store
 |     ___|____       |   
John  a      flashy  the
Run Code Online (Sandbox Code Playgroud)

小智 7

要为SpaCy依赖关系分析重新创建NLTK风格的树,请尝试使用drawfrom方法,nltk.tree而不是pretty_print

import spacy
from nltk.tree import Tree

spacy_nlp = spacy.load("en")

def nltk_spacy_tree(sent):
    """
    Visualize the SpaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(sent)
    def token_format(token):
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                       [to_nltk_tree(child) 
                        for child in node.children]
                   )
        else:
            return token_format(node)

    tree = [to_nltk_tree(sent.root) for sent in doc.sents]
    # The first item in the list is the full tree
    tree[0].draw()
Run Code Online (Sandbox Code Playgroud)

请注意,由于SpaCy目前仅支持单词和名词短语级别的依赖项解析和标记,因此SpaCy树的结构不会像您从斯坦福解析器中得到的那样深,您也可以将其可视化作为一棵树:

from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See /sf/ask/971829421/
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford dependency tree with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # The first item in the list is the full tree
    tree[0].draw()
Run Code Online (Sandbox Code Playgroud)

现在,如果我们同时运行,nltk_spacy_tree("John saw a flashy hat at the store.")就会产生这一形象nltk_stanford_tree("John saw a flashy hat at the store.")会产生这一个


ada*_*.ra 4

除了文本表示之外,您想要实现的是从依赖图中获取选区树。您所需输出的示例是经典的选区树(如短语结构语法,而不是依存语法)。

虽然从选区树到依赖图的转换或多或少是一项自动化任务(例如,https://www.researchgate.net/publication/324940566_Guidelines_for_the_CLEAR_Style_Constituent_to_Dependency_Conversion),但另一个方向却不是。已经有这方面的工作,请查看 PAD 项目https://github.com/ikekonglp/PAD和描述底层算法的论文: http: //homes.cs.washington.edu/~nasmith/papers/kong+拉什+史密斯.naacl15.pdf

您可能还想重新考虑是否确实需要选区解析,这里有一个很好的论点: https: //linguistics.stackexchange.com/questions/7280/why-is-constituency-needed-since-dependency-gets-the-工作更容易完成