Why does the Stanford parser in NLTK fail to parse a sentence correctly?

Nom*_*uks 6 python parsing nlp nltk stanford-nlp

I am using the Stanford parser from nltk in Python, and followed "Stanford Parser and NLTK" to set up the Stanford NLP libraries.

from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser

parser     = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
one = "John sees Bill"

# Constituency parse: print each tree and open it in the GUI viewer
parsed_Sentence = parser.raw_parse(one)
for line in parsed_Sentence:
    print(line)
    line.draw()

# Dependency parse: convert each dependency graph into a tree
parsed_Sentence = [parse.tree() for parse in dep_parser.raw_parse(one)]
print(parsed_Sentence)

# GUI
for line in parsed_Sentence:
    print(line)
    line.draw()

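I get a wrong parse tree and a wrong dependency tree, as shown in the examples below: 'sees' is treated as a noun instead of a verb.

[Image: example parse tree] [Image: example dependency tree]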
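One quick way to confirm which tag a word received is to read the `(TAG word)` pairs off the bracketed form of the tree. The following is only a sketch: the bracketed string is a hypothetical illustration of the mis-tagging described above, not the parser's actual output, and `leaf_tags` is a helper name made up for this example.

```python
import re

# Hypothetical bracketed parse illustrating the problem described above;
# this is NOT the real output of the Stanford parser for this sentence.
bracketed = "(ROOT (NP (NNP John) (NNS sees) (NNP Bill)))"


def leaf_tags(tree_string):
    """Return (word, tag) pairs for every "(TAG word)" leaf in a bracketed tree."""
    # A leaf is a parenthesis pair holding exactly two tokens and no nested parens.
    pairs = re.findall(r"\(([^\s()]+)\s+([^\s()]+)\)", tree_string)
    return [(word, tag) for tag, word in pairs]


tags = leaf_tags(bracketed)
print(tags)
```

With an nltk `Tree` object in hand, `line.pos()` returns the same kind of (word, tag) list directly, so the regex is only needed when working from the printed string.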

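What should I do? It works perfectly when I change the sentence, for example to one = 'John see Bill'. The correct output for this sentence can be seen here: correct output of parse tree.

An example of the correct output is shown below:

[Image: correct parse]

[Image: correct dependency parse tree]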

alv*_*vas 7

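Once again, no model is perfect (see "Python NLTK pos_tag not returning the correct part-of-speech tag") ;P

You can try a "more accurate" parser, the NeuralDependencyParser.

First set up the parser properly with the correct environment variables (see "Stanford Parser and NLTK" and https://gist.github.com/alvations/e1df0ba227e542955a8a), then: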

>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordNeuralDependencyParser
>>> parser = StanfordNeuralDependencyParser(model_path="edu/stanford/nlp/models/parser/nndep/english_UD.gz")
>>> stanford_dir = parser._classpath[0].rpartition('/')[0]
>>> slf4j_jar = stanford_dir + '/slf4j-api.jar'
>>> parser._classpath = list(parser._classpath) + [slf4j_jar]
>>> parser.java_options = '-mx5000m'
>>> sent = "John sees Bill"
>>> [parse.tree() for parse in parser.raw_parse(sent)]
[Tree('sees', ['John', 'Bill'])]

Note that the NeuralDependencyParser produces only dependency trees, not constituency parses.
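The environment setup mentioned before the session above can also be done from inside Python, before any parser object is constructed. This is a rough sketch under assumptions: `STANFORD_PARSER` and `STANFORD_MODELS` are the variable names the NLTK Stanford wrappers looked up in the versions I have seen, so check them against your NLTK release. The directory is a placeholder and `configure_stanford_env` is a hypothetical helper.

```python
import os


def configure_stanford_env(jar_dir, models_dir):
    """Set the environment variables that NLTK's Stanford wrappers are assumed to read."""
    os.environ["STANFORD_PARSER"] = jar_dir      # directory holding the parser jar
    os.environ["STANFORD_MODELS"] = models_dir   # directory holding the models jar
    return {name: os.environ[name] for name in ("STANFORD_PARSER", "STANFORD_MODELS")}


# Placeholder path: replace it with wherever the Stanford jars were unpacked.
settings = configure_stanford_env("/path/to/stanford/jars", "/path/to/stanford/jars")
print(settings)
```

Setting the variables in the shell, as the linked gist does, has the same effect. The in-process version just keeps the configuration next to the code that depends on it.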