I am using NLTK's Stanford parser in Python, and followed the answers to "Stanford Parser and NLTK" to set up the Stanford NLP libraries.
from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser

parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

one = "John sees Bill"

# Constituency parse; draw() opens a GUI window for each tree
parsed_sentence = parser.raw_parse(one)
for line in parsed_sentence:
    print(line)
    line.draw()

# Dependency parse, converted to trees; draw() opens a GUI window for each
parsed_sentence = [parse.tree() for parse in dep_parser.raw_parse(one)]
print(parsed_sentence)
for line in parsed_sentence:
    print(line)
    line.draw()
I get an incorrect parse tree and dependency tree: the parser treats 'sees' as a noun instead of a verb.
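To confirm which tag the parser actually assigned to 'sees', one can read the preterminal (tag, word) pairs off the tree. For an nltk.Tree object, Tree.pos() returns exactly these pairs; the stdlib-only sketch below does the same for a bracketed string like the one print(line) shows, assuming standard Penn Treebank bracketing. The helper pos_tags and the example tree are hypothetical: the tree is hand-written to show the hoped-for tags, not real parser output.

```python
import re

# In Penn Treebank bracketing a preterminal looks like "(VBZ sees)":
# an opening bracket, a POS tag, a space, a single word, a closing bracket.
PRETERMINAL = re.compile(r"\(([^()\s]+) ([^()\s]+)\)")


def pos_tags(bracketed_tree):
    """Return (word, tag) pairs from a bracketed constituency parse string."""
    return [(word, tag) for tag, word in PRETERMINAL.findall(bracketed_tree)]


# Hand-written example of the parse we would hope for (not real parser output).
expected = "(ROOT (S (NP (NNP John)) (VP (VBZ sees) (NP (NNP Bill)))))"
tags = dict(pos_tags(expected))
print(tags["sees"])  # VBZ
```

If the real tree gives 'sees' a noun tag such as NN or NNS, the mistake happens at the tagging/phrase-structure level, and the dependency tree derived from the same PCFG model is likely to inherit it.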
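What should I do? When I change the sentence, for example to one = 'John see Bill', it works perfectly. The correct output for this sentence can be seen in the correct parse tree output here.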
An example of the correct output is shown below:
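So I got the "standard" Stanford Parser working, thanks to danger89's answer to the previous post, Stanford Parser and NLTK.
However, I am now trying to get the dependency parser working, and it seems the method highlighted in the previous link no longer works. Here is my code: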
import nltk
import os
java_path = "C:\\Program Files\\Java\\jre1.8.0_51\\bin\\java.exe"
os.environ['JAVAHOME'] = java_path
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = 'path/jar'
os.environ['STANFORD_MODELS'] = 'path/jar'
parser = stanford.StanfordDependencyParser(model_path="path/jar/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(nltk.sent_tokenize("The iPod is expensive but pretty."))
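I get the following error: 'module' object has no attribute 'StanfordDependencyParser'
The only thing I changed was StanfordParser to StanfordDependencyParser. Any ideas how I can get this working?
I also tried the Stanford Neural Dependency parser, as shown in the documentation here: http://www.nltk.org/_modules/nltk/parse/stanford.html
This didn't work either.
I'm very new to NLTK. Thanks in advance for any helpful input.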
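The error means the nltk.parse.stanford module that was actually imported has no name StanfordDependencyParser; one plausible cause is an installed NLTK release that predates that class. A small stdlib helper can confirm what the installed module exposes before digging into jar paths and models. The name module_exports is hypothetical, not part of NLTK.

```python
import importlib


def module_exports(module_name, attr_name):
    """Return True if `module_name` imports cleanly and defines `attr_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module, or one of its parent packages, is not installed.
        return False
    return hasattr(module, attr_name)
```

For example, module_exports("nltk.parse.stanford", "StanfordDependencyParser") returning False while module_exports("nltk.parse.stanford", "StanfordParser") returns True would point at the NLTK version rather than the Java setup.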
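I am trying to create a tree (a nested dictionary) from the output of the dependency parser. The sentence is "I shot an elephant in my sleep". I was able to get the output as described in this link: How do I do dependency parsing in NLTK?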
nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
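The listing above is text, while build_tree expects (relation, governor, dependent) tuples. A minimal sketch of the conversion, assuming every line has the rel(gov-i, dep-j) shape shown; parse_dependencies and its keep_index flag are hypothetical names. Keeping the -i position in the keys stops two occurrences of the same word (two "the" tokens, say) from overwriting each other in a word-keyed dictionary.

```python
import re

# Matches e.g. "nsubj(shot-2, I-1)": relation, governor-index, dependent-index.
DEP_LINE = re.compile(r"^(\S+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_dependencies(lines, keep_index=False):
    """Turn Stanford-style dependency strings into (rel, gov, dep) tuples."""
    triples = []
    for line in lines:
        match = DEP_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        rel, gov, gov_idx, dep, dep_idx = match.groups()
        if keep_index:
            # Suffix the position so repeated words stay distinct; leave the
            # artificial ROOT governor bare so a 'ROOT' check still matches it.
            if gov != "ROOT":
                gov = f"{gov}-{gov_idx}"
            dep = f"{dep}-{dep_idx}"
        triples.append((rel, gov, dep))
    return triples
```

Note that build_tree looks up every governor in a dictionary keyed by dependents, so its input also needs a root(ROOT-0, shot-2) line, which the listing above omits; without it, looking up shot raises a KeyError.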
To convert this list of tuples into a nested dictionary, I used the following link: How to convert python list of tuples into tree?
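def build_tree(list_of_tuples):
    # Map each dependent word to ((relation, governor), {children})
    all_nodes = {n[2]: ((n[0], n[1]), {}) for n in list_of_tuples}
    root = {}
    print(all_nodes)
    for rel, gov, dep in list_of_tuples:
        # Compare strings with !=, not "is not" (which tests identity)
        if gov != 'ROOT':
            # Attach the dependent's node under its governor's children
            all_nodes[gov][1][dep] = all_nodes[dep]
        else:
            root[dep] = all_nodes[dep]
    return root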
The output is as follows:
{'shot': (('ROOT', 'ROOT'),
{'I': (('nsubj', 'shot'), {}),
'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
'sleep': (('nmod', 'shot'), …
I have installed python (2.7.5) and the python-nltk package on Ubuntu 13.10. Running apt-cache policy python-nltk returns:
python-nltk:
Installed: 2.0~b9-0ubuntu4
According to the Stanford website, NLTK 2.0+ should include the stanford module. However, when I try to import it, I get an error:
>>> import nltk.tag.stanford
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named stanford
How do I get the stanford module? (Preferably through the usual repositories, since I'd rather not install software outside the Ubuntu package manager.)
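To check what the installed package actually provides, rather than what the website says, a stdlib-only sketch can report the installed distribution version and whether a given submodule can be found. It assumes Python 3.8+ for importlib.metadata (the question itself uses Python 2.7), and the helper names are hypothetical.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_submodule(dotted_name):
    """Return True if a module such as 'nltk.tag.stanford' can be found.

    find_spec imports the parent packages (here 'nltk' and 'nltk.tag')
    in order to search their __path__.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package is missing or is not a package.
        return False
```

For example, printing installed_version("nltk") next to has_submodule("nltk.tag.stanford") shows whether the packaged release lacks the module or the module is simply not on the import path.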
I have a sentence from which I need to extract only the person names.
For example:
sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"
I use the following code to perform NER:
from nltk import word_tokenize, pos_tag, ne_chunk
print(ne_chunk(pos_tag(word_tokenize(sentence))))
The output I get is:
(S
(PERSON Larry/NNP)
(ORGANIZATION Page/NNP)
is/VBZ
an/DT
(GPE American/JJ)
business/NN
magnate/NN
and/CC
computer/NN
scientist/NN
who/WP
is/VBZ
the/DT
co-founder/NN
of/IN
(GPE Google/NNP)
,/,
alongside/RB
(PERSON Sergey/NNP Brin/NNP))
I want to extract all person names, for example:
Larry Page
Sergey Brin
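Once a tagger returns (token, tag) pairs with "O" for tokens outside any entity, which is the shape StanfordNERTagger.tag() returns, multi-word names come from grouping consecutive tokens that share the PERSON tag. A minimal sketch; extract_entities is a hypothetical helper and the tagged list in the test is hand-written, not real tagger output.

```python
from itertools import groupby


def extract_entities(tagged_tokens, label="PERSON"):
    """Join runs of consecutive tokens tagged `label` into entity strings.

    `tagged_tokens` is a list of (token, tag) pairs, with "O" marking
    tokens that are not part of any named entity.
    """
    entities = []
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag == label:
            entities.append(" ".join(token for token, _ in group))
    return entities
```

One caveat of this design: two different people written back to back with no token between them would be merged into one string. With the ne_chunk tree shown above, the analogous approach is iterating tree.subtrees(filter=lambda t: t.label() == 'PERSON'), but there it would yield only "Larry", because ne_chunk tagged "Page" as ORGANIZATION.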
To achieve this, I went through this link and tried the following:
from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz','/usr/share/stanford-ner/stanford-ner.jar')
But I keep getting this error:
LookupError: Could not find stanford-ner.jar jar …
I am trying to figure out how to use the caseless version of NLTK's entity recognizer. I downloaded http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip and put it in Python's site-packages folder. Then I downloaded http://nlp.stanford.edu/software/stanford-corenlp-caseless-2015-04-20-models.jar and put it in the stanford-ner-2015-04-20 folder. Then I ran this code in NLTK:
from nltk.tag.stanford import NERTagger
english_nertagger = NERTagger('/home/anaconda/lib/python2.7/site-packages/stanford-ner-2015-04-20/classifiers/english.conll.4class.distsim.crf.ser.gz', '/home/anaconda/lib/python2.7/site-packages/stanford-ner-2015-04-20/stanford-corenlp-caseless-2015-04-20-models.jar')
But when I run this:
english_nertagger.tag('Rami Eid is studying at stony brook university in NY'.split())
I get an error:
Error: Could not find or load main class edu.stanford.nlp.ie.crf.CRFClassifier
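If you have experience with this, any help is appreciated!
P.S. I can get the non-caseless version working fine, but I've found that when analyzing search queries, users almost never capitalize words, and the non-caseless version seems to miss words entirely if they aren't capitalized.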