Extracting nouns from a noun phrase in NLP

Question

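Can anyone tell me how to extract only the nouns from the following output?

I have tokenized and parsed the string "Give me the review of movie" against a given grammar using the following program: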

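import nltk

# msg (the input string) and grammar (a CFG) are not shown here
sent = nltk.word_tokenize(msg)
parser = nltk.ChartParser(grammar)
# nbest_parse() is the NLTK 2 API; in NLTK 3 use parser.parse(sent) instead
trees = parser.nbest_parse(sent)
for tree in trees:
    print(tree)
# find_all_NP is a helper not shown here; it runs on the last tree from the loop
tokens = find_all_NP(tree)
tokens1 = nltk.word_tokenize(tokens[0])
print(tokens1)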

and got the following output:

>>> 
(S
  (VP (V Give) (Det me))
  (NP (Det the) (N review) (PP (P of) (N movie))))
(S
  (VP (V Give) (Det me))
  (NP (Det the) (N review) (NP (PP (P of) (N movie)))))
['the', 'review', 'of', 'movie']
>>>

Now I only want to get the nouns. How do I do that?

Answer 1

Joe*_*Joe 6

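You don't need a full parser to get the nouns. A tagger is enough. One function you can use is nltk.tag.pos_tag(). It returns a list of (word, part-of-speech tag) tuples. You can then iterate over the tuples and pick out the words tagged "NN" or "NNS", which mark singular and plural nouns.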
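As a rough sketch of that idea (not necessarily how you would want to structure it in your program), the filtering step could look like the code below. The names extract_nouns, nouns_in_sentence and NOUN_TAGS are made up for illustration, and the tags in example_tagged are written by hand, so a real tagger may label some words differently. The NLTK call is kept in its own function because it needs NLTK and its tokenizer and tagger data to be installed.

```python
# Tags treated as nouns: singular (NN) and plural (NNS), as described above.
NOUN_TAGS = {"NN", "NNS"}


def extract_nouns(tagged_tokens, noun_tags=NOUN_TAGS):
    """Keep only the words whose part-of-speech tag is in noun_tags."""
    return [word for word, tag in tagged_tokens if tag in noun_tags]


def nouns_in_sentence(sentence):
    """Tokenize and tag a sentence with NLTK, then filter for nouns."""
    # Imported here so the filter above can be tried without NLTK installed.
    import nltk
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)  # list of (word, tag) tuples
    return extract_nouns(tagged)


# Hand-written (word, tag) pairs for illustration only.
example_tagged = [("Give", "VB"), ("me", "PRP"), ("the", "DT"),
                  ("review", "NN"), ("of", "IN"), ("movie", "NN")]
print(extract_nouns(example_tagged))  # ['review', 'movie']
```

If you also want proper nouns, passing a wider set such as {"NN", "NNS", "NNP", "NNPS"} as noun_tags should cover them, since those are the Penn Treebank tags for proper nouns.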

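NLTK has a how-to that documents their taggers. It can be found here: https://nltk.googlecode.com/svn/trunk/doc/howto/tag.html and here is a link to the chapter of the NLTK book on using taggers: https://nltk.googlecode.com/svn/trunk/doc/book/ch05.html

Both places have plenty of code examples.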
