Related questions and solutions (0)

NLTK named entity recognition to a Python list

I'm using NLTK's ne_chunk to extract named entities from a text:

import nltk

my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."

# ne_chunk expects POS-tagged tokens rather than a raw string
nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(my_sent)), binary=True)

But I can't figure out how to save these entities to a list. For example:

print(Entity_list)
('WASHINGTON', 'New York', 'Loretta', 'Brooklyn', 'African')

Thanks.
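
One way the entities could be collected is sketched below. It assumes that `ne_chunk(..., binary=True)` returns an `nltk.Tree` in which each named entity is a subtree labelled `NE`. The `chunked` tree is a hand-built stand-in for that output, so no tagger or chunker models are needed to run it, and `extract_entities` is a hypothetical helper name, not part of NLTK.

```python
from nltk.tree import Tree

# Hand-built stand-in (an assumption) for what
# nltk.ne_chunk(tagged_tokens, binary=True) returns:
# a sentence tree whose named entities are subtrees labelled 'NE'.
chunked = Tree('S', [
    Tree('NE', [('WASHINGTON', 'NNP')]),
    ('--', ':'),
    ('In', 'IN'),
    Tree('NE', [('New', 'NNP'), ('York', 'NNP')]),
    ('police', 'NN'),
    Tree('NE', [('Loretta', 'NNP')]),
    ('E.', 'NNP'),
    ('Lynch', 'NNP'),
])


def extract_entities(tree, label='NE'):
    """Return the text of every subtree that carries the given label."""
    entities = []
    for subtree in tree.subtrees(filter=lambda t: t.label() == label):
        # leaves() yields (word, tag) pairs here; keep only the words
        entities.append(' '.join(word for word, tag in subtree.leaves()))
    return entities


entity_list = extract_entities(chunked)
print(entity_list)
```

With a real `ne_chunk` result in place of `chunked`, the same function should apply unchanged; which spans are actually recognized depends on the installed models.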

python nlp named-entity-recognition nltk

14 votes, 4 answers, 40k views

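Python (NLTK): a more efficient way to extract noun phrases?

I have a machine learning task involving a large amount of text data. I want to identify and extract noun phrases in the training text so that I can use them for feature construction later in the pipeline. I have already extracted the type of noun phrases I wanted from the text, but I'm fairly new to NLTK, so I approached the problem in a way that lets me break down each step in a list comprehension, as shown below.

My real question, though, is: am I reinventing the wheel here? Is there a faster way to do this that I'm not seeing?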

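import nltk
import pandas as pd

# Raw string so the backslashes in the path are not read as escape sequences
myData = pd.read_excel(r"\User\train_.xlsx")
texts = myData['message']

# Define a chunk grammar and parser (raw string keeps \w intact)
NP = r"NP: {(<V\w+>|<NN\w?>)+.*<NN\w?>}"
chunkr = nltk.RegexpParser(NP)

# Tokenize each message
tokens = [nltk.word_tokenize(i) for i in texts]

# POS-tag each list of tokens
tag_list = [nltk.pos_tag(w) for w in tokens]

# Chunk each tagged sentence
phrases = [chunkr.parse(sublist) for sublist in tag_list]

# label is a method in NLTK 3, so it has to be called: t.label()
leaves = [[subtree.leaves() for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP')] for tree in phrases]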
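
The parse-and-filter step can be tried on a single sentence without any tagger model. The sketch below makes two assumptions that are not in the question: it uses hand-tagged tokens in place of `nltk.pos_tag(nltk.word_tokenize(...))`, and it swaps in a simple, commonly used noun-phrase grammar rather than the pattern above.

```python
from nltk import RegexpParser

# Assumption: a simple illustrative grammar, not the asker's pattern.
# Optional determiner, any number of adjectives, one or more nouns.
grammar = r"NP: {<DT>?<JJ>*<NN.*>+}"
chunker = RegexpParser(grammar)

# Hand-tagged tokens stand in for the output of a POS tagger.
tagged = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
          ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]

tree = chunker.parse(tagged)

# Keep only NP subtrees and join their words into plain strings.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)
```

Joining the words inside the same comprehension avoids building the intermediate nested lists of tuples, which may be enough when only the phrase text is needed.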

Flatten the list of lists of lists of tuples that we end up with into just a list of lists of tuples:

leaves = [tupls for sublists in leaves for tupls in sublists]
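
The same one-level flattening could also be written with `itertools.chain.from_iterable` from the standard library. The nested `leaves` value below is made-up sample data (one inner list per document, one list of (word, tag) tuples per noun phrase), used only to show the shapes involved.

```python
from itertools import chain

# Made-up sample data (an assumption): per document, a list of noun
# phrases, each phrase being a list of (word, tag) tuples.
leaves = [
    [[("machine", "NN"), ("learning", "NN")], [("text", "NN"), ("data", "NNS")]],
    [],  # a document in which no noun phrase was found
    [[("noun", "NN"), ("phrases", "NNS")]],
]

# Remove exactly one level of nesting: documents -> phrases
flat = list(chain.from_iterable(leaves))

# Equivalent nested comprehension, as used in the question
flat_comprehension = [tupls for sublists in leaves for tupls in sublists]

print(flat)
```

Both forms produce the same list; `chain.from_iterable` is lazy, so it can be iterated directly without materializing the result when the corpus is large.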

Join the extracted terms into a bigram:

nounphrases = …

nlp nltk python-3.x pandas text-chunking

6 votes, 1 answer, 5159 views

How to traverse an NLTK Tree object?

Given a bracketed parse, I can convert it into a Tree object in NLTK:

>>> from nltk.tree import Tree
>>> s = '(ROOT (S (NP (NNP Europe)) (VP (VBZ is) (PP (IN in) (NP (DT the) (JJ same) (NNS trends)))) (. .)))'
>>> Tree.fromstring(s)
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NNP', ['Europe'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('PP', [Tree('IN', ['in']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['same']), Tree('NNS', ['trends'])])])]), Tree('.', ['.'])])])

But when I try to traverse it, I can only access the topmost subtree:

>>> for i in Tree.fromstring(s):
...     print(i)
... 
(S
  (NP (NNP Europe))
  (VP (VBZ is) (PP (IN in) (NP (DT the) (JJ same) (NNS trends))))
  (. .)) …
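
Iterating over a `Tree` only yields its direct children, which is why the loop stops at the `S` node. A minimal sketch of two ways to reach every node follows: a recursive depth-first generator (`traverse` is a hypothetical helper name, not an NLTK function) and the built-in `Tree.subtrees()` method, which visits every subtree but not the leaf strings.

```python
from nltk.tree import Tree

s = '(ROOT (S (NP (NNP Europe)) (VP (VBZ is) (PP (IN in) (NP (DT the) (JJ same) (NNS trends)))) (. .)))'
tree = Tree.fromstring(s)


def traverse(t, depth=0):
    """Depth-first walk yielding (depth, label or word) for every node."""
    if isinstance(t, Tree):
        yield depth, t.label()
        for child in t:
            yield from traverse(child, depth + 1)
    else:
        # Leaves of a tree built by Tree.fromstring are plain strings
        yield depth, t


nodes = list(traverse(tree))

# Built-in alternative: subtrees() walks all Tree nodes in pre-order
labels = [sub.label() for sub in tree.subtrees()]

for depth, name in nodes:
    print('  ' * depth + name)
```

The generator is useful when the depth or the leaves are needed alongside the labels; when only the constituents matter, `subtrees()` (optionally with a `filter` function) is the shorter route.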

tree parsing nlp nltk depth-first-search

5 votes, 1 answer, 6544 views