I am using NLTK's ne_chunk to extract named entities from a text:
import nltk
my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."
# ne_chunk expects POS-tagged tokens, not a raw string
nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(my_sent)), binary=True)
But I can't figure out how to save these entities to a list. For example:
print(Entity_list)
('WASHINGTON', 'New York', 'Loretta', 'Brooklyn', 'African')
Thanks.
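One possible way to collect the entities into a list, as a minimal sketch. It assumes the sentence is tokenized and POS-tagged before being passed to ne_chunk, and that the NLTK tokenizer, tagger, and chunker data are already downloaded. The helper names `entity_names` and `entities_in_text` are made up for illustration and are not part of NLTK.

```python
def entity_names(chunked):
    """Return the text of every named-entity chunk in a chunked sentence."""
    names = []
    for node in chunked:
        # Plain tokens are (word, tag) tuples; entity chunks are nltk.Tree
        # objects, which expose label() and leaves(). Checking for label()
        # is a duck-typed stand-in for isinstance(node, nltk.Tree).
        if hasattr(node, "label"):
            names.append(" ".join(word for word, _tag in node.leaves()))
    return names


def entities_in_text(text):
    """Tokenize, tag, and chunk `text`, then return its entity names."""
    # Imported lazily so entity_names() works without NLTK data installed.
    import nltk

    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    # binary=True marks every entity as NE instead of PERSON, GPE, and so on.
    return entity_names(nltk.ne_chunk(tagged, binary=True))
```

With this, `Entity_list = entities_in_text(my_sent)` would give a plain Python list of strings. Which entities actually come back depends on the NLTK models installed, so the result may not match the desired output above exactly.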
I am trying to run ne_chunk and pos_tag from nltk on a sentence.
from nltk.tag import pos_tag
from nltk.tree import Tree
from nltk.chunk import ne_chunk
sentence = "Michael and John is reading a booklet in a library of Jakarta"
tagged_sent = pos_tag(sentence.split())
# keep only the named-entity subtrees, skipping plain (word, tag) tuples
print_chunk = [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
print(print_chunk)
This is the result:
[Tree('GPE', [('Michael', 'NNP')]), Tree('PERSON', [('John', 'NNP')]), Tree('GPE', [('Jakarta', 'NNP')])]
My question is: is it possible to leave out the pos_tag output (such as NNP above) and keep only the tree labels 'GPE' and 'PERSON'? And what does 'GPE' mean?
Thanks in advance.
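For what it's worth, GPE is the chunker's label for a geo-political entity, such as a country, city, or state. Dropping the POS tags could look roughly like the sketch below, which keeps each chunk's label and joins its words. The function name `entity_labels` is hypothetical, not an NLTK API, and the `label()` check is a duck-typed stand-in for the `isinstance(chunk, Tree)` test used in the question.

```python
def entity_labels(chunked):
    """Return (label, text) pairs for each entity chunk, without POS tags."""
    pairs = []
    for node in chunked:
        # Only nltk.Tree chunks have label(); plain (word, tag) tuples do not.
        if hasattr(node, "label"):
            # leaves() yields (word, tag) pairs; keep the words only.
            text = " ".join(word for word, _tag in node.leaves())
            pairs.append((node.label(), text))
    return pairs

# Usage, assuming the imports and tagged_sent from the question:
#   entity_labels(ne_chunk(tagged_sent))
```

Applied to the three chunks shown in the result above, this would yield `[('GPE', 'Michael'), ('PERSON', 'John'), ('GPE', 'Jakarta')]`. If only the labels are wanted, `[label for label, _text in entity_labels(...)]` reduces it further.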