How to get all noun phrases in spaCy

Question

I am new to spaCy and I want to extract "all" noun phrases from a sentence. How can I do that? I have the following code:

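import spacy

# spaCy v3 removed the "en" shortcut; load the model by its full package name
nlp = spacy.load("en_core_web_sm")

# Read the input text; the context manager closes the file afterwards
with open("E:/test.txt", "r") as file:
    doc = nlp(file.read())

for np in doc.noun_chunks:
    print(np.text)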


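But it only returns base noun phrases, that is, NPs that do not contain any other NP. For the following sentence, I get the following result:

Sentence: We try to explicitly describe the geometry of the edges of the images.

Result: We, the geometry, the edges, the images.

Expected result: We, the geometry, the edges, the images, the geometry of the edges of the images, the edges of the images.

How can I get all noun phrases, including nested ones?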

Answer 1

Adn*_*n S 5

See the commented code below, which recursively combines nouns. The code is inspired by the spaCy documentation.

import spacy

# spaCy v3 removed the "en" shortcut; load the model by its full package name
nlp = spacy.load("en_core_web_sm")

text = "We try to explicitly describe the geometry of the edges of the images."
doc = nlp(text)

for np in doc.noun_chunks:  # printing the Span itself shows its text, so .text is optional
    print(np)

print()

# code to recursively combine nouns
# 'We' is actually a pronoun but included in your question
# hence the token.pos_ == "PRON" part in the last if statement
# suggest you extract PRON separately like the noun-chunks above

# collect the positions of all nouns in the sentence
noun_indices = [token.i for token in doc if token.pos_ == "NOUN"]
print(noun_indices)

for idx in noun_indices:
    # re-parse each time, because merging changes the Doc in place
    doc = nlp(text)
    # the span covers the noun's whole subtree, from leftmost to rightmost descendant
    span = doc[doc[idx].left_edge.i : doc[idx].right_edge.i + 1]
    # Span.merge() was removed in spaCy v3; the retokenizer replaces it
    with doc.retokenize() as retokenizer:
        retokenizer.merge(span)

    for token in doc:
        if token.dep_ in ("dobj", "pobj") or token.pos_ == "PRON":
            print(token.text)

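As a possible alternative that avoids merging tokens and re-parsing the sentence, here is a minimal sketch that collects the nested phrases directly from the dependency tree. It takes the base chunks from `doc.noun_chunks` and adds, for every noun or pronoun, the span running from `token.left_edge` to `token.right_edge`, which is that token's whole subtree. The names `nested_noun_phrases` and `NOMINAL_POS` are made up for this illustration, and the result depends entirely on how the parser attaches the prepositional phrases, so treat it as a heuristic rather than a guaranteed NP extractor.

```python
from typing import List

from spacy.tokens import Doc

# Coarse POS tags treated as possible heads of a noun phrase (an assumption
# made for this sketch; adjust it to your needs).
NOMINAL_POS = {"NOUN", "PROPN", "PRON"}


def nested_noun_phrases(doc: Doc) -> List[str]:
    """Return base noun chunks plus the full-subtree span of every nominal token.

    Hypothetical helper: `doc` must carry POS tags and a dependency parse.
    """
    spans = {}  # (start, end) -> Span, used to drop duplicate spans

    # Base noun phrases, as produced by spaCy's own noun chunker.
    for chunk in doc.noun_chunks:
        spans[(chunk.start, chunk.end)] = chunk

    # Nested phrases: the noun together with everything attached below it.
    for token in doc:
        if token.pos_ in NOMINAL_POS:
            span = doc[token.left_edge.i : token.right_edge.i + 1]
            spans[(span.start, span.end)] = span

    # Report the phrases in document order, shorter spans first on ties.
    return [spans[key].text for key in sorted(spans)]
```

With a loaded pipeline, calling `nested_noun_phrases(nlp(text))` on the sentence from the question should return the six expected phrases, provided the parser attaches each "of" phrase to the noun before it. A relative clause or other long modifier under a noun would also end up inside that noun's span, which may or may not be what you want.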
