How to get the sentence number in spaCy?

Goo*_*bot 6 python nlp spacy

I get the tokens of a string like this:

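import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: any pipeline that sets sentence boundaries
doc = nlp("This is the first sentence. This is the second sentence.")
for token in doc:
    print(token.i, token.text)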

with the output

0 This
1 is
2 the
3 first
4 sentence
5 .
6 This
7 is
8 the
9 second
10 sentence
11 .

How can I also get the sentence number, printing (SENTENCE_NUMBER, token.i, token.text) like this:

0 0 This
0 1 is
0 2 the
0 3 first
0 4 sentence
0 5 .
1 0 This
1 1 is
1 2 the
1 3 second
1 4 sentence
1 5 .

I can reset the token number inside the loop myself, but how do I get the sentence number from the doc?

aab*_*aab 10

There is no built-in sentence index, but you can iterate over the sentences:

for sent_i, sent in enumerate(doc.sents):
    for token in sent:
        # token.i is the token's index in the whole Doc, not within the sentence
        print(sent_i, token.i, token.text)
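A minimal sketch that reproduces the exact output from the question, where the token number restarts at 0 in each sentence. It assumes spaCy v3 and uses a blank English pipeline with the rule-based `sentencizer` so that no trained model has to be downloaded; the per-sentence position is computed as `token.i - sent.start`, since `Span.start` is the Doc index of the sentence's first token.

```python
import spacy

# Assumption: spaCy v3. A blank English pipeline plus the rule-based
# "sentencizer" sets sentence boundaries without a trained model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is the first sentence. This is the second sentence.")

rows = []
for sent_i, sent in enumerate(doc.sents):
    for token in sent:
        # token.i counts from the start of the Doc; subtracting sent.start
        # (the Doc index of the sentence's first token) restarts it at 0.
        rows.append((sent_i, token.i - sent.start, token.text))

for row in rows:
    print(*row)
```

With a trained pipeline such as `en_core_web_sm`, the dependency parser sets the sentence boundaries instead, so on harder text the split may differ from the sentencizer's punctuation-based rules.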

If you need to store the sentence index for use elsewhere, you can save it on the spans or tokens with a custom extension attribute: https://spacy.io/usage/processing-pipelines#custom-components-attributes
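A sketch of the custom-extension approach, assuming spaCy v3. The attribute name `sent_i` and the component name `sentence_indexer` are hypothetical choices for illustration; the component runs after the `sentencizer` and writes each token's sentence index to `token._.sent_i`, so later code can read it without re-enumerating `doc.sents`.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Token

# Hypothetical attribute name "sent_i"; force=True allows re-registering it
# if the script is run again in the same session.
Token.set_extension("sent_i", default=None, force=True)


@Language.component("sentence_indexer")  # hypothetical component name
def sentence_indexer(doc):
    # Store the sentence number on every token of each sentence.
    for sent_i, sent in enumerate(doc.sents):
        for token in sent:
            token._.sent_i = sent_i
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # must run first so doc.sents is available
nlp.add_pipe("sentence_indexer", after="sentencizer")

doc = nlp("This is the first sentence. This is the second sentence.")
for token in doc:
    print(token._.sent_i, token.i, token.text)
```

The same pattern works for `Span` via `Span.set_extension` if you would rather attach the index to each sentence span.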