AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'

Question


I am working on an NLP project and trying to follow this tutorial: https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e. When I run this part:


    import spacy

    # Load the large English NLP model
    nlp = spacy.load('en_core_web_lg')

    # Replace a token with "REDACTED" if it is a name
    def replace_name_with_placeholder(token):
        if token.ent_iob != 0 and token.ent_type_ == "PERSON":
            return "[REDACTED] "
        else:
            return token.string

    # Loop through all the entities in a document and check if they are names
    def scrub(text):
        doc = nlp(text)
        for ent in doc.ents:
            ent.merge()
        tokens = map(replace_name_with_placeholder, doc)
        return "".join(tokens)

    s = """
    In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
    In 1957, Noam Chomsky’s
    Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
    syntactic structures.
    """

    print(scrub(s))


I get this error:


    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-62-ab1c786c4914> in <module>
          4 """
          5
    ----> 6 print(scrub(s))

    <ipython-input-60-4742408aa60f> in scrub(text)
          3     doc = nlp(text)
          4     for ent in doc.ents:
    ----> 5         ent.merge()
          6     tokens = map(replace_name_with_placeholder, doc)
          7     return "".join(tokens)

    AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'


Answer 1

小智 6

spaCy has removed the span.merge() method since that tutorial was written. The way to do this now is with doc.retokenize(): https://spacy.io/api/doc#retokenize. I implemented it for your scrub function below:


    # Loop through all the entities in a document and check if they are names
    def scrub(text):
        doc = nlp(text)
        with doc.retokenize() as retokenizer:
            for ent in doc.ents:
                retokenizer.merge(ent)
        tokens = map(replace_name_with_placeholder, doc)
        return "".join(tokens)

    s = """
    In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
    In 1957, Noam Chomsky’s
    Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
    syntactic structures.
    """

    print(scrub(s))


Other notes:


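Your replace_name_with_placeholder function will also throw an error, because token.string no longer exists. Use token.text instead; I fixed it below. Note that token.text does not include the trailing whitespace that token.string did; token.text_with_ws keeps it if you need the original spacing.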


    def replace_name_with_placeholder(token):
        if token.ent_iob != 0 and token.ent_type_ == "PERSON":
            return "[REDACTED] "
        else:
            return token.text
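
Putting both fixes together, here is a minimal sketch that runs without downloading en_core_web_lg. It rests on a few assumptions that are not in the original code: a blank English pipeline stands in for the large model, the PERSON entity is set by hand (a blank pipeline has no entity recognizer), and the helper is called scrub_doc and takes a Doc rather than raw text. It also uses token.text_with_ws and token.whitespace_ so the original spacing survives the join.

```python
import spacy

# Assumption: a blank pipeline replaces en_core_web_lg so no model download
# is needed. It only tokenizes, so entities must be assigned manually.
nlp = spacy.blank("en")


def replace_name_with_placeholder(token):
    # ent_iob == 0 means no entity annotation is set on this token
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        # Keep whatever whitespace followed the original name
        return "[REDACTED]" + token.whitespace_
    return token.text_with_ws


def scrub_doc(doc):
    # Hypothetical helper: same logic as scrub(), but operating on a Doc
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    return "".join(replace_name_with_placeholder(t) for t in doc)


text = "In 1950, Alan Turing published his famous article."
doc = nlp(text)

# Hand-label the name, standing in for what a trained NER model would find
start = text.index("Alan Turing")
doc.ents = [doc.char_span(start, start + len("Alan Turing"), label="PERSON")]

result = scrub_doc(doc)
print(result)
```

With a trained pipeline you would skip the manual doc.ents assignment and call scrub_doc(nlp(text)) directly.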


If you are extracting entities along with other spans, such as doc.noun_chunks, you may run into problems like this one:
```
ValueError: [E102] Can't merge non-disjoint spans. 'Computing' is already part of
tokens to merge. If you want to find the longest non-overlapping spans, you can
use the util.filter_spans helper:
https://spacy.io/api/top-level#util.filter_spans
```
For that reason, you may also want to look at spacy.util.filter_spans: https://spacy.io/api/top-level#util.filter_spans.
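
As a rough illustration of how filter_spans avoids the E102 error, the sketch below merges a list of overlapping spans after filtering it. The sentence, the blank pipeline, and the hand-picked slices (standing in for doc.ents plus doc.noun_chunks) are all assumptions made so the example runs without a trained model.

```python
import spacy
from spacy.util import filter_spans

# Assumption: blank pipeline, so candidate spans are sliced by hand instead
# of coming from doc.ents and doc.noun_chunks.
nlp = spacy.blank("en")
doc = nlp("Alan Turing published Computing Machinery and Intelligence in 1950.")

candidates = [
    doc[0:2],  # "Alan Turing"
    doc[3:7],  # "Computing Machinery and Intelligence"
    doc[3:5],  # "Computing Machinery" (overlaps the span above)
]

# filter_spans keeps the longest spans and drops any that overlap them
spans = filter_spans(candidates)
kept = [span.text for span in spans]  # read the text before the doc changes

with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

merged_tokens = [token.text for token in doc]
print(kept)
print(merged_tokens)
```

Passing the unfiltered candidates list to retokenizer.merge would raise the non-disjoint spans error, because the token 'Computing' appears in two of the spans.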

Archived: 4 years, 9 months ago
Views: 6118
Last recorded: 4 years, 9 months ago