AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'

eya*_*klt 3 python nlp spacy

I'm working on an NLP project and trying to follow this tutorial: https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e. When I run this part:

```python
import spacy

# Load the large English NLP model
nlp = spacy.load('en_core_web_lg')

# Replace a token with "REDACTED" if it is a name
def replace_name_with_placeholder(token):
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    else:
        return token.string

# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    for ent in doc.ents:
        ent.merge()
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)

s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky's
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""

print(scrub(s))
```

I get this error:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-62-ab1c786c4914> in <module>
      4 """
      5
----> 6 print(scrub(s))

<ipython-input-60-4742408aa60f> in scrub(text)
      3     doc = nlp(text)
      4     for ent in doc.ents:
----> 5         ent.merge()
      6     tokens = map(replace_name_with_placeholder, doc)
      7     return "".join(tokens)

AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'
```

小智 6

Spacy removed the `span.merge()` method after that tutorial was written. The way to do this now is with `doc.retokenize()`: https://spacy.io/api/doc#retokenize. I've applied it to your `scrub` function below:

```python
# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)

s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky's
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""

print(scrub(s))
```

Other notes:

1. Your `replace_name_with_placeholder` function will throw an error; use `token.text` instead. Fixed below:

   ```python
   def replace_name_with_placeholder(token):
       if token.ent_iob != 0 and token.ent_type_ == "PERSON":
           return "[REDACTED] "
       else:
           return token.text
   ```
2. If you are extracting entities along with other spans, such as `doc.noun_chunks`, you may run into problems like:

   ```
   ValueError: [E102] Can't merge non-disjoint spans. 'Computing' is already part of
   tokens to merge. If you want to find the longest non-overlapping spans, you can
   use the util.filter_spans helper:
   https://spacy.io/api/top-level#util.filter_spans
   ```

   so you may also want to look at `spacy.util.filter_spans`: https://spacy.io/api/top-level#util.filter_spans
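As the error message hints, `spacy.util.filter_spans` resolves overlaps by keeping the longest non-overlapping spans. Its core idea can be sketched in plain Python; this is an illustration only (the real helper operates on spaCy `Span` objects, while here spans are modeled as `(start, end)` token-index pairs, end exclusive):

```python
# Sketch of the filter_spans idea: prefer longer spans, drop any span
# whose tokens are already claimed by an earlier (longer) span.
def filter_spans(spans):
    # Sort by length (descending), breaking ties by earlier start.
    sorted_spans = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    result = []
    seen_tokens = set()
    for start, end in sorted_spans:
        # Keep a span only if none of its token indices were claimed yet.
        if not any(i in seen_tokens for i in range(start, end)):
            result.append((start, end))
            seen_tokens.update(range(start, end))
    return sorted(result)

# e.g. an entity (0, 2) overlapping a noun chunk (1, 3):
print(filter_spans([(0, 2), (1, 3), (4, 5)]))  # → [(0, 2), (4, 5)]
```

Passing the combined list through the real `filter_spans` before `retokenizer.merge` is what makes merging entities and noun chunks together safe.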