I am trying to use spaCy to extract custom entities from text.
import spacy
from spacy_lookup import Entity

data = {0: ["count"], 1: ["unique count", "unique"]}

def processText(text):
    nlp = spacy.blank('en')
    for i, arr in data.items():
        fLabel = "test:" + str(i)
        fEntitty = Entity(keywords_list=list(set(arr)), label=fLabel)
        fEntitty.name = fLabel
        nlp.add_pipe(fEntitty)
    match_doc = nlp(text)
    print(match_doc.ents)

processText("unique count of city")
But the code above throws the following error:
ValueError: [E103] Trying to set conflicting doc.ents: '(1, 2, 'test:0')' and '(0, 2, 'test:1')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
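To make the conflict concrete: the two spans in the message are token ranges over "unique count of city", where (1, 2) is "count" and (0, 2) is "unique count", so both claim token 1. Below is a minimal plain-Python sketch of one possible way to resolve this, keeping the longest match and dropping anything that overlaps it. The function name `keep_longest_spans` and the tuple layout are assumptions made for illustration; they are not part of spaCy or spacy_lookup.

```python
# Illustrative sketch only: spans are (start_token, end_token, label) tuples
# with an exclusive end, mirroring the layout shown in the E103 message.
def keep_longest_spans(spans):
    """Drop overlapping spans, preferring longer ones (ties: earlier start)."""
    # Visit the longest spans first so they win any overlap.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    chosen = []
    taken = set()  # token indices already claimed by a kept span
    for start, end, label in ordered:
        tokens = range(start, end)
        if any(t in taken for t in tokens):
            continue  # overlaps a span that was already kept
        taken.update(tokens)
        chosen.append((start, end, label))
    return sorted(chosen)  # back in document order


conflicting = [(1, 2, "test:0"), (0, 2, "test:1")]
print(keep_longest_spans(conflicting))  # [(0, 2, 'test:1')]
```

The same rule would keep "Karthik reddy" over "Karthik" and "Jon Allen" over "Jon". spaCy itself provides `spacy.util.filter_spans`, which applies a similar longest-first selection to `Span` objects; whether it can be used here depends on collecting all candidate spans before they are assigned to `doc.ents`.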
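This is not limited to this case. The same problem occurs with person names where one keyword is contained in another, such as "Karthik" vs "Karthik reddy" or "Jon" vs "Jon Allen". Can anyone help me solve this?
Thanks in advance!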