This is the classic training data format.
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]
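Each entity in this format is a `(start, end, label)` triple of character offsets into the text, so it can help to check that the offsets slice out the intended strings before converting anything. Below is a minimal sketch in plain Python; the helper name `entity_texts` is made up for illustration and is not part of spaCy.

```python
# TRAIN_DATA as in the question: (text, {"entities": [(start, end, label), ...]})
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]


def entity_texts(text, annotations):
    """Hypothetical helper: return (substring, label) for each annotated span."""
    return [(text[start:end], label) for start, end, label in annotations["entities"]]


for text, annotations in TRAIN_DATA:
    # Prints the substring covered by each annotated span, with its label.
    print(entity_texts(text, annotations))
```

The end offset is exclusive, as in ordinary Python slicing, so `(7, 17)` covers "Shaka Khan" without the trailing question mark.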
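I used to train from code, but as far as I understand, training with the CLI train command gives better results. My data, however, is in the format above.
I have found code snippets for this kind of conversion, but every one of them calls spacy.load('en') instead of starting from a blank model, which makes me wonder: are they training an existing model rather than a blank one?
This block looks simple enough: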
import spacy
from spacy.gold import docs_to_json
import srsly
nlp = spacy.load('en', disable=["ner"]) # as you see it's loading 'en' which I don't have
# TRAIN_DATA is the list defined above
docs = []
for text, annot in TRAIN_DATA:
    doc = nlp(text)
    doc.ents = [doc.char_span(start_idx, end_idx, label=label) for start_idx, …