I want to train a TextCategorizer model with the following (text, label) pairs.
Label COLOR:
Label ANIMAL:
I am copying the example code from the TextCategorizer documentation.
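import spacy
from spacy.pipeline import TextCategorizer

nlp = spacy.load('en')
textcat = TextCategorizer(nlp.vocab)
losses = {}
optimizer = nlp.begin_training()
# doc1, doc2 are Doc objects; gold1, gold2 are the annotations asked about below
textcat.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)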
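The doc variables could simply be nlp("The door is brown.") and so on. What should gold1 and gold2 be? I'm guessing they should be GoldParse objects, but I don't know how to represent text classification information in them.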
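According to the train_textcat.py example, it should be something like {'cats': {'ANIMAL': 0, 'COLOR': 1}} if you want to train a multi-label model. Also, if you only have two classes, you can simply use {'cats': {'ANIMAL': 1}} for label ANIMAL and {'cats': {'ANIMAL': 0}} for label COLOR.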
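To make the two encodings concrete, here is a minimal plain-Python sketch that turns (text, label) pairs into annotation dicts of the shapes described above. The helper names `to_multilabel` and `to_binary` are hypothetical, and since the question's actual pairs are not shown, the ANIMAL sentence is invented; only "The door is brown." comes from the question.

```python
# Hypothetical helpers: convert a (text, label) pair into the
# {"cats": {...}} annotation dict used in spaCy v2's textcat examples.
LABELS = ("ANIMAL", "COLOR")


def to_multilabel(text, label, labels=LABELS):
    """One key per label: 1 for the true label, 0 for every other label."""
    cats = {name: int(name == label) for name in labels}
    return text, {"cats": cats}


def to_binary(text, label, positive="ANIMAL"):
    """A single key: 1 means `positive`, 0 means the other class (COLOR)."""
    return text, {"cats": {positive: int(label == positive)}}


# "The door is brown." is from the question; the ANIMAL sentence is made up.
pairs = [("The door is brown.", "COLOR"), ("The cat is sleeping.", "ANIMAL")]

multi = [to_multilabel(text, label) for text, label in pairs]
binary = [to_binary(text, label) for text, label in pairs]

print(multi[0])   # ('The door is brown.', {'cats': {'ANIMAL': 0, 'COLOR': 1}})
print(binary[0])  # ('The door is brown.', {'cats': {'ANIMAL': 0}})
```

Either list can then be used in place of `train_data` in a training loop like the one that follows.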
You can use the following minimal working example for single-label (binary) text classification:
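import spacy

# Assumes spaCy v2 with an English model installed under the 'en' shortcut
nlp = spacy.load('en')

# (text, annotations) pairs: 1 = POSITIVE, 0 = not POSITIVE
train_data = [
    (u"That was very bad", {"cats": {"POSITIVE": 0}}),
    (u"it is so bad", {"cats": {"POSITIVE": 0}}),
    (u"so terrible", {"cats": {"POSITIVE": 0}}),
    (u"I like it", {"cats": {"POSITIVE": 1}}),
    (u"It is very good.", {"cats": {"POSITIVE": 1}}),
    (u"That was great!", {"cats": {"POSITIVE": 1}}),
]

# Add a text classifier with a single label at the end of the pipeline
textcat = nlp.create_pipe('textcat')
nlp.add_pipe(textcat, last=True)
textcat.add_label('POSITIVE')

# Disable the other pipes so only textcat is initialised and trained,
# as in spaCy's train_textcat.py example
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    for itn in range(100):
        for doc, gold in train_data:
            nlp.update([doc], [gold], sgd=optimizer)

doc = nlp(u'It is good.')
print(doc.cats)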
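doc.cats maps each label to a score between 0 and 1 rather than giving a single decision. The sketch below shows one way to turn those scores into a label; `binary_label`, `best_label`, the `NEGATIVE` name and the 0.5 threshold are assumptions of this sketch, not spaCy behavior, and the scores are made up rather than real model output.

```python
# Hypothetical helpers that turn a doc.cats-style dict (label -> score in
# [0, 1]) into a single predicted label. The 0.5 cut-off is an assumption,
# not something spaCy applies for you.


def binary_label(cats, positive="POSITIVE", negative="NEGATIVE", threshold=0.5):
    """Single-key model: a score at or above `threshold` means `positive`."""
    return positive if cats[positive] >= threshold else negative


def best_label(cats):
    """One key per label: pick the label with the highest score."""
    return max(cats, key=cats.get)


# Made-up scores standing in for print(doc.cats) output.
print(binary_label({"POSITIVE": 0.93}))             # POSITIVE
print(binary_label({"POSITIVE": 0.12}))             # NEGATIVE
print(best_label({"ANIMAL": 0.08, "COLOR": 0.91}))  # COLOR
```

With the single-key POSITIVE model above, a threshold is needed to decide the negative case; with one key per label (such as ANIMAL and COLOR), taking the highest score is a common choice.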