Evaluation in a spaCy NER model

Mpi*_*ris 11 python spacy

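I am trying to evaluate a trained NER model created with the spaCy library. For this kind of problem you would normally use the F1 score (the harmonic mean of precision and recall). I could not find an accuracy function for a trained NER model in the documentation.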

I am not sure whether this is correct, but I tried the following approach (example), using f1_score from sklearn:

from sklearn.metrics import f1_score
import spacy
from spacy.gold import GoldParse


nlp = spacy.load("en") #load NER model
test_text = "my name is John" # text to test accuracy
doc_to_test = nlp(test_text) # transform the text to spacy doc format

# we create a golden doc where we know the tagged entity for the text to be tested
doc_gold_text= nlp.make_doc(test_text)
entity_offsets_of_gold_text = [(11, 15,"PERSON")]
gold = GoldParse(doc_gold_text, entities=entity_offsets_of_gold_text)

# bring the data in a format acceptable for sklearn f1 function
y_true = ["PERSON" if "PERSON" in x else 'O' for x in gold.ner]
y_predicted = [x.ent_type_ if x.ent_type_ !='' else 'O' for x in doc_to_test]
f1_score(y_true, y_predicted, average='macro')
> 1.0

Any ideas or insights would be helpful.

Mpi*_*ris 21

For anyone with the same question, see the following link:

spaCy/scorer.py

There you can find several metrics, including F-score, recall and precision. An example using the Scorer:

import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer

def evaluate(ner_model, examples):
    scorer = Scorer()
    for input_, annot in examples:
        doc_gold_text = ner_model.make_doc(input_)
        gold = GoldParse(doc_gold_text, entities=annot)
        pred_value = ner_model(input_)
        scorer.score(pred_value, gold)
    return scorer.scores

# example run

examples = [
    ('Who is Shaka Khan?',
     [(7, 17, 'PERSON')]),
    ('I like London and Berlin.',
     [(7, 13, 'LOC'), (18, 24, 'LOC')])
]

ner_model = spacy.load(ner_model_path) # for spaCy's pretrained use 'en_core_web_sm'
results = evaluate(ner_model, examples)

Here input_ is the text (e.g. "my name is John") and annot is the list of annotations (e.g. [(11, 15, "PERSON")]).
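A quick sanity check on annotations like these is to confirm that each (start, end, label) triple slices the intended text, since the offsets are character positions with an exclusive end. A minimal sketch in plain Python; the helper name `check_offsets` is hypothetical and not part of spaCy:

```python
def check_offsets(text, annot):
    """Return the substring covered by each (start, end, label) annotation."""
    spans = []
    for start, end, label in annot:
        # Offsets are character positions; `end` is exclusive, as in slicing.
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"offsets ({start}, {end}) fall outside the text")
        spans.append((text[start:end], label))
    return spans


print(check_offsets("my name is John", [(11, 15, "PERSON")]))
print(check_offsets("Who is Shaka Khan?", [(7, 17, "PERSON")]))
```

Printing the covered substrings before scoring makes off-by-one offsets easy to spot.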

scorer.scores returns multiple scores. The example is taken from the spaCy examples on GitHub (the link no longer works).
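As a rough illustration of what entity-level precision, recall and F-score mean, here is a minimal sketch in plain Python. It assumes exact-match scoring on (start, end, label) triples and is not spaCy's own implementation; the function name `prf` and the prediction shown are made up for the example:

```python
def prf(gold_spans, pred_spans):
    """Entity-level precision, recall and F1 from exact span matches."""
    gold, pred = set(gold_spans), set(pred_spans)
    true_pos = len(gold & pred)  # spans with identical offsets and label
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [(7, 13, "LOC"), (18, 24, "LOC")]
pred = [(7, 13, "LOC"), (18, 24, "GPE")]  # hypothetical prediction, one wrong label
print(prf(gold, pred))
```

In this sketch a span with the right offsets but the wrong label counts as both a false positive and a false negative.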

  • @EvanLalo Make sure `annot` is an iterable of tuples, not a dictionary. I ran into the same problem. (2 upvotes)
  • Try `entities=annot['entities']` instead of the default `entities=annot`. (2 upvotes)
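The two comments above describe two annotation styles: a bare list of tuples, or a dictionary with an 'entities' key. A small helper can accept both before the annotations are passed on. This is only a sketch, and the name `get_entities` is hypothetical:

```python
def get_entities(annot):
    """Return a list of (start, end, label) tuples from either annotation style."""
    if isinstance(annot, dict):
        # Dictionary style, e.g. {"entities": [(7, 17, "PERSON")]}
        annot = annot.get("entities", [])
    # List style, e.g. [(7, 17, "PERSON")]; normalise inner lists to tuples
    return [tuple(ent) for ent in annot]


print(get_entities({"entities": [(7, 17, "PERSON")]}))
print(get_entities([[7, 13, "LOC"], [18, 24, "LOC"]]))
```

With such a helper, the loop in `evaluate` could pass `entities=get_entities(annot)` regardless of how the examples were stored.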

小智 6

Since I ran into the same problem, I am posting the example code from the accepted answer here, adapted for spaCy v3:

import spacy
from spacy.scorer import Scorer
from spacy.training.example import Example

# gold annotations in the dict format that Example.from_dict expects
examples = [
    ('Who is Shaka Khan?',
     {'entities': [(7, 17, 'PERSON')]}),
    ('I like London and Berlin.',
     {'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]})
]

def evaluate(ner_model, examples):
    scorer = Scorer()
    scored_examples = []
    for input_, annot in examples:
        pred = ner_model(input_)  # Doc with the model's predictions
        # pair the prediction with a reference built from the gold annotations
        scored_examples.append(Example.from_dict(pred, annot))
    return scorer.score(scored_examples)

ner_model = spacy.load('en_core_web_sm')  # or the path to your own trained model
results = evaluate(ner_model, examples)
print(results)

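The breaking changes were needed because classes such as GoldParse were deprecated.

I believe the part of the accepted answer about the metrics is still valid.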
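For reading the result, the dictionary returned by the v3 scorer keeps the NER metrics under the keys `ents_p`, `ents_r`, `ents_f` and `ents_per_type`. A minimal sketch of pulling them out follows; the helper name `ner_summary` is hypothetical and the numbers are made up for illustration, not real model output:

```python
def ner_summary(scores):
    """Pick the NER entries out of a spaCy v3 style scores dictionary."""
    summary = {
        "precision": scores.get("ents_p"),
        "recall": scores.get("ents_r"),
        "f1": scores.get("ents_f"),
    }
    # Per-label results are nested under "ents_per_type" as {"p", "r", "f"} dicts.
    per_type = scores.get("ents_per_type") or {}
    summary["per_type_f1"] = {label: vals["f"] for label, vals in per_type.items()}
    return summary


# Made-up numbers for illustration only.
sample_scores = {
    "ents_p": 0.5,
    "ents_r": 0.5,
    "ents_f": 0.5,
    "ents_per_type": {
        "PERSON": {"p": 1.0, "r": 1.0, "f": 1.0},
        "LOC": {"p": 0.0, "r": 0.0, "f": 0.0},
    },
}
print(ner_summary(sample_scores))
```

The overall values can be missing or None when the reference documents carry no entity annotations, which is why the sketch uses `.get` rather than direct indexing.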