Is it possible to get confidence scores for spaCy named entity recognition?

vri*_*nda 5 python nlp pandas spacy ner

I need to get confidence scores for the predictions made by spaCy NER.

CSV file

Text,Amount & Nature,Percent of Class
"T. Rowe Price Associates, Inc.","28,223,360 (1)",8.7% (1)
100 E. Pratt Street,Not Listed,Not Listed
"Baltimore, MD 21202",Not Listed,Not Listed
"BlackRock, Inc.","21,871,854 (2)",6.8% (2)
55 East 52nd Street,Not Listed,Not Listed
"New York, NY 10022",Not Listed,Not Listed
The Vanguard Group,"21,380,085 (3)",6.64% (3)
100 Vanguard Blvd.,Not Listed,Not Listed
"Malvern, PA 19355",Not Listed,Not Listed
FMR LLC,"20,784,414 (4)",6.459% (4)
245 Summer Street,Not Listed,Not Listed
"Boston, MA 02210",Not Listed,Not Listed

Code

import csv

import pandas as pd
import spacy

# Load the model once, not once per row
nlp = spacy.load('en_core_web_sm')

rows = []
with open('/path/table.csv', newline='') as csvfile:
    reader1 = csv.DictReader(csvfile)
    for row in reader1:
        doc1 = nlp(row["Text"])
        # Keep the label of the last entity found (as the original loop did);
        # use an empty string when spaCy finds no entity, instead of
        # reusing the previous row's label or raising NameError
        label1 = doc1.ents[-1].label_ if doc1.ents else ""
        rows.append([doc1.text, row["Amount & Nature"], label1])

my_df1 = pd.DataFrame(rows, columns=["Text", "Amount & Nature", "Prediction"])
my_df1.to_csv('/path/output.csv', index=False)

Output CSV

Text,Amount & Nature,Prediction
"T. Rowe Price Associates, Inc.","28,223,360 (1)",ORG
100 E. Pratt Street,Not Listed,FAC
"Baltimore, MD 21202",Not Listed,CARDINAL
"BlackRock, Inc.","21,871,854 (2)",ORG
55 East 52nd Street,Not Listed,LOC
"New York, NY 10022",Not Listed,DATE
The Vanguard Group,"21,380,085 (3)",ORG
100 Vanguard Blvd.,Not Listed,FAC
"Malvern, PA 19355",Not Listed,DATE
FMR LLC,"20,784,414 (4)",ORG
245 Summer Street,Not Listed,CARDINAL
"Boston, MA 02210",Not Listed,GPE

In the output above, is it possible to get a confidence score for each spaCy NER prediction? If so, how can I do it?

Can anyone help me?

Nav*_*nis 5

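No, unfortunately it is not possible to get the model's confidence scores in spaCy. As mentioned in issue #881, you can get scores by using get_beam_parses, although it seems to have a number of problems, as discussed in that thread.

While F1 scores are useful for overall evaluation, I would prefer spaCy to provide an individual confidence score for each prediction, which it currently does not.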
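The idea behind the beam approach is to run the entity recognizer with beam search and treat an entity's confidence as the total probability of the beam parses that contain it. The aggregation step can be sketched in plain Python as follows. This is an illustration, not spaCy's API: it assumes the parses arrive as (probability, [(start, end, label), ...]) pairs, which is the shape the spaCy v2 get_beam_parses code in that thread appears to consume, and the example parses and probabilities are made up.

```python
from collections import defaultdict


def entity_confidences(beam_parses):
    """Map each (start, end, label) entity to the share of beam probability
    mass held by the parses that contain it.

    ASSUMPTION: beam_parses is a list of (probability, entities) pairs,
    where entities is a list of (start, end, label) token spans.
    """
    total = sum(prob for prob, _ in beam_parses)
    if not total:
        return {}
    scores = defaultdict(float)
    for prob, ents in beam_parses:
        for start, end, label in ents:
            scores[(start, end, label)] += prob
    # Divide by the total so the values stay comparable even if the beam's
    # probabilities do not sum exactly to 1
    return {ent: score / total for ent, score in scores.items()}


# Hypothetical beam output for the tokens of "Boston, MA 02210"
# (Boston=0, ","=1, MA=2, 02210=3); the probabilities are made up.
parses = [
    (0.6, [(0, 1, "GPE"), (2, 3, "GPE")]),
    (0.3, [(0, 1, "GPE")]),
    (0.1, [(0, 1, "ORG")]),
]
for (start, end, label), conf in sorted(entity_confidences(parses).items()):
    print(start, end, label, f"{conf:.2f}")
```

With these made-up numbers, "Boston" as GPE appears in parses holding about 90% of the probability mass, so it would get a confidence of roughly 0.9.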


Ic3*_*r0g 1

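Either get a fully annotated dataset or annotate one manually yourself (since you already have a CSV file, this may be your best option). That way you can compare the ground truth with spaCy's predictions and compute a confusion matrix from them. I suggest using the F1 score as a measure of confidence.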
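A minimal sketch of that evaluation, assuming one predicted label per CSV row (as in the question's output) and one hand-annotated gold label per row. The predictions are the first six rows of the output CSV in the question; the gold labels are hypothetical annotations, not real data. It builds a confusion matrix of (gold, predicted) counts and per-label precision, recall and F1 in plain Python.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return a confusion matrix and per-label (precision, recall, F1)."""
    # Confusion matrix as counts of (gold label, predicted label) pairs
    confusion = Counter(zip(gold, predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # spaCy predicted p, but it was wrong
            fn[g] += 1  # the true label g was missed
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return confusion, scores


# Predictions: first six rows of the output CSV in the question
predicted = ["ORG", "FAC", "CARDINAL", "ORG", "LOC", "DATE"]
# Hypothetical manual annotations for the same six rows
gold = ["ORG", "FAC", "GPE", "ORG", "FAC", "GPE"]

confusion, scores = evaluate(gold, predicted)
for label, (p, r, f) in scores.items():
    print(f"{label:<9} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Note that this gives one reliability score per label (for example, how trustworthy an ORG prediction is overall), not a confidence for each individual prediction.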

Here are some great links discussing various publicly available datasets and annotation approaches (including CRF).