Uri*_*ren 13 python scikit-learn deep-learning pytorch
I have been running this LSTM tutorial on the wikigold.conll NER dataset.
training_data contains a list of (sequence, tags) tuples, for example:
training_data = [
    ("They also have a song called \" wake up \"".split(), ["O", "O", "O", "O", "O", "O", "I-MISC", "I-MISC", "I-MISC", "I-MISC"]),
    ("Major General John C. Scheidt Jr.".split(), ["O", "O", "I-PER", "I-PER", "I-PER"])
]
I wrote this function:
import torch

def predict(indices):
    """Takes a list of indices into training_data and yields the predicted tag indices for each sentence."""
    for index in indices:
        inputs = prepare_sequence(training_data[index][0], word_to_ix)
        tag_scores = model(inputs)
        # argmax over the tag dimension gives the predicted tag index per word
        values, target = torch.max(tag_scores, 1)
        yield target
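This way I can get the predicted tags for specific indices in the training data.
But how do I evaluate the accuracy score over all of the training data?
By accuracy I mean the number of correctly classified words across all sentences, divided by the total word count.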
y_pred = list(predict([s for s, t in training_data]))
y_true = [t for s, t in training_data]
c = 0
s = 0
for i in range(len(training_data)):
    n = len(y_true[i])
    # super ugly and inefficient
    s += (sum(sum(list(y_true[i].view(-1, n) == y_pred[i].view(-1, n).data))))
    c += n
print('Training accuracy:{a}'.format(a=float(s)/c))
PS: I have been trying to use sklearn's accuracy_score, without success.
I would use numpy to avoid iterating over the lists in pure Python.
The result is the same, but it runs faster.
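import numpy as np

def accuracy_score(y_true, y_pred):
    # flatten the per-sentence predictions into one 1-D array
    y_pred = np.concatenate(tuple(y_pred))
    # flatten the per-sentence gold tags and match the shape of y_pred
    y_true = np.concatenate(tuple([[t for t in y] for y in y_true])).reshape(y_pred.shape)
    return (y_true == y_pred).sum() / float(len(y_true))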
Here is how to use it:
#original code:
y_pred = list(predict([s for s, t in training_data]))
y_true = [t for s, t in training_data]
#numpy accuracy score
print(accuracy_score(y_true, y_pred))
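One likely reason sklearn's accuracy_score fails here is that it expects flat, one-dimensional label sequences of the same type, whereas y_true is a list of per-sentence lists of tag strings and y_pred is a list of per-sentence tag indices. A minimal pure-Python sketch of the flatten-and-compare step follows. The tag_to_ix mapping and the toy data are assumptions made up for illustration, not the tutorial's actual values.

```python
from itertools import chain

# Hypothetical tag vocabulary; the tutorial's real mapping may differ.
tag_to_ix = {"O": 0, "I-PER": 1, "I-MISC": 2}

def flatten(sequences):
    """Concatenate per-sentence label sequences into one flat list."""
    return list(chain.from_iterable(sequences))

def token_accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted tag index equals the gold tag index."""
    true_flat = [tag_to_ix[t] for t in flatten(y_true)]  # tag strings -> indices
    pred_flat = [int(p) for p in flatten(y_pred)]         # plain ints for comparison
    if len(true_flat) != len(pred_flat):
        raise ValueError("y_true and y_pred must contain the same number of tokens")
    correct = sum(t == p for t, p in zip(true_flat, pred_flat))
    return correct / len(true_flat)

# Toy data (made up): gold tags as strings, predictions as tag indices.
y_true = [["O", "O", "I-PER"], ["I-MISC", "O"]]
y_pred = [[0, 1, 1], [2, 0]]
accuracy = token_accuracy(y_true, y_pred)
print(accuracy)
```

Once both sides are flat lists of the same label type, they could also be passed to sklearn.metrics.accuracy_score(y_true_flat, y_pred_flat), which should give the same number.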