How do you compute macro F1 in Keras?

Ary*_*ema 27 keras

I am trying to use the code that Keras provided before it was removed. Here is the code:

from keras import backend as K

def precision(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def fbeta_score(y_true, y_pred, beta=1):
    if beta < 0:
        raise ValueError('The lowest choosable beta is zero (only precision).')

    # If there are no true positives, fix the F score at 0 like sklearn.
    if K.sum(K.round(K.clip(y_true, 0, 1))) == 0:
        return 0

    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    bb = beta ** 2
    fbeta_score = (1 + bb) * (p * r) / (bb * p + r + K.epsilon())
    return fbeta_score

def fmeasure(y_true, y_pred):
    return fbeta_score(y_true, y_pred, beta=1)

From what I can see (I am an amateur), they seem to use the correct formulas. However, when I tried to use these as metrics during training, I got exactly equal values for val_accuracy, val_precision, val_recall, and val_fmeasure. I believe that could happen even if the formulas are correct, but it seems very unlikely. Is there any explanation for this? Thanks.

Pad*_*ddy 56

Since Keras 2.0, the f1, precision, and recall metrics have been removed. The solution is to use a custom metric function:

from keras import backend as K

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision
    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))


model.compile(loss='binary_crossentropy',
              optimizer="adam",
              metrics=[f1])

The return line of this function,

return 2*((precision*recall)/(precision+recall+K.epsilon()))

has been modified by adding the constant epsilon, so that division by zero is avoided and the metric never evaluates to NaN.
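A quick illustration of the effect, using plain Python floats and Keras's default epsilon value of 1e-07 in place of `K.epsilon()`:

```python
eps = 1e-07  # Keras's default K.epsilon()
precision = recall = 0.0  # a batch with no true and no predicted positives
f1 = 2 * ((precision * recall) / (precision + recall + eps))
print(f1)  # 0.0 instead of a NaN from 0/0
```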


Die*_*she 13

Using a Keras metric function is not the right way to compute F1, AUC, or anything of that kind.

The reason is that the metric function is called at every batch step during validation. Keras then averages the per-batch results, and that average is not the correct F1 score.
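A small hand-worked example makes the discrepancy concrete. The `binary_f1` helper below is illustrative (not from the original answer); it computes F1 from full label lists, so we can compare the mean of per-batch F1 scores with F1 computed over the whole validation set at once:

```python
def binary_f1(y_true, y_pred):
    """F1 for the positive class (label 1) over full label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Two validation batches
b1_true, b1_pred = [1, 1, 1, 1], [1, 1, 1, 1]  # perfect batch: F1 = 1.0
b2_true, b2_pred = [1, 0, 0, 0], [0, 0, 0, 1]  # all wrong:     F1 = 0.0

batch_mean = (binary_f1(b1_true, b1_pred) + binary_f1(b2_true, b2_pred)) / 2
global_f1 = binary_f1(b1_true + b2_true, b1_pred + b2_pred)

print(batch_mean)  # 0.5 -- what a batch-wise Keras metric would report
print(global_f1)   # 0.8 -- the true F1 over the whole set
```

Because F1 is a nonlinear function of the counts, averaging it over batches does not equal computing it on the pooled counts.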

That is why the F1 score was removed from Keras's metric functions. See here:

The correct way is to use a custom callback function, like this:
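A minimal sketch of such a callback, computing macro F1 over the whole validation set at the end of each epoch. The `macro_f1` helper and the `F1Callback` name are illustrative, not from the original answer, and the Keras wiring (which assumes TensorFlow is installed) is shown in comments:

```python
def macro_f1(y_true, y_pred, labels):
    """Macro F1: unweighted mean of per-class F1 over the full dataset."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        if tp == 0:
            scores.append(0.0)
        else:
            prec, rec = tp / (tp + fp), tp / (tp + fn)
            scores.append(2 * prec * rec / (prec + rec))
    return sum(scores) / len(scores)

# Wiring into Keras (illustrative; assumes TensorFlow is installed):
#
# from tensorflow.keras.callbacks import Callback
#
# class F1Callback(Callback):
#     def __init__(self, x_val, y_val):
#         super().__init__()
#         self.x_val, self.y_val = x_val, y_val
#
#     def on_epoch_end(self, epoch, logs=None):
#         preds = self.model.predict(self.x_val).argmax(axis=-1)
#         score = macro_f1(list(self.y_val), list(preds), labels=[0, 1])
#         print(f" - val_macro_f1: {score:.4f}")
#
# model.fit(x_train, y_train, callbacks=[F1Callback(x_val, y_val)])
```

Because the callback sees the entire validation set at once, it avoids the batch-averaging problem described above; `scikit-learn`'s `f1_score(..., average='macro')` could equally be used in place of the hand-rolled helper.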