How to create/customize your own scorer function in scikit-learn?

dan*_*014 30 python scikit-learn

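I am using support vector regression as the estimator in GridSearchCV. But I want to change the error function: instead of using the default (R-squared, the coefficient of determination), I would like to define my own custom error function.

I tried to build one with make_scorer, but it didn't work.

I read the documentation and found that it is possible to create custom estimators, but I don't need to remake the entire estimator, only the error/scoring function.

I think I can do it by defining a callable as a scorer, as it says in the docs.

But I don't know how to use it with an estimator, in my case SVR. Do I have to switch to a classifier (such as SVC)? And how would I use it?

My custom error function is as follows: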

def my_custom_loss_func(X_train_scaled, Y_train_scaled):
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i
    return error

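The variable M isn't null/zero. I've just set it to zero for simplicity.

Would anyone be able to show an example application of this custom scoring function? Thanks for your help!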

Jam*_*ull 29

As you saw, this is done by using make_scorer (docs).

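# GridSearchCV lives in sklearn.model_selection and make_scorer in
# sklearn.metrics in current scikit-learn releases.
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.svm import SVR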

import numpy as np

rng = np.random.RandomState(1)

def my_custom_loss_func(X_train_scaled, Y_train_scaled):
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i
    return error

# Generate sample data
X = 5 * rng.rand(10000, 1)
y = np.sin(X).ravel()

# Add noise to targets
y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))

train_size = 100

my_scorer = make_scorer(my_custom_loss_func, greater_is_better=True)

svr = GridSearchCV(SVR(kernel='rbf', gamma=0.1),
                   scoring=my_scorer,
                   cv=5,
                   param_grid={"C": [1e0, 1e1, 1e2, 1e3],
                               "gamma": np.logspace(-2, 2, 5)})

svr.fit(X[:train_size], y[:train_size])

print(svr.best_params_)
print(svr.score(X[train_size:], y[train_size:]))

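  • Don't worry, your English is fine. Did this example work for you? If not, can you tell me where the problem is? (2 upvotes)
  • I tried this example. I don't know, but... no, it works! I think I had an error in the dtype (between array and Python; maybe it was because I had "n_job=-1"). I really don't know, but I'm happy because it works now! Thank you very much! (2 upvotes)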

ali*_*dry 27

Jamie has a fleshed-out example, but here's an example using make_scorer straight from the scikit-learn documentation:

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer

def my_custom_loss_func(ground_truth, predictions):
    diff = np.abs(ground_truth - predictions).max()
    return np.log(1 + diff)

# loss will negate the return value of my_custom_loss_func,
#  which will be np.log(2), 0.693, given the values for ground_truth
#  and predictions defined below.
loss  = make_scorer(my_custom_loss_func, greater_is_better=False)
score = make_scorer(my_custom_loss_func, greater_is_better=True)
# Two samples with one feature each (the names follow the docs example:
# ground_truth plays the role of X and predictions the role of y).
ground_truth = [[1], [1]]
predictions  = [0, 1]
clf = DummyClassifier(strategy='most_frequent', random_state=0)
clf = clf.fit(ground_truth, predictions)
loss(clf, ground_truth, predictions)   # -0.69...

score(clf, ground_truth, predictions)  # 0.69...

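When defining a custom scorer via sklearn.metrics.make_scorer, the convention is that custom functions whose names end in _score return a value to maximize, while functions ending in _loss or _error return a value to minimize. You tell make_scorer which kind you have through its greater_is_better parameter: set it to True for scorers where higher values are better and to False for scorers where lower values are better. GridSearchCV can then optimize in the appropriate direction.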
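To make the sign convention concrete, here is a minimal sketch. The metric max_abs_error and the toy data are my own illustrative assumptions, not part of the question; only make_scorer and DummyRegressor come from scikit-learn.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer


def max_abs_error(y_true, y_pred):
    """Hypothetical metric: largest absolute residual (lower is better)."""
    return float(np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


X = np.arange(6).reshape(-1, 1)
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# DummyRegressor(strategy="mean") always predicts the training mean.
model = DummyRegressor(strategy="mean").fit(X, y)

raw_value = max_abs_error(y, model.predict(X))

as_score = make_scorer(max_abs_error, greater_is_better=True)
as_loss = make_scorer(max_abs_error, greater_is_better=False)

score_value = as_score(model, X, y)  # same sign as the raw metric
loss_value = as_loss(model, X, y)    # negated, so "larger is better" still holds

print(raw_value, score_value, loss_value)
```

With greater_is_better=False the scorer returns the negated metric, which is why a search over a loss reports negative scores.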

You can then turn your function into a scorer as follows:

import numpy as np
from sklearn.metrics import make_scorer

def custom_loss_func(X_train_scaled, Y_train_scaled):
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i
    return error


custom_scorer = make_scorer(custom_loss_func, greater_is_better=True)

Then pass custom_scorer into GridSearchCV as you would any other scoring function: clf = GridSearchCV(scoring=custom_scorer).
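As a rough end-to-end sketch of that last step, the snippet below keeps SVR as the estimator (no need to switch to a classifier) and searches over C and gamma with a custom loss. The metric max_abs_error, the synthetic data, and the grid values are assumptions chosen for illustration; swap in your own function.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR


def max_abs_error(y_true, y_pred):
    """Hypothetical loss: largest absolute residual (lower is better)."""
    return float(np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Synthetic regression data: a noisy sine curve.
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

# Lower is better, so ask make_scorer to negate the value.
loss_scorer = make_scorer(max_abs_error, greater_is_better=False)

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1.0, 10.0], "gamma": [0.1, 1.0]},
    scoring=loss_scorer,
    cv=3,
)
search.fit(X[:150], y[:150])

print(search.best_params_)
print(search.best_score_)  # mean negated loss over the CV folds

# score() reuses the custom scorer on held-out data.
held_out = search.score(X[150:], y[150:])
print(held_out)
```

Because the loss is negated, best_score_ and the held-out score are at most zero, and the parameter set with the value closest to zero wins.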