Interpreting the DecisionTreeRegressor score?

Question

was*_*abi 5 python machine-learning decision-tree scikit-learn supervised-learning

I am trying to evaluate the relevance of features, and I am using DecisionTreeRegressor() to do it.

The relevant part of the code is shown below:

# TODO: Make a copy of the DataFrame, using the 'drop' function to drop the given feature
new_data = data.drop(['Frozen'], axis = 1)

# TODO: Split the data into training and testing sets(0.25) using the given feature as the target
# TODO: Set a random state.

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(new_data, data['Frozen'], test_size = 0.25, random_state = 1)

# TODO: Create a decision tree regressor and fit it to the training set

from sklearn.tree import DecisionTreeRegressor


regressor = DecisionTreeRegressor(random_state=1)
regressor.fit(X_train, y_train)

# TODO: Report the score of the prediction using the testing set

from sklearn.model_selection import cross_val_score


#score = cross_val_score(regressor, X_test, y_test)
score = regressor.score(X_test, y_test)

print(score)  # the parenthesized form works on Python 2 and Python 3

When I run the print statement, it returns the following score:

-0.649574327334

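The documentation of the score function gives this explanation:

Returns the coefficient of determination R^2 of the prediction. ... The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse).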

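I have not been able to grasp the whole concept yet, so this explanation does not help me much. For example, I cannot understand why the score can be negative and what exactly it indicates (if something is squared, I would expect it to be positive only).

What does this score indicate, and why can it be negative?

If you know of any article (for beginners), that would be helpful too!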

Answer 1

小智 5

By its definition ( https://en.wikipedia.org/wiki/Coefficient_of_determination ), R^2 can be negative if the model fits the data worse than a horizontal line. Basically

R^2 = 1 - SS_res/SS_tot

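and SS_res and SS_tot are both sums of squares, so they are never negative. Whenever SS_res > SS_tot (and all the more so if SS_res >> SS_tot), you get a negative R^2. Also have a look at this answer: https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative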
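To make the formula concrete, here is a minimal sketch in plain Python. The helper name r_squared and the toy numbers are made up for illustration; this is not scikit-learn's implementation. It also shows that, despite the name, R^2 here is not the square of anything, so nothing forces it to be positive.

```python
def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    # (the "horizontal line").
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1, 2, 3, 4, 5]

perfect = r_squared(y_true, [1, 2, 3, 4, 5])     # SS_res = 0
mean_line = r_squared(y_true, [3, 3, 3, 3, 3])   # SS_res = SS_tot = 10
worse = r_squared(y_true, [5, 4, 3, 2, 1])       # SS_res = 40 > SS_tot

print(perfect)    # 1.0
print(mean_line)  # 0.0
print(worse)      # -3.0
```

A score of 0 means "no better than predicting the mean", and anything below 0 means the predictions are worse than that.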

Answer 2

chr*_*821 0

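The article runs cross_val_score, which in turn scores a DecisionTreeRegressor. You can take a look at the scikit-learn documentation for DecisionTreeRegressor. Basically, the score you are seeing is R^2, or (1 - u/v), where u is the residual sum of squares of your prediction and v is the total sum of squares (the sample sum of squares).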

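When you make really bad predictions, u/v can become arbitrarily large, while, given that u and v are sums of squares (>= 0), it can only be as small as zero. So 1 - u/v is at most 1 and has no lower bound.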
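A rough way to see this with scikit-learn itself is sketched below. The data is synthetic and is an assumption made purely for illustration: the target is random noise, so the features say nothing about it, loosely like a feature that cannot be predicted from the others. DummyRegressor(strategy="mean") plays the role of the horizontal line, and r2_score is used to check that score() reports the same quantity.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption): the target is independent of the features.
rng = np.random.RandomState(1)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# An unconstrained tree memorizes the training noise.
tree = DecisionTreeRegressor(random_state=1).fit(X_train, y_train)
# Baseline that always predicts the training mean (a horizontal line).
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

tree_score = tree.score(X_test, y_test)
manual_score = r2_score(y_test, tree.predict(X_test))
baseline_score = baseline.score(X_test, y_test)

# For regressors, cross_val_score uses the estimator's score method
# (R^2) when no scoring argument is given.
cv_scores = cross_val_score(DecisionTreeRegressor(random_state=1), X, y, cv=5)

print("tree R^2 on test set:    ", tree_score)
print("same via r2_score:       ", manual_score)
print("mean-baseline R^2:       ", baseline_score)
print("cross-validated tree R^2:", cv_scores)
```

On data like this the tree scores below the mean baseline, which suggests reading a negative score in the question as "the other features do not help predict 'Frozen' with this model", not as a bug.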
