toy*_*toy 14 python dataframe pandas random-forest scikit-learn
I'm just trying to build a simple RandomForestRegressor example, but when I test the accuracy I get this error:
/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc
in accuracy_score(y_true, y_pred, normalize, sample_weight)
    177
    178     # Compute accuracy for each possible representation
--> 179     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    180     if y_type.startswith('multilabel'):
    181         differing_labels = count_nonzero(y_true - y_pred, axis=1)

/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc
in _check_targets(y_true, y_pred)
     90     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
     91                        "multilabel-sequences"]):
---> 92         raise ValueError("{0} is not supported".format(y_type))
     93
     94     if y_type in ["binary", "multiclass"]:

ValueError: continuous is not supported
Here is a sample of the data. I can't show the real data.
target, func_1, func_2, func_2, ... func_200
float, float, float, float, ... float
Here is my code.
import pandas as pd
import numpy as np
from sklearn.preprocessing import Imputer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.cross_validation import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree
train = pd.read_csv('data.txt', sep='\t')
labels = train.target
train.drop('target', axis=1, inplace=True)
cat = ['cat']
train_cat = pd.get_dummies(train[cat])
train.drop(train[cat], axis=1, inplace=True)
train = np.hstack((train, train_cat))
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit(train)
train = imp.transform(train)
x_train, x_test, y_train, y_test = train_test_split(train, labels.values, test_size = 0.2)
clf = RandomForestRegressor(n_estimators=10)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
accuracy_score(y_test, y_pred) # This is where I get the error.
Ibr*_*iev 34
This happens because accuracy_score is only meant for classification tasks. For regression you should use something different, for example:
clf.score(X_test, y_test)
Here X_test holds the samples and y_test holds the corresponding ground-truth values. The predictions are computed internally.
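A minimal sketch of the difference, assuming a recent scikit-learn in which train_test_split lives in sklearn.model_selection (the sklearn.cross_validation module used in the question was removed in later releases). The synthetic data, the feature count, and the random seeds are assumptions made for illustration and stand in for the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (hypothetical: 200 rows, 5 float features).
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = RandomForestRegressor(n_estimators=10, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# accuracy_score rejects continuous targets, which is the error in the question.
try:
    accuracy_score(y_test, y_pred)
    accuracy_raised = False
except ValueError as exc:
    accuracy_raised = True
    print("accuracy_score failed:", exc)

# score() runs predict(X_test) internally and returns R^2 for regressors.
r2_from_score = clf.score(X_test, y_test)
print("R^2 from clf.score:", r2_from_score)
```

For a regressor, clf.score(X_test, y_test) should give the same number as r2_score(y_test, clf.predict(X_test)), so either form can be used.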
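Since you are performing a regression task, you should use R-squared (the coefficient of determination) as the metric rather than the accuracy score, which is meant for classification problems.
R-squared can be computed by calling the score function provided by RandomForestRegressor, for example: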
rfr.score(X_test,Y_test)
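To make clear what that number means, here is a small sketch that computes R-squared by hand as 1 - SS_res / SS_tot and compares it with sklearn.metrics.r2_score. The four target values and predictions are hypothetical and chosen only for illustration. mean_squared_error is included as another common regression metric.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical ground-truth values and predictions, for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Residual sum of squares: how far the predictions are from the truth.
ss_res = np.sum((y_true - y_hat) ** 2)
# Total sum of squares: how far the truth is from its own mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = r2_score(y_true, y_hat)
mse = mean_squared_error(y_true, y_hat)

print("manual R^2: ", r2_manual)
print("sklearn R^2:", r2_sklearn)
print("MSE:        ", mse)  # 0.375
```

An R-squared of 1.0 means perfect predictions, 0.0 matches a model that always predicts the mean of y_true, and negative values are possible for models that do worse than that.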