Fresh install of Anaconda on Ubuntu... I am preprocessing my data in various ways before running a classification task with Scikit-Learn.
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler().fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
This all works fine, but if I have a new sample (temp below) that I want to classify (and therefore want to preprocess in the same way), I do
temp = [1,2,3,4,5,5,6,....................,7]
temp = scaler.transform(temp)
and then I get the deprecation warning...
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17
and will raise ValueError in 0.19. Reshape your data either using
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample.
So the question is, how should I rescale a single sample like this?
I suppose one alternative (not a very good one) would be...
temp = [temp, temp]
temp = scaler.transform(temp)
temp = temp[0]
But I am sure there must be a better way.
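For reference, the deprecation warning itself points at the fix: a single sample should be reshaped into a 2-D array with one row. A minimal sketch of that, assuming temp is a plain Python list of feature values and scaler is the fitted MinMaxScaler from above:

import numpy as np

# One sample is one row: reshape to (1, n_features) as the warning suggests,
# transform it, and take the single transformed row back out.
temp = np.array(temp).reshape(1, -1)
temp = scaler.transform(temp)[0]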
I am trying to use XGBoost's classifier to classify some binary data. When I do the simplest thing and just use the defaults (as follows),
import xgboost as xgb
from sklearn.calibration import CalibratedClassifierCV

clf = xgb.XGBClassifier()
metLearn=CalibratedClassifierCV(clf, method='isotonic', cv=2)
metLearn.fit(train, trainTarget)
testPredictions = metLearn.predict(test)
I get reasonably good classification results.
My next step was to try tuning my parameters. Guessing from the parameters guide at https://github.com/dmlc/xgboost/blob/master/doc/parameter.md, I wanted to start from the defaults and work from there...
# setup parameters for xgboost
param = {}
param['booster'] = 'gbtree'
param['objective'] = 'binary:logistic'
param["eval_metric"] = "error"
param['eta'] = 0.3
param['gamma'] = 0
param['max_depth'] = 6
param['min_child_weight']=1
param['max_delta_step'] = 0
param['subsample']= 1
param['colsample_bytree']=1
param['silent'] = 1
param['seed'] = 0
param['base_score'] = 0.5
clf = xgb.XGBClassifier(params)
metLearn=CalibratedClassifierCV(clf, method='isotonic', cv=2)
metLearn.fit(train, trainTarget)
testPredictions = metLearn.predict(test)
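As an aside, a parameter dict like the one above is normally expanded into keyword arguments when constructing a scikit-learn-style estimator; passed positionally, the dict is not interpreted as a set of named parameters. A minimal sketch of that pattern with a few keys the wrapper accepts (whether every key above maps to a constructor argument depends on the xgboost version):

# Sketch: expand the dict into keyword arguments rather than passing it
# positionally to the scikit-learn wrapper.
param = {'max_depth': 6, 'learning_rate': 0.3, 'objective': 'binary:logistic'}
clf = xgb.XGBClassifier(**param)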
The result is that everything is predicted to be one of the conditions and nothing the other.
Strangely, if I set
params={}
which I expected to give me the same defaults as not supplying any parameters, the same thing happens.
So does anyone know what the defaults for XGBClassifier are, so that I can start tuning from there?
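For what it's worth, the wrapper follows the scikit-learn estimator API, so one way to see what a freshly constructed XGBClassifier reports as its parameters is get_params(). A minimal sketch, assuming xgboost is importable as xgb:

import xgboost as xgb

# Print the parameter names and values a default-constructed classifier reports.
clf = xgb.XGBClassifier()
for name, value in sorted(clf.get_params().items()):
    print(name, '=', value)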