Bob*_*Bob 3 python machine-learning scikit-learn
I want to pass extra data to a transformer in scikit-learn:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
import numpy as np
from sklearn.model_selection import GridSearchCV

class myTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, my_np_array):
        self.data = my_np_array
        print(self.data)

    def transform(self, X):
        return X

    def fit(self, X, y=None):
        return self

data = np.random.rand(20, 20)
data2 = np.random.rand(6, 6)
y = np.array([1, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3])
pipe = Pipeline(steps=[('myt', myTransformer(data2)), ('randforest', RandomForestClassifier())])
params = {"randforest__n_estimators": [100, 1000]}
estimators = GridSearchCV(pipe, param_grid=params, verbose=True)
estimators.fit(data, y)
However, when it is used in a scikit-learn pipeline, the data seems to disappear: I get `None` from the print in the `__init__` method. How can I fix this?
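This happens because sklearn handles estimators in a very specific way. For things like grid search, it generally creates new instances of your class and passes the parameters to the constructor. It does this through its own clone operation (defined in base.py), which takes your estimator's class, retrieves the parameters (as returned by `get_params`) and passes them to your class's constructor: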
klass = estimator.__class__
new_object_params = estimator.get_params(deep=False)
for name, param in six.iteritems(new_object_params):
    new_object_params[name] = clone(param, safe=False)
new_object = klass(**new_object_params)
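To see where the `None` comes from, here is a toy imitation of that clone step in plain Python (no scikit-learn needed). `toy_get_params`, `toy_clone` and the two classes are hypothetical names for illustration only. The real `get_params` also handles `deep=True`, and the real `clone` deep-copies parameters and validates the result; both are skipped here. Like older scikit-learn releases, the toy falls back to `None` when no attribute matches a constructor argument; recent releases raise an `AttributeError` instead.

```python
import inspect


def toy_get_params(est):
    """Simplified stand-in for BaseEstimator.get_params(deep=False).

    Reads the argument names from the class's __init__ signature and looks up
    an attribute with the *same name* on the instance, falling back to None
    when it is missing (the older scikit-learn behaviour).
    """
    sig = inspect.signature(type(est).__init__)
    names = [p for p in sig.parameters if p != "self"]
    return {name: getattr(est, name, None) for name in names}


def toy_clone(est):
    """Simplified stand-in for sklearn.base.clone: rebuild from get_params."""
    return type(est)(**toy_get_params(est))


class StoresUnderOtherName:
    def __init__(self, my_np_array):
        self.data = my_np_array  # attribute name != parameter name


class StoresUnderSameName:
    def __init__(self, my_np_array):
        self.my_np_array = my_np_array  # attribute name == parameter name


payload = [[1, 2], [3, 4]]
print(toy_clone(StoresUnderOtherName(payload)).data)        # None
print(toy_clone(StoresUnderSameName(payload)).my_np_array)  # [[1, 2], [3, 4]]
```

The second class hints at another way out: if the attribute has the same name as the constructor argument, the inherited `get_params` finds it without any override.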
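The default `get_params` inherited from `BaseEstimator` builds its dictionary by reading the argument names of `__init__` and looking up attributes with those same names. Your constructor stores the array as `self.data` rather than `self.my_np_array`, so nothing is found and the copy is built with `None` (newer scikit-learn versions raise an `AttributeError` instead). To support your object, override the `get_params(deep=False)` method so that it returns the dictionary to be passed to the constructor: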
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

class myTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, my_np_array):
        self.data = my_np_array
        print(self.data)

    def transform(self, X):
        return X

    def fit(self, X, y=None):
        return self

    def get_params(self, deep=False):
        return {'my_np_array': self.data}
With this override, it works as expected.
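As a quick check of the override, here is a sketch that clones the fixed transformer directly with `sklearn.base.clone` (the public entry point to the cloning logic quoted above) and then runs the question's grid search. Assumptions: a recent scikit-learn and numpy; `n_estimators` is shrunk to 10 and 20 to keep it fast; the print in `__init__` is dropped; and the mixin is listed before `BaseEstimator`, which is the order current scikit-learn documentation recommends for custom estimators. The forest's scores are random, so only the transformer's stored array is checked.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class myTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, my_np_array):
        self.data = my_np_array

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def get_params(self, deep=False):
        # Report the stored array under the constructor's parameter name,
        # so clone() can pass it back into __init__.
        return {'my_np_array': self.data}


data = np.random.rand(20, 20)
data2 = np.random.rand(6, 6)
y = np.array([1, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3])

# clone() rebuilds the object from get_params(); the array now survives
# (as a deep copy, so it is equal to data2 but not the same object).
cloned = clone(myTransformer(data2))
print(cloned.data.shape)  # (6, 6)

pipe = Pipeline(steps=[('myt', myTransformer(data2)),
                       ('randforest', RandomForestClassifier())])
params = {"randforest__n_estimators": [10, 20]}
search = GridSearchCV(pipe, param_grid=params)
search.fit(data, y)

# The refitted best pipeline is itself a clone, and still carries the array.
print(search.best_estimator_.named_steps['myt'].data.shape)  # (6, 6)
```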