san*_*oku 9 python scikit-learn
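I am trying to use FeatureUnion in an sklearn pipeline for the first time, to combine numeric features (2 columns) and a text feature (1 column) for multi-class classification.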
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Column selectors: pick the text column and the two numeric columns from the DataFrame.
get_text_data = FunctionTransformer(lambda x: x['text'], validate=False)
get_numeric_data = FunctionTransformer(lambda x: x[['num1', 'num2']], validate=False)

process_and_join_features = FeatureUnion(
    [
        ('numeric_features', Pipeline([
            ('selector', get_numeric_data),
            ('clf', OneVsRestClassifier(LogisticRegression()))
        ])),
        ('text_features', Pipeline([
            ('selector', get_text_data),
            ('vec', CountVectorizer()),
            ('clf', OneVsRestClassifier(LogisticRegression()))
        ]))
    ]
)
In this code, 'text' is the text column and 'num1', 'num2' are the two numeric columns.
The error message is:
TypeError: All estimators should implement fit and transform. 'Pipeline(memory=None,
steps=[('selector', FunctionTransformer(accept_sparse=False,
func=<function <lambda> at 0x7fefa8efd840>, inv_kw_args=None,
inverse_func=None, kw_args=None, pass_y='deprecated',
validate=False)), ('clf', OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weigh...=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False),
n_jobs=1))])' (type <class 'sklearn.pipeline.Pipeline'>) doesn't
Am I missing a step?
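A FeatureUnion should be used as a step inside a pipeline, not wrapped around pipelines. The error you are getting is because you have a classifier that is not the final step: the union tries to call fit and transform on all of its transformers, and a classifier does not have a transform method.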
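The point about the missing transform method can be checked directly. The snippet below is a minimal sketch, not the asker's code: the identity `passthrough` selector and the two-row input are placeholders invented for illustration. It shows that a pipeline ending in a classifier does not expose `transform`, and that `FeatureUnion` rejects it with a `TypeError`.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical identity selector standing in for the question's column selectors.
passthrough = FunctionTransformer(validate=False)

# A pipeline whose last step is a classifier offers predict, but not transform.
ends_in_classifier = Pipeline([
    ("selector", passthrough),
    ("clf", LogisticRegression()),
])
has_transform = hasattr(ends_in_classifier, "transform")
print(has_transform)

# FeatureUnion needs fit and transform on every branch. Depending on the
# scikit-learn version the check runs at construction or at fit time,
# so both calls sit inside the try block.
error_message = None
try:
    union = FeatureUnion([("branch", ends_in_classifier)])
    union.fit([[0.0], [1.0]], [0, 1])
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```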
Simply rework it to use an outer pipeline with the classifier as the final step:
process_and_join_features = Pipeline([
    ('features', FeatureUnion([
        ('numeric_features', Pipeline([
            ('selector', get_numeric_data)
        ])),
        ('text_features', Pipeline([
            ('selector', get_text_data),
            ('vec', CountVectorizer())
        ]))
    ])),
    ('clf', OneVsRestClassifier(LogisticRegression()))
])
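To try the reworked pipeline end to end, something like the following should work. It is only a sketch: the DataFrame contents and the class labels are made up for illustration, and only the column names 'text', 'num1' and 'num2' come from the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Made-up toy data: one text column, two numeric columns, three classes.
df = pd.DataFrame({
    "text": ["red apple", "green apple", "fast car",
             "slow car", "big house", "small house"],
    "num1": [1.0, 1.2, 5.0, 5.5, 9.0, 9.3],
    "num2": [0.1, 0.2, 0.5, 0.4, 0.9, 0.8],
})
y = ["fruit", "fruit", "vehicle", "vehicle", "building", "building"]

get_text_data = FunctionTransformer(lambda x: x["text"], validate=False)
get_numeric_data = FunctionTransformer(lambda x: x[["num1", "num2"]], validate=False)

model = Pipeline([
    # The union only builds features: every branch ends in a transformer.
    ("features", FeatureUnion([
        ("numeric_features", Pipeline([
            ("selector", get_numeric_data),
        ])),
        ("text_features", Pipeline([
            ("selector", get_text_data),
            ("vec", CountVectorizer()),
        ])),
    ])),
    # The single classifier is the last step of the outer pipeline.
    ("clf", OneVsRestClassifier(LogisticRegression())),
])

model.fit(df, y)
features = model.named_steps["features"].transform(df)
preds = model.predict(df)
print(features.shape)
print(preds)
```

The numeric columns and the bag-of-words counts end up side by side in one feature matrix, and the classifier sees that combined matrix.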
The scikit-learn website also has a good example of doing this kind of thing.
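I believe @Ken Syme has correctly identified the problem and provided a solution for what you intend to do. However, if you really do intend to use the outputs of the classifiers as features for a higher-level model, take a look at this blog post.
Using Zac's ModelTransformer, you can build the pipeline as follows: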
from pandas import DataFrame
from sklearn.base import BaseEstimator, TransformerMixin

# BaseEstimator is added here (it is not in the original snippet) so that
# get_params and clone work on the wrapper.
class ModelTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, model):
        self.model = model

    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self

    def transform(self, X, **transform_params):
        # Expose the wrapped model's predictions as a feature column.
        return DataFrame(self.model.predict(X))

process_and_join_features = FeatureUnion(
    [
        ('numeric_features', Pipeline([
            ('selector', get_numeric_data),
            ('clf', ModelTransformer(OneVsRestClassifier(LogisticRegression())))
        ])),
        ('text_features', Pipeline([
            ('selector', get_text_data),
            ('vec', CountVectorizer()),
            ('clf', ModelTransformer(OneVsRestClassifier(LogisticRegression())))
        ]))
    ]
)
Depending on what your next steps are, you may still need to wrap the FeatureUnion in a pipeline (for example, using the make_pipeline shortcut).
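As a rough sketch of that wrapping step, the union of ModelTransformer branches can be followed by a higher-level model via make_pipeline. The toy data, the integer labels and the choice of LogisticRegression as the final model are assumptions made for illustration. Integer labels are used so that the predictions can be fed to the next model as numeric features.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer


class ModelTransformer(TransformerMixin, BaseEstimator):
    """Wrap a model so that its predictions become a one-column feature."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None, **fit_params):
        self.model.fit(X, y, **fit_params)
        return self

    def transform(self, X):
        return pd.DataFrame(self.model.predict(X))


# Made-up toy data with integer class labels.
df = pd.DataFrame({
    "text": ["red apple", "green apple", "fast car",
             "slow car", "big house", "small house"],
    "num1": [1.0, 1.2, 5.0, 5.5, 9.0, 9.3],
    "num2": [0.1, 0.2, 0.5, 0.4, 0.9, 0.8],
})
y = [0, 0, 1, 1, 2, 2]

get_text_data = FunctionTransformer(lambda x: x["text"], validate=False)
get_numeric_data = FunctionTransformer(lambda x: x[["num1", "num2"]], validate=False)

base_models = FeatureUnion([
    ("numeric_features", Pipeline([
        ("selector", get_numeric_data),
        ("clf", ModelTransformer(OneVsRestClassifier(LogisticRegression()))),
    ])),
    ("text_features", Pipeline([
        ("selector", get_text_data),
        ("vec", CountVectorizer()),
        ("clf", ModelTransformer(OneVsRestClassifier(LogisticRegression()))),
    ])),
])

# make_pipeline names each step after its lowercased class name.
stacked = make_pipeline(base_models, LogisticRegression())
stacked.fit(df, y)

meta_features = stacked.named_steps["featureunion"].transform(df)
preds = stacked.predict(df)
print(meta_features.shape)
print(preds)
```

Note that the base models here predict on the same rows they were trained on, so a real stacking setup would normally build these features from out-of-fold predictions instead.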