Pou*_*del 4 python pandas scikit-learn
I am reading the scikit-learn tutorial on column transformers. The given example (https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html#sklearn.compose.make_column_selector) works, but when I try to select only a few columns, it gives me an error.
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector
df = sns.load_dataset('tips')
mycols = ['tip','sex']
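# This raises an error: make_column_transformer expects (transformer, columns) tuples,
# and make_column_selector's `pattern` must be a regex string, not a list.
ct = make_column_transformer(make_column_selector(pattern=mycols))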
ct.fit_transform(df)
I only want the selected columns in the output.
Note
Of course, I know I can just do df[mycols]; I am looking for a scikit-learn pipeline example.
小智 12
I might be a bit late, but you can also select columns with sklearn's ColumnTransformer() by setting the transformer to "passthrough" and using remainder='drop':
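from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
pipe = Pipeline([
    # "passthrough" keeps the listed columns unchanged;
    # remainder="drop" discards every other column.
    ("selector", ColumnTransformer([
        ("selector", "passthrough", mycols)
    ], remainder="drop"))
])
pipe.fit_transform(df)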
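The question's original `make_column_selector` approach can also work once its two problems are fixed. A minimal sketch, assuming a scikit-learn version that ships `make_column_selector` (0.22 or later) and using a small hand-made DataFrame in place of `sns.load_dataset('tips')` (the values are made up): `pattern` takes a regular expression rather than a list, so the column names are joined into an anchored alternation, and the selector is paired with `"passthrough"` inside a `(transformer, columns)` tuple.

```python
import re

import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer

# Small stand-in for sns.load_dataset('tips'); values are made up for illustration.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
    "smoker": ["No", "No", "No"],
})
mycols = ["tip", "sex"]

# make_column_selector expects a regex string, not a list, so build an
# anchored alternation that matches exactly the wanted column names.
pattern = "^(" + "|".join(map(re.escape, mycols)) + ")$"

ct = make_column_transformer(
    # (transformer, columns): keep the matched columns unchanged.
    ("passthrough", make_column_selector(pattern=pattern)),
    remainder="drop",  # drop every column the selector did not match
)
out = ct.fit_transform(df)  # NumPy array with the selected columns
print(out)
```

The selector returns matching columns in the DataFrame's own order, not in the order of `mycols`; for the tips columns used here the two happen to coincide.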
If you don't mind using mlxtend, it has a built-in transformer for this:
from mlxtend.feature_selection import ColumnSelector
pipe = ColumnSelector(mycols)
pipe.fit_transform(df)
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
class FeatureSelector(BaseEstimator, TransformerMixin):
    """Keep only the given DataFrame columns."""
    def __init__(self, columns):
        self.columns = columns
    def fit(self, X, y=None):
        # Nothing to learn: column selection is stateless.
        return self
    def transform(self, X, y=None):
        # Label-based selection, so X must be a pandas DataFrame.
        return X[self.columns]
pipeline = Pipeline([('selector', FeatureSelector(columns=mycols))])
pipeline.fit_transform(df)[:5]
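For a stateless selection like this, scikit-learn's `FunctionTransformer` can stand in for a hand-written class. A sketch under stated assumptions: `select_columns` is a hypothetical helper, the DataFrame is a made-up stand-in for the tips data, and `validate=False` is passed explicitly so the DataFrame reaches the function unconverted and the output stays a DataFrame.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Small stand-in for sns.load_dataset('tips'); values are made up for illustration.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
})
mycols = ["tip", "sex"]


def select_columns(X, columns):
    """Hypothetical helper: return only the requested DataFrame columns."""
    return X[columns]


pipeline = Pipeline([
    # kw_args forwards the column list to select_columns on every transform call.
    ("selector", FunctionTransformer(select_columns, validate=False,
                                     kw_args={"columns": mycols})),
])
out = pipeline.fit_transform(df)  # still a DataFrame with columns tip, sex
print(out)
```

Using a module-level function rather than a lambda keeps the pipeline picklable, which matters if it is saved with `pickle` or `joblib`.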