Posts by Aca*_*pha

Keeping column names after using scikit-learn ColumnTransformer on a pandas DataFrame

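I am trying to write my preprocessing with scikit-learn Pipelines and a ColumnTransformer. However, the transformers return a NumPy array rather than a DataFrame, which is a bit frustrating: I would like to keep using column names on the processed data. Consider the following data and pipeline: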

import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

df = pd.DataFrame(np.random.randn(6, 4), columns=list("ABCD"))
df["E"] = pd.Categorical(["test", "train", "test", "train", "test", "train"])
df["F"] = "foo"

num_columns = ['A', 'B', 'C']
num_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler()),
    ]
)
cat_columns = ['E', 'F']
cat_transformer = Pipeline(
    steps = [
        ('imputer', SimpleImputer(strategy='most_frequent')), …

python pipeline pandas scikit-learn

5 votes, 1 answer, 1092 views
