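I can't apply different transformers to columns of different types (text vs. numeric) in one pass and combine them into a single transformer for later use.
I tried following the steps in the Column Transformer with Mixed Types documentation, which explains how to do this for a mix of categorical and numeric data, but it does not seem to work for text data.
How do you create a storable transformer that applies different pipelines to text and numeric data?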
# imports
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler
np.random.seed(0)
# download Titanic data
X, y = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)
# data preparation
numeric_features = ['age', 'fare']
text_features = ['name', 'cabin', 'home.dest']
X.fillna({text_col: '' for text_col in text_features}, inplace=True) …
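For reference, here is a minimal sketch of one way such a combined, storable transformer might be built. It assumes the usual stumbling block is that `TfidfVectorizer` expects a 1-D sequence of strings, so each text column gets its own vectorizer and is selected with a scalar column name rather than a list. The small `toy` frame is made-up stand-in data (so the example runs without downloading anything), and `pickle` is used for storage purely as an illustration.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the Titanic frame; the values are illustrative only.
toy = pd.DataFrame({
    "age": [29.0, np.nan, 2.0, 30.0],
    "fare": [211.3, 151.6, np.nan, 26.0],
    "name": ["Allen, Miss. Elisabeth", "Allison, Master. Hudson",
             "Allison, Miss. Helen", "Andrews, Mr. Thomas"],
    "cabin": ["B5", "C22 C26", "", "A36"],
    "home.dest": ["St Louis, MO", "Montreal, PQ", "Montreal, PQ", "Belfast, NI"],
})

numeric_features = ["age", "fare"]
text_features = ["name", "cabin", "home.dest"]

# Numeric columns: impute missing values, then standardize.
numeric_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# One TfidfVectorizer per text column. The column is given as a plain string
# (not a list) so the vectorizer receives a 1-D Series of documents.
text_transformers = [
    ("tfidf_" + col.replace(".", "_"), TfidfVectorizer(), col)
    for col in text_features
]

preprocessor = ColumnTransformer(
    transformers=[("num", numeric_pipeline, numeric_features)] + text_transformers
)

features = preprocessor.fit_transform(toy)

# Store the fitted transformer and reload it for later use.
restored = pickle.loads(pickle.dumps(preprocessor))
restored_features = restored.transform(toy)
```

Passing a list such as `['name']` instead of the scalar `'name'` would hand the vectorizer a 2-D frame, which is a likely reason the mixed-types recipe appears not to work for text.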