MYj*_*Yjx 2 python pandas scikit-learn sklearn-pandas
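My problem is that my pandas DataFrame has a large number of columns, and I am trying to apply sklearn preprocessing to them with the DataFrameMapper from the sklearn-pandas library, for example: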
import sklearn.preprocessing
from sklearn_pandas import DataFrameMapper

mapper = DataFrameMapper([
    ('gender', sklearn.preprocessing.LabelBinarizer()),
    ('gradelevel', sklearn.preprocessing.LabelEncoder()),
    ('subject', sklearn.preprocessing.LabelEncoder()),
    ('districtid', sklearn.preprocessing.LabelEncoder()),
    ('sbmRate', sklearn.preprocessing.StandardScaler()),
    ('pRate', sklearn.preprocessing.StandardScaler()),
    ('assn1', sklearn.preprocessing.StandardScaler()),
    ('assn2', sklearn.preprocessing.StandardScaler()),
    ('assn3', sklearn.preprocessing.StandardScaler()),
    ('assn4', sklearn.preprocessing.StandardScaler()),
    ('assn5', sklearn.preprocessing.StandardScaler()),
    ('attd1', sklearn.preprocessing.StandardScaler()),
    ('attd2', sklearn.preprocessing.StandardScaler()),
    ('attd3', sklearn.preprocessing.StandardScaler()),
    ('attd4', sklearn.preprocessing.StandardScaler()),
    ('attd5', sklearn.preprocessing.StandardScaler()),
    ('sbm1', sklearn.preprocessing.StandardScaler()),
    ('sbm2', sklearn.preprocessing.StandardScaler()),
    ('sbm3', sklearn.preprocessing.StandardScaler()),
    ('sbm4', sklearn.preprocessing.StandardScaler()),
    ('sbm5', sklearn.preprocessing.StandardScaler())
])
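I would like to know whether there is a more concise way to preprocess many variables at once without writing each one out explicitly.
Another thing I find a bit annoying is that when I convert my pandas DataFrames into arrays that sklearn can use, the column names are lost, which makes feature selection very difficult. Does anyone know how to keep the column names as keys when converting a pandas DataFrame to a NumPy array?
Thanks a lot!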
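from sklearn.preprocessing import LabelBinarizer, LabelEncoder, StandardScaler
from sklearn_pandas import DataFrameMapper

# Group the column names by the transformer they should receive.
encoders = ['gradelevel', 'subject', 'districtid']
scalars = ['sbmRate', 'pRate',
           'assn1', 'assn2', 'assn3', 'assn4', 'assn5',
           'attd1', 'attd2', 'attd3', 'attd4', 'attd5',
           'sbm1', 'sbm2', 'sbm3', 'sbm4', 'sbm5']

# Build the (column, transformer) pairs with list comprehensions.
# StandardScaler expects 2-D input, so each scaled column is selected with a
# one-element list; a plain string selector would pass a 1-D array, which
# recent scikit-learn versions reject.
mapper = DataFrameMapper(
    [('gender', LabelBinarizer())] +
    [(encoder, LabelEncoder()) for encoder in encoders] +
    [([scalar], StandardScaler()) for scalar in scalars]
)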
If you find yourself doing this, you can even write your own function:
mapper = data_frame_mapper(
    binarizers=['gender'],
    encoders=['gradelevel', 'subject', 'districtid'],
    scalars=['sbmRate', 'pRate',
             'assn1', 'assn2', 'assn3', 'assn4', 'assn5',
             'attd1', 'attd2', 'attd3', 'attd4', 'attd5',
             'sbm1', 'sbm2', 'sbm3', 'sbm4', 'sbm5'])