Mak*_*iii 2 python scikit-learn categorical-data
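from sklearn.preprocessing import OneHotEncoder as OHE
import numpy as np

# Columns 0 and 1 are categorical; column 2 is numeric.
a = np.array([[0, 1, 100], [1, 2, 200], [2, 3, 400]])
# Note: categorical_features only exists in older scikit-learn (removed in 0.22).
oh = OHE(categorical_features=[0, 1])
a = oh.fit_transform(a).toarray()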
Let's assume the first and second columns are categorical. This code one-hot encodes them, but for a regression problem I would like to drop the first dummy column of each encoded feature. In this example there are only two categorical features, so I could do it manually. But how would you solve this when there are many categorical features?
For this I use a wrapper, which can also be used in a pipeline:
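from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import OneHotEncoder

class DummyEncoder(BaseEstimator, TransformerMixin):
    # Written against older scikit-learn: n_values was removed in 0.22.
    def __init__(self, n_values='auto'):
        self.n_values = n_values

    def transform(self, X):
        ohe = OneHotEncoder(sparse=False, n_values=self.n_values)
        # Slices off only the last column of the whole encoded matrix.
        return ohe.fit_transform(X)[:, :-1]

    def fit(self, X, y=None, **fit_params):
        # Stateless: the encoder is refit inside transform().
        return self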
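On newer scikit-learn releases (0.21 and later, as far as I know), `OneHotEncoder` accepts a `drop` argument, so a wrapper may not be needed. Here is a minimal sketch, assuming the same array as in the question and that columns 0 and 1 are the categorical ones. The names `ct` and `encoded` are mine, and combining the encoder with `ColumnTransformer` is one possible setup, not necessarily what the answer above intended. Unlike slicing with `[:, :-1]`, `drop="first"` removes one dummy column per categorical feature.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Same toy data as in the question: columns 0 and 1 are categorical.
a = np.array([[0, 1, 100], [1, 2, 200], [2, 3, 400]])

ct = ColumnTransformer(
    # drop="first" discards the first category of every encoded feature.
    [("onehot", OneHotEncoder(drop="first"), [0, 1])],
    remainder="passthrough",  # keep the numeric column unchanged
    sparse_threshold=0,       # always return a dense array
)

encoded = ct.fit_transform(a)
print(encoded.shape)  # (3, 5): 2 + 2 dummy columns plus the numeric column
```

Since `ct` is an ordinary transformer, it should also work as a step in a `Pipeline` in front of a regressor.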