Removing columns with sklearn's OneHotEncoder

Mak*_*iii 2 python scikit-learn categorical-data

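from sklearn.preprocessing import OneHotEncoder as OHE
import numpy as np

a = np.array([[0,1,100],[1,2,200],[2,3,400]])

# Columns 0 and 1 are categorical, column 2 is numeric.
# Note: `categorical_features` was deprecated in scikit-learn 0.20 and
# removed in 0.22, so this only runs on older versions.
oh = OHE(categorical_features=[0,1])
a = oh.fit_transform(a).toarray()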

Let's assume the first and second columns are categorical. This code does the one-hot encoding, but for a regression problem I would like to remove the first dummy column of each encoded feature (otherwise, with an intercept, the dummy columns of a feature are perfectly collinear). In this example there are only two categorical features, so I could do it manually. But how would you solve this when there are many categorical features?
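For reference, newer scikit-learn versions cover this directly: `ColumnTransformer` (0.20+) replaces `categorical_features`, and `OneHotEncoder(drop='first')` (0.21+) removes the first dummy of every encoded column, no matter how many categorical columns are listed. The sketch below assumes a recent scikit-learn; `categorical_cols` is just an illustrative variable name.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

a = np.array([[0, 1, 100], [1, 2, 200], [2, 3, 400]])

# Indices of the categorical columns; extend this list for more features.
categorical_cols = [0, 1]

ct = ColumnTransformer(
    # drop="first" removes the first dummy column of *each* encoded feature
    [("onehot", OneHotEncoder(drop="first"), categorical_cols)],
    remainder="passthrough",  # keep the numeric column(s) unchanged
    sparse_threshold=0,       # always return a dense array
)
encoded = ct.fit_transform(a)
print(encoded)
```

The encoded columns come first, followed by the passthrough numeric column, so for this array the rows are `[0, 0, 0, 0, 100]`, `[1, 0, 1, 0, 200]` and `[0, 1, 0, 1, 400]` (as floats).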

Mar*_* V. 5

For this I use a wrapper, which can also be used in a pipeline:

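from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import OneHotEncoder

# Written for older scikit-learn: `n_values` was removed in 0.22.
class DummyEncoder(BaseEstimator, TransformerMixin):

    def __init__(self, n_values='auto'):
        self.n_values = n_values

    def transform(self, X):
        # A fresh encoder is fitted on every call, so training and test
        # data can end up with different columns.
        ohe = OneHotEncoder(sparse=False, n_values=self.n_values)
        # Drops only the last column of the whole encoded matrix,
        # i.e. one dummy in total rather than one per feature.
        return ohe.fit_transform(X)[:,:-1]

    def fit(self, X, y=None, **fit_params):
        # Nothing is learned here; all the work happens in transform().
        return self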
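A possible variant of the same wrapper, sketched here under the assumption of scikit-learn 0.20 or later: the encoder is fitted once in `fit()`, so every later `transform()` produces the same columns, and the last dummy of *each* feature is dropped using the per-column `categories_` attribute instead of a single `[:, :-1]` slice. On 0.21+, `OneHotEncoder(drop='first')` gives the equivalent result (dropping the first dummy instead of the last). The toy `X`/`y` data are made up for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class DummyEncoder(BaseEstimator, TransformerMixin):
    """One-hot encode every column of X, dropping the last dummy per feature."""

    def __init__(self, categories="auto"):
        self.categories = categories

    def fit(self, X, y=None):
        # Learn the categories once, so transform() yields consistent columns
        # for training and test data.
        self.encoder_ = OneHotEncoder(categories=self.categories)
        self.encoder_.fit(X)
        return self

    def transform(self, X):
        dense = self.encoder_.transform(X).toarray()
        # categories_ holds one array per input column; the dummies of each
        # column are laid out consecutively in that order.
        ends = np.cumsum([len(c) for c in self.encoder_.categories_])
        keep = np.ones(dense.shape[1], dtype=bool)
        keep[ends - 1] = False  # mask out the last dummy of each feature
        return dense[:, keep]


# Usage inside a pipeline (toy data, for illustration only)
X = np.array([[0, 1], [1, 2], [2, 3], [0, 3]])
y = np.array([1.0, 2.0, 3.0, 1.5])
model = Pipeline([("dummies", DummyEncoder()), ("reg", LinearRegression())])
model.fit(X, y)
print(model.predict(X))
```

Keeping the fitted encoder as an attribute ending in `_` follows the scikit-learn convention for state learned in `fit()`, which is what lets the wrapper behave correctly on unseen data inside a `Pipeline`.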