Vij*_*jay 8 python machine-learning pandas scikit-learn
Here is a dataset with 3 columns and 3 rows:
Name Org Dept
Manie ABC2 FINANCE
Joyce ABC1 HR
Ami NSV2 HR
Here is my code:
It works fine up to this point. How do I drop the first dummy variable column for each feature?
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Data1.csv',encoding = "cp1252")
X = dataset.values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features = "all")
X = onehotencoder.fit_transform(X).toarray()
Max*_*wer 12
import pandas as pd
df = pd.DataFrame({'name': ['Manie', 'Joyce', 'Ami'],
'Org': ['ABC2', 'ABC1', 'NSV2'],
'Dept': ['Finance', 'HR', 'HR']
})
df_2 = pd.get_dummies(df,drop_first=True)
Test:
print(df_2)
Dept_HR Org_ABC2 Org_NSV2 name_Joyce name_Manie
0 0 1 0 0 1
1 1 0 0 1 0
2 1 0 1 0 0
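To make it explicit which columns `drop_first=True` removes, here is a minimal sketch that encodes the same `Org` and `Dept` columns with and without the flag and compares the resulting column names. It only assumes the small DataFrame from this answer; the variable names `full`, `reduced` and `dropped` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Manie", "Joyce", "Ami"],
    "Org": ["ABC2", "ABC1", "NSV2"],
    "Dept": ["Finance", "HR", "HR"],
})

# Encode the two categorical columns with and without drop_first.
full = pd.get_dummies(df[["Org", "Dept"]])
reduced = pd.get_dummies(df[["Org", "Dept"]], drop_first=True)

# The columns that disappear are the first (sorted) level of each feature.
dropped = sorted(set(full.columns) - set(reduced.columns))
print(dropped)  # ['Dept_Finance', 'Org_ABC1']
```

For each encoded feature, the level that sorts first (`ABC1` for `Org`, `Finance` for `Dept`) becomes the baseline, and a row with all zeros for that feature stands for it.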
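Update regarding your error with pd.get_dummies(X, columns =[1:]:
According to the documentation page, the columns parameter takes "column names". So the following code would work: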
df_2 = pd.get_dummies(df, columns=['Org', 'Dept'], drop_first=True)
Output:
name Org_ABC2 Org_NSV2 Dept_HR
0 Manie 1 0 0
1 Joyce 0 0 1
2 Ami 0 1 1
If you really want to select your columns by position, you can do it this way:
column_names_for_onehot = df.columns[1:]
df_2 = pd.get_dummies(df, columns=column_names_for_onehot, drop_first=True)
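Since the question started from scikit-learn, the pandas approach above has a rough counterpart there as well. The following is only a sketch, assuming scikit-learn 0.21 or later, where `OneHotEncoder` accepts string columns directly (no `LabelEncoder` step) and has a `drop` parameter. It reuses the `Org` and `Dept` columns from this answer.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "name": ["Manie", "Joyce", "Ami"],
    "Org": ["ABC2", "ABC1", "NSV2"],
    "Dept": ["Finance", "HR", "HR"],
})

# drop="first" removes the first category of every encoded feature.
encoder = OneHotEncoder(drop="first")

# fit_transform returns a sparse matrix by default; convert it to a dense array.
encoded = encoder.fit_transform(df[["Org", "Dept"]]).toarray()

print(encoder.categories_)  # categories found per column, in sorted order
print(encoded)              # columns correspond to Org_ABC2, Org_NSV2, Dept_HR
```

The resulting array holds the same values as the `Org_ABC2`, `Org_NSV2` and `Dept_HR` columns in the pandas output above, just without column labels.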
I use my own template for this:
from sklearn.base import TransformerMixin
import pandas as pd
import numpy as np

class DataFrameEncoder(TransformerMixin):
    def __init__(self):
        """Encode the data.

        Columns of dtype object are collected in a list. Dummies are
        created for those columns (dropping the first level of each),
        the original object columns are removed, and the two
        DataFrames are concatenated again.
        """

    def fit(self, X, y=None):
        # Remember which columns hold object (string) data.
        self.object_col = []
        for col in X.columns:
            if X[col].dtype == np.dtype('O'):
                self.object_col.append(col)
        return self

    def transform(self, X, y=None):
        # One-hot encode the object columns, dropping the first dummy of each.
        dummy_df = pd.get_dummies(X[self.object_col], drop_first=True)
        # Remove the original object columns and put the dummies in front.
        X = X.drop(self.object_col, axis=1)
        X = pd.concat([dummy_df, X], axis=1)
        return X
To use this code, put the template in the current directory in a file, say customEncoder.py, and write in your code:
from customEncoder import DataFrameEncoder
data = DataFrameEncoder().fit_transform(data)
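All object-type columns are then removed, encoded with the first dummy dropped, and concatenated back to give the final desired output.
PS: The input to this template is a pandas DataFrame.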
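As a rough end-to-end illustration of the template, the sketch below repeats a compact version of the class so it runs on its own, then applies it to a small frame that mixes object and numeric columns. The `Salary` column and its values are made up for the example, and the string columns are created with `dtype=object` explicitly so the dtype check in `fit` matches regardless of how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin

class DataFrameEncoder(TransformerMixin):
    """Compact copy of the template above, repeated so the example runs alone."""

    def fit(self, X, y=None):
        # Collect the names of all object-dtype columns.
        self.object_col = [c for c in X.columns if X[c].dtype == np.dtype("O")]
        return self

    def transform(self, X, y=None):
        # Encode the object columns (first dummy dropped) and keep the rest.
        dummy_df = pd.get_dummies(X[self.object_col], drop_first=True)
        return pd.concat([dummy_df, X.drop(self.object_col, axis=1)], axis=1)

# Hypothetical input: two object columns plus one invented numeric column.
data = pd.DataFrame({
    "Org": pd.Series(["ABC2", "ABC1", "NSV2"], dtype=object),
    "Dept": pd.Series(["Finance", "HR", "HR"], dtype=object),
    "Salary": [100, 200, 300],
})

encoded = DataFrameEncoder().fit_transform(data)
print(encoded.columns.tolist())  # ['Org_ABC2', 'Org_NSV2', 'Dept_HR', 'Salary']
```

The numeric column passes through unchanged and ends up after the dummy columns, because `transform` concatenates the dummies first.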