Keep other variables when executing get_dummies in Pandas

Ber*_*ans 4 python-2.7 pandas dummy-variable

I have a DataFrame with an ID variable and another categorical variable. I want to create dummy variables out of the categorical variable with get_dummies.

import pandas as pd
dum = pd.get_dummies(df)

However, this makes the ID variable disappear, and I need the ID variable later on to merge with other data sets.
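The snippet below is a minimal sketch of why the ID can vanish, assuming (hypothetically) that the ID column holds strings. When no `columns` argument is given, `get_dummies` encodes every object, string, or category column, so a string ID is replaced by one dummy column per distinct ID. A numeric ID would be passed through unchanged. The column names `id` and `color` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: 'id' is stored as strings, 'color' is the categorical variable.
df = pd.DataFrame({
    "id": ["a1", "b2", "c3"],
    "color": ["red", "blue", "red"],
})

# With columns=None, every object/string/category column is encoded,
# so the original 'id' column is replaced by id_a1, id_b2, id_c3.
dum = pd.get_dummies(df)
print(list(dum.columns))
```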

Is there a way to keep the other variables? I could not find anything in the documentation of get_dummies. Thanks!

Tom*_*Tom 7

You can also copy the original column into a new column before running get_dummies. For example:

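df['dum_orig'] = df['dum']  # keep a copy of the original categorical column
df = pd.get_dummies(df, columns=['dum'])  # only 'dum' is encoded; all other columns are kept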
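A self-contained version of this approach, using hypothetical data (`ID` and `dum` are illustrative column names). Because `columns=['dum']` limits encoding to that one column, the ID column and the copied original are passed through untouched, even though they hold strings.

```python
import pandas as pd

# Hypothetical data: 'ID' identifies rows, 'dum' is the categorical column.
df = pd.DataFrame({
    "ID": ["x1", "x2", "x3"],
    "dum": ["a", "b", "a"],
})

# Keep a copy of the original categorical values before encoding.
df["dum_orig"] = df["dum"]

# Only the listed columns are encoded; 'ID' and 'dum_orig' are kept as-is.
df = pd.get_dummies(df, columns=["dum"])
print(list(df.columns))  # ['ID', 'dum_orig', 'dum_a', 'dum_b']
```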


Ber*_*ans 5

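I found the answer. You can concatenate the dummy data set to the original data set, as shown below, as long as you don't reorder the data in the meantime.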

df = pd.concat([df, dum], axis=1)  # add the dummy columns next to the original ones
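A runnable sketch of the concat approach, assuming hypothetical columns `ID` and `category`. Here only the categorical column is passed to `get_dummies` (with a `prefix` so the new columns are easy to recognise), and `pd.concat(..., axis=1)` places the dummies beside the untouched original columns. Note that `concat` matches rows by index label, not by position.

```python
import pandas as pd

# Hypothetical data with a numeric ID and one categorical column.
df = pd.DataFrame({
    "ID": [101, 102, 103],
    "category": ["x", "y", "x"],
})

# Encode only the categorical column; the result keeps df's index.
dum = pd.get_dummies(df["category"], prefix="category")

# axis=1 adds the dummy columns side by side; rows are matched by index label.
df = pd.concat([df, dum], axis=1)
print(df)
```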

  • Is there really no parameter in get_dummies that lets you do this easily? It seems like a common problem... (5 upvotes)
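  • This is correct, but if your df has a non-default index you may run into problems, because the _concat_ method aligns rows on the index, while the output of _get_dummies_ may come with a reset index. In that case I suggest using the [set_index](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html) method: `df = pd.concat([df, dum.set_index(df.index)], axis=1)` (4 upvotes)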
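A sketch of the index problem raised in the last comment, under stated assumptions: the data and the index labels `10, 20, 30` are hypothetical, and the dummies are built from a plain NumPy array (`to_numpy()`) to simulate a `dum` that has lost the original index. When `get_dummies` receives a Series or DataFrame it keeps that input's index, so this mismatch mainly appears when the index is dropped along the way. Re-labelling `dum` with `set_index(df.index)` makes the rows line up again.

```python
import pandas as pd

# Hypothetical frame with a non-default index (e.g. left over from filtering).
df = pd.DataFrame(
    {"ID": [101, 102, 103], "category": ["x", "y", "x"]},
    index=[10, 20, 30],
)

# Encoding a plain array discards the index: dum gets the default 0, 1, 2.
dum = pd.get_dummies(df["category"].to_numpy(), prefix="category")

# Naive concat aligns labels 10, 20, 30 against 0, 1, 2: 6 rows padded with NaN.
misaligned = pd.concat([df, dum], axis=1)

# Giving dum the same index as df restores a row-by-row match.
aligned = pd.concat([df, dum.set_index(df.index)], axis=1)
print(aligned)
```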