Hon*_*zaB 6 python dataframe pandas
I have a column Place in a pandas DataFrame that looks like this:
**Place**
Berlin
Prague
Mexico
Prague
Mexico
...
I would like to turn it into this:
is_Berlin is_Prague is_Mexico
1 0 0
0 1 0
0 0 1
0 1 0
0 0 1
I know I can create the columns one by one:
df['is_Berlin'] = df['Place']
df['is_Prague'] = df['Place']
df['is_Mexico'] = df['Place']
and then create a dictionary for each column and apply map to it:
# Example for the is_Berlin column only
d = {'Berlin': 1,'Prague': 0,'Mexico': 0}
df['is_Berlin'] = df['is_Berlin'].map(d)
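For comparison, the per-column dictionaries can be avoided by comparing the column against each distinct value in a loop. This is a minimal sketch assuming the five-row sample shown above; it is still the manual approach, only generalized so it does not need one hand-written dictionary per city:

```python
import pandas as pd

# Sample data mirroring the Place column from the question
df = pd.DataFrame({'Place': ['Berlin', 'Prague', 'Mexico', 'Prague', 'Mexico']})

# One indicator column per distinct city, in order of first appearance.
# (df['Place'] == city) is a boolean Series; astype(int) turns it into 1/0.
for city in df['Place'].unique():
    df['is_' + city] = (df['Place'] == city).astype(int)

print(df)
```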
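But this feels rather tedious, and I'm sure there is a nicer, more pythonic way to do it.
You can use str.get_dummies; if you then need to add the new columns to the original DataFrame, use concat: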
df1 = df.Place.str.get_dummies()
print(df1)
Berlin Mexico Prague
0 1 0 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 1 0
df1.columns = ['is_' + col for col in df1.columns]
print(df1)
is_Berlin is_Mexico is_Prague
0 1 0 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 1 0
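As an alternative to renaming the columns afterwards, the top-level pd.get_dummies function accepts a prefix argument, so the dummy columns and their is_ names can be produced in one call. A minimal sketch, assuming the same five-row sample; dtype=int is passed explicitly because newer pandas versions return boolean columns from pd.get_dummies by default:

```python
import pandas as pd

df = pd.DataFrame({'Place': ['Berlin', 'Prague', 'Mexico', 'Prague', 'Mexico']})

# prefix='is' combined with the default prefix_sep='_' gives is_Berlin, is_Mexico, ...
# dtype=int keeps 1/0 values instead of True/False
df1 = pd.get_dummies(df['Place'], prefix='is', dtype=int)
print(df1)
```

Like str.get_dummies, this orders the columns alphabetically by category.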
df = pd.concat([df, df1], axis=1)
print(df)
Place is_Berlin is_Mexico is_Prague
0 Berlin 1 0 0
1 Prague 0 0 1
2 Mexico 0 1 0
3 Prague 0 0 1
4 Mexico 0 1 0
# if you no longer need it, you can drop the original Place column
df = df.drop('Place', axis=1)
print(df)
is_Berlin is_Mexico is_Prague
0 1 0 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 1 0
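The question lists the columns as is_Berlin, is_Prague, is_Mexico, whereas get_dummies sorts them alphabetically. If that order matters, the columns can be reindexed to follow first appearance in the data. A minimal sketch, assuming the same sample; df.join is used here as an index-aligned equivalent of the concat step above:

```python
import pandas as pd

df = pd.DataFrame({'Place': ['Berlin', 'Prague', 'Mexico', 'Prague', 'Mexico']})

# Indicator columns (alphabetical), renamed with the is_ prefix
dummies = df['Place'].str.get_dummies().add_prefix('is_')

# Reorder to match the order in which each city first appears
order = ['is_' + city for city in df['Place'].unique()]
dummies = dummies.reindex(columns=order)

# join aligns on the index, like pd.concat([df, dummies], axis=1) here
df = df.join(dummies)
print(df)
```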