tac*_*ous 5 python data-manipulation pandas
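I have a CSV file with the columns name, sub_a, sub_b, sub_c, sub_d, segment and gender. I want to create a new column, classes, that lists every class (the sub_ columns) each student takes, separated by commas.
What is the simplest way to do this?
The resulting DataFrame should look like this: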
+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1 | 1 | 0 | 1 | 1 | 0 | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1 | 0 | 1 | 1 | 0 | 0 | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1 | 1 | 0 | 1 | 1 | 1 | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1 | 0 | 1 | 0 | 0 | 0 | sub_a, sub_c |
+------+-------+-------+-------+-------+---------+--------+---------------------+
You can use apply with axis=1.
For example, if your DataFrame looks like this:
df
A_a A_b B_b B_c
0 1 0 0 1
1 0 1 0 1
2 1 0 1 0
you can do
df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
A_a A_b B_b B_c classes
0 1 0 0 1 A_a, B_c
1 0 1 0 1 A_b, B_c
2 1 0 1 0 A_a, B_b
To apply this to specific columns only, first select them with loc:
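# for your sample data
df_ = df.loc[:, 'sub_a':'sub_d']  # or df.loc[:, ['sub_a', 'sub_b', 'sub_c', 'sub_d']]
# build the label from the sub_ columns only and store it on the original frame
df['classes'] = df_.apply(lambda x: ', '.join(df_.columns[x == 1]), axis=1)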
Run Code Online (Sandbox Code Playgroud)
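Putting the pieces together for the data in the question, a minimal self-contained sketch might look like the following. It assumes the frame is built inline rather than read from the CSV, and it selects the class columns by their `sub_` prefix (the `sub_cols` name is only illustrative); the label-slice `df.loc[:, 'sub_a':'sub_d']` from the answer should work just as well.

```python
import pandas as pd

# Sample data copied from the table in the question (built inline here;
# in practice it would come from pd.read_csv on the real file).
df = pd.DataFrame({
    "name": ["john", "mike", "mary", "fred"],
    "sub_a": [1, 1, 1, 1],
    "sub_b": [1, 0, 1, 0],
    "sub_c": [0, 1, 0, 1],
    "sub_d": [1, 1, 1, 0],
    "segment": [1, 0, 1, 0],
    "gender": [0, 0, 1, 0],
})

# Assumption: every class column starts with "sub_".
sub_cols = [c for c in df.columns if c.startswith("sub_")]

# For each row, keep the entries equal to 1 and join their column names.
df["classes"] = df[sub_cols].apply(
    lambda row: ", ".join(row[row == 1].index), axis=1
)

print(df[["name", "classes"]])
```

Restricting `apply` to the `sub_` columns matters here: segment and gender are also 0/1 flags, so running the lambda over the whole frame would add them to the classes string.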