Append column values to a new cell in the same row of a Pandas DataFrame

tac*_*ous 5 python data-manipulation pandas

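I have a CSV file with the columns name, sub_a, sub_b, sub_c, sub_d, segment, and gender. I want to create a new column, classes, that contains all the classes (the sub_ columns) each student takes, separated by commas.

What is the easiest way to achieve this?

The resulting DataFrame should look like this: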

+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes             |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1     | 1     | 0     | 1     | 1       | 0      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1     | 0     | 1     | 1     | 0       | 0      | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1     | 1     | 0     | 1     | 1       | 1      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1     | 0     | 1     | 0     | 0       | 0      | sub_a, sub_c        |
+------+-------+-------+-------+-------+---------+--------+---------------------+

Dis*_*ani 1

You can use apply with axis=1.

For example, if your DataFrame looks like

df
   A_a  A_b  B_b  B_c
0    1    0    0    1
1    0    1    0    1
2    1    0    1    0

you can do

df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
   A_a  A_b  B_b  B_c   classes
0    1    0    0    1  A_a, B_c
1    0    1    0    1  A_b, B_c
2    1    0    1    0  A_a, B_b

To apply this to specific columns only, you can first filter with loc:

# for your sample data
df_ = df.loc[:, 'sub_a':'sub_d']             # or df.loc[:, ['sub_a', 'sub_b', 'sub_c', 'sub_d']]
df['classes'] = df_.apply(lambda x: ', '.join(df_.columns[x == 1]), axis=1)
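
For completeness, here is a minimal self-contained sketch of the same idea run against the data from the question. The DataFrame is rebuilt by hand from the question's table, and selecting the subject columns by the "sub_" prefix is an assumption made here for illustration (it only works if no other column starts with that prefix).

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "name": ["john", "mike", "mary", "fred"],
    "sub_a": [1, 1, 1, 1],
    "sub_b": [1, 0, 1, 0],
    "sub_c": [0, 1, 0, 1],
    "sub_d": [1, 1, 1, 0],
    "segment": [1, 0, 1, 0],
    "gender": [0, 0, 1, 0],
})

# Assumption: every subject column, and nothing else, starts with "sub_".
sub_cols = [col for col in df.columns if col.startswith("sub_")]

# For each row, keep the column labels whose value is 1 and join them.
df["classes"] = df[sub_cols].apply(
    lambda row: ", ".join(row[row == 1].index), axis=1
)

print(df[["name", "classes"]])
```

Because only the subject columns are passed to apply, the 0/1 values in segment and gender cannot leak into classes.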