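I have student names, their scores in different subjects, and the subject names. I want to add a column to the dataframe that contains the subject in which each student scored highest. This is what I tried: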
Data['Subject with highest score'] = Data.groupby(['Names','Subject'])[['Scores']].transform(lambda x: x.max())
Sort the values by Scores, then group the dataframe by Names and transform the Subject column with last:
df['S(max)'] = df.sort_values('Scores').groupby('Names')['Subject'].transform('last')
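To make the one-liner above reproducible, here is a minimal sketch that rebuilds the dataframe from the rows shown in the result table of this answer and applies the sort-then-last approach. The construction of df is an assumption about how the data was loaded; only the column names Names, Scores and Subject come from the post.

```python
import pandas as pd

# Rows copied from the result table in this answer (how the asker built
# the frame is not shown, so this constructor is an assumption).
df = pd.DataFrame({
    "Names": ["Dan", "Dan", "Dan", "Bob", "Bob", "Bob", "Bob", "Michael",
              "Sandra", "Michael", "Michael", "Jacky", "Jacky", "Jacky",
              "Jacky", "Jacky", "Dan", "Sandra"],
    "Scores": [98, 88, 90, 80, 93, 70, 85, 100, 67, 89, 74, 65, 100, 90,
               87, 69, 73, 50],
    "Subject": ["Math", "English", "Biology", "Math", "Chemistry", "Sports",
                "French", "History", "French", "Math", "Sports", "Biology",
                "Physics", "Geometry", "Geography", "Math", "Sports",
                "History"],
})

# After sorting by score, each student's best row is the last one in its
# group, so transform('last') broadcasts that row's Subject to the group.
# The assignment aligns on the index, so df keeps its original row order.
df["S(max)"] = (
    df.sort_values("Scores")
      .groupby("Names")["Subject"]
      .transform("last")
)
```

If two subjects tie for a student's top score, which one ends up last depends on the sort; passing kind="stable" to sort_values makes that choice deterministic.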
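Alternatively, we can group the dataframe by Names and transform Scores with idxmax to broadcast, to every row of a group, the index of the row with the maximum score, then use those indices to pick the corresponding values from the Subject column: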
df['S(max)'] = df.loc[df.groupby('Names')['Scores'].transform('idxmax'), 'Subject'].tolist()
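The idxmax variant packs three steps into one line. The sketch below splits them apart on the first seven rows of the data (Dan and Bob only) so each intermediate value can be inspected; the names best_idx and best_subject are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# First seven rows of the data (Dan and Bob), enough to show the mechanics.
df = pd.DataFrame({
    "Names": ["Dan", "Dan", "Dan", "Bob", "Bob", "Bob", "Bob"],
    "Scores": [98, 88, 90, 80, 93, 70, 85],
    "Subject": ["Math", "English", "Biology", "Math", "Chemistry",
                "Sports", "French"],
})

# Step 1: for every row, the index label of the top-scoring row in its group.
best_idx = df.groupby("Names")["Scores"].transform("idxmax")

# Step 2: look up Subject at those labels. The result is indexed by the
# repeated labels (0, 0, 0, 4, ...), not by the original row index.
best_subject = df.loc[best_idx, "Subject"]

# Step 3: convert to a plain list so the assignment is positional rather
# than aligned on the duplicated index labels.
df["S(max)"] = best_subject.tolist()
```

The call to tolist() matters: assigning best_subject directly would make pandas try to align on an index that contains duplicate labels.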
      Names  Scores    Subject     S(max)
0       Dan      98       Math       Math
1       Dan      88    English       Math
2       Dan      90    Biology       Math
3       Bob      80       Math  Chemistry
4       Bob      93  Chemistry  Chemistry
5       Bob      70     Sports  Chemistry
6       Bob      85     French  Chemistry
7   Michael     100    History    History
8    Sandra      67     French     French
9   Michael      89       Math    History
10  Michael      74     Sports    History
11    Jacky      65    Biology    Physics
12    Jacky     100    Physics    Physics
13    Jacky      90   Geometry    Physics
14    Jacky      87  Geography    Physics
15    Jacky      69       Math    Physics
16      Dan      73     Sports       Math
17   Sandra      50    History     French