ahb*_*bon 4 dataframe python-3.x pandas
Suppose we have a DataFrame of student grades, df1, and a DataFrame of course credits, df2, as follows:
df1:
       stu_id       major  Python  English  C++
0  U202010521    computer      56       81   82
1  U202010522  management      92       56   64
2  U202010523  management      95       88   81
3  U202010524  BigData&AI      79       53   74
4  U202010525    computer      53       71   -1
5  U202010526    computer      78       96   53
6  U202010527  BigData&AI      69       63   74
7  U202010528  BigData&AI      86       57   82
8  U202010529  BigData&AI      81      100   85
9  U202010530  BigData&AI      79       67   80
df2:
     class  credit
0   Python       2
1  English       4
2      C++       3
I need to calculate the weighted average score for each student.
df2['credit_ratio'] = df2['credit']/9
Out:
     class  credit  credit_ratio
0   Python       2      0.222222
1  English       4      0.444444
2      C++       3      0.333333
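That is, for U202010521 the weighted score would be 56*0.22 + 81*0.44 + 82*0.33 = 75.02 (using the ratios rounded to two decimals). I need to compute this weighted score for every student and store it in a new column, weighted_score. How can I do this in pandas?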
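For reference, the hand calculation for U202010521 can be checked in plain Python. This is only a sanity-check sketch: the variable names are made up for illustration, and it contrasts the exact ratios (credit divided by the total of 9) with the two-decimal ratios used in the arithmetic above, which is why a pandas result will come out slightly higher than 75.02.

```python
# Scores for student U202010521, in the order Python, English, C++
scores = [56, 81, 82]
credits = [2, 4, 3]

total_credits = sum(credits)  # 9

# Exact weights: each credit divided by the total number of credits
exact = sum(s * c / total_credits for s, c in zip(scores, credits))

# Weights rounded to two decimals, as in the hand calculation
rounded = sum(s * w for s, w in zip(scores, [0.22, 0.44, 0.33]))

print(round(exact, 6))    # 75.777778
print(round(rounded, 2))  # 75.02
```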
Try set_index + mul, then sum along axis=1:
df1['weighted_score'] = (
df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
)
df1:
       stu_id       major  Python  English  C++  weighted_score
0  U202010521    computer      56       81   82       75.777778
1  U202010522  management      92       56   64       66.666667
2  U202010523  management      95       88   81       87.222222
3  U202010524  BigData&AI      79       53   74       65.777778
4  U202010525    computer      53       71   -1       43.000000
5  U202010526    computer      78       96   53       77.666667
6  U202010527  BigData&AI      69       63   74       68.000000
7  U202010528  BigData&AI      86       57   82       71.777778
8  U202010529  BigData&AI      81      100   85       90.777778
9  U202010530  BigData&AI      79       67   80       74.000000
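To reproduce the result end to end, here is a minimal self-contained sketch that rebuilds both frames from the tables in the question and applies the same set_index + mul + sum chain. Two things are my own choices rather than part of the original: the intermediate name `weights`, and computing the ratio from `df2["credit"].sum()` instead of the hard-coded 9 (the two are equivalent for this data).

```python
import pandas as pd

# Rebuild the example data from the question
df1 = pd.DataFrame({
    "stu_id": [f"U2020105{n}" for n in range(21, 31)],
    "major": ["computer", "management", "management", "BigData&AI", "computer",
              "computer", "BigData&AI", "BigData&AI", "BigData&AI", "BigData&AI"],
    "Python": [56, 92, 95, 79, 53, 78, 69, 86, 81, 79],
    "English": [81, 56, 88, 53, 71, 96, 63, 57, 100, 67],
    "C++": [82, 64, 81, 74, -1, 53, 74, 82, 85, 80],
})
df2 = pd.DataFrame({"class": ["Python", "English", "C++"], "credit": [2, 4, 3]})

# Assumption: derive the ratio from the credit total rather than hard-coding 9
df2["credit_ratio"] = df2["credit"] / df2["credit"].sum()

# Series of weights indexed by course name, so it lines up with df1's columns
weights = df2.set_index("class")["credit_ratio"]

# Select the score columns, scale each by its weight, then sum across each row
df1["weighted_score"] = df1[df2["class"]].mul(weights).sum(axis=1)

print(df1[["stu_id", "weighted_score"]])
```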
Explanation:
Setting the index of df2 to class means the multiplication now aligns correctly with the columns of df1:
df2.set_index('class')['credit_ratio']
df1['weighted_score'] = (
df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
)
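The alignment point can be illustrated with a small sketch. The data here is a two-row excerpt of df1, and the weights Series is deliberately built in a different order from the columns (the names `scores`, `weights`, `weighted`, and `totals` are hypothetical). Because mul matches the Series index against the column labels rather than positions, the totals should still agree with the first two rows of the result above.

```python
import pandas as pd

# First two students from df1, score columns only
scores = pd.DataFrame({"Python": [56, 92], "English": [81, 56], "C++": [82, 64]})

# Weights indexed by course name, deliberately listed in a different order
weights = pd.Series({"C++": 3 / 9, "Python": 2 / 9, "English": 4 / 9})

# mul pairs each column with the weight that carries the same label
weighted = scores.mul(weights)

# Sum the weighted columns across each row
totals = weighted.sum(axis=1)
print(totals)
```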
Select the specific columns from df1 using the values from df2:
df1[df2['class']]
   Python  English  C++
0      56       81   82
1      92       56   64
2      95       88   81
3      79       53   74
4      53       71   -1
5      78       96   53
6      69       63   74
7      86       57   82
8      81      100   85
9      79       67   80
Multiply to apply the weights:
df1[df2['class']].mul(df2.set_index('class')['credit_ratio'])
      Python    English        C++
0  12.444444  36.000000  27.333333
1  20.444444  24.888889  21.333333
2  21.111111  39.111111  27.000000
3  17.555556  23.555556  24.666667
4  11.777778  31.555556  -0.333333
5  17.333333  42.666667  17.666667
6  15.333333  28.000000  24.666667
7  19.111111  25.333333  27.333333
8  18.000000  44.444444  28.333333
9  17.555556  29.777778  26.666667
Then sum across each row to get the total:
df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
0    75.777778
1    66.666667
2    87.222222
3    65.777778
4    43.000000
5    77.666667
6    68.000000
7    71.777778
8    90.777778
9    74.000000
dtype: float64
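As a cross-check, the same numbers can be obtained without the precomputed credit_ratio column. The sketch below is an alternative to the approach above, not part of it: it uses DataFrame.dot with the raw credits and divides by the credit total, then compares against numpy.average with weights. The three-row excerpt and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Rows 0, 1 and 4 of df1, score columns only
scores = pd.DataFrame(
    {"Python": [56, 92, 53], "English": [81, 56, 71], "C++": [82, 64, -1]}
)
credits = pd.Series({"Python": 2, "English": 4, "C++": 3})

# dot aligns the columns with the Series index, multiplies, and sums per row
via_dot = scores.dot(credits) / credits.sum()

# np.average is positional, so reorder the columns to match credits explicitly
via_average = np.average(
    scores[credits.index].to_numpy(), axis=1, weights=credits.to_numpy()
)

print(via_dot)
print(via_average)
```

Note that the -1 in the C++ column is treated as an ordinary score by every variant here, exactly as in the output above; if it is meant to mark a missing grade, it would need to be handled before weighting.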