Des*_*wal 5 python dataframe pandas pandas-groupby
I have a dataset that can be found here.
It gives us a DataFrame like this:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user', sep='|')
df.head()
user_id age gender occupation zip_code
1 24 M technician 85711
2 53 F other 94043
3 23 M writer 32067
4 24 M technician 43537
5 33 F other 15213
I want to know the male-to-female ratio for each occupation.
I have used the expression below, but it is not the best approach.
df.groupby(['occupation', 'gender']).agg({'gender':'count'}).div(df.groupby('occupation').agg('count'), level='occupation')['gender']*100
This gives us a result like:
occupation gender
administrator F 45.569620
M 54.430380
artist F 46.428571
M 53.571429
The format of the result above is very different from what I want, which is something like this (demo):
occupation M:F
programmer 2:3
farmer 7:2
Can someone tell me how to write my own aggregation function?
Does this work for you?
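from fractions import Fraction

# Share of each gender within its occupation
df_g = df.groupby(['occupation', 'gender']).count().user_id/df.groupby(['occupation']).count().user_id
df_g = df_g.reset_index()
# Write each share as a reduced fraction, using ':' in place of '/'
df_g['ratio'] = df_g['user_id'].apply(lambda x: str(Fraction(x).limit_denominator()).replace('/',':'))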
Run Code Online (Sandbox Code Playgroud)
Output
occupation gender user_id ratio
0 administrator F 0.455696 36:79
1 administrator M 0.544304 43:79
2 artist F 0.464286 13:28
3 artist M 0.535714 15:28
4 doctor M 1.000000 1
5 educator F 0.273684 26:95
6 educator M 0.726316 69:95
7 engineer F 0.029851 2:67
8 engineer M 0.970149 65:67
9 entertainment F 0.111111 1:9
10 entertainment M 0.888889 8:9
11 executive F 0.093750 3:32
12 executive M 0.906250 29:32
13 healthcare F 0.687500 11:16
14 healthcare M 0.312500 5:16
15 homemaker F 0.857143 6:7
16 homemaker M 0.142857 1:7
17 lawyer F 0.166667 1:6
18 lawyer M 0.833333 5:6
19 librarian F 0.568627 29:51
20 librarian M 0.431373 22:51
21 marketing F 0.384615 5:13
22 marketing M 0.615385 8:13
23 none F 0.444444 4:9
24 none M 0.555556 5:9
25 other F 0.342857 12:35
26 other M 0.657143 23:35
27 programmer F 0.090909 1:11
28 programmer M 0.909091 10:11
29 retired F 0.071429 1:14
30 retired M 0.928571 13:14
31 salesman F 0.250000 1:4
32 salesman M 0.750000 3:4
33 scientist F 0.096774 3:31
34 scientist M 0.903226 28:31
35 student F 0.306122 15:49
36 student M 0.693878 34:49
37 technician F 0.037037 1:27
38 technician M 0.962963 26:27
39 writer F 0.422222 19:45
40 writer M 0.577778 26:45
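Note that the ratio column above gives each gender's share of its occupation total (for example 36:79), not the M:F ratio the question asks for. A custom aggregation function that returns M:F directly might look like the minimal sketch below. The `sample` frame is made-up data standing in for `df`, and `male_female_ratio` is a hypothetical helper name, not part of pandas.

```python
from math import gcd

import pandas as pd

# Hypothetical sample rows standing in for the u.user data set.
sample = pd.DataFrame({
    "occupation": ["programmer"] * 5 + ["farmer"] * 9 + ["doctor"] * 2,
    "gender": list("MMFFF") + list("MMMMMMMFF") + list("MM"),
})


def male_female_ratio(genders: pd.Series) -> str:
    """Return the reduced M:F count ratio for one occupation group."""
    males = int((genders == "M").sum())
    females = int((genders == "F").sum())
    # gcd(0, 0) is 0, so fall back to 1 to avoid dividing by zero.
    divisor = gcd(males, females) or 1
    return f"{males // divisor}:{females // divisor}"


# Apply the custom aggregation to the gender column of each occupation.
ratios = (
    sample.groupby("occupation")["gender"]
    .agg(male_female_ratio)
    .rename("M:F")
    .reset_index()
)
print(ratios)
```

To try it on the real data, replace `sample` with `df`. An occupation with no women comes out as `1:0` rather than raising an error.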