Disproportionate stratified sampling in Pandas

Man*_*rra 0 python random sampling dataframe pandas

How can I randomly select one row from each group (grouped by the Name column) in the following DataFrame?

   Distance   Name  Time  Order
1        16   John     5      0
4        31   John     9      1
0        23   Kate     3      0
3        15   Kate     7      1
2        32  Peter     2      0
5        26  Peter     4      1

Expected result:

   Distance   Name  Time  Order
4        31   John     9      1
0        23   Kate     3      0
2        32  Peter     2      0

ank*_*_91 5

You can use groupby on the Name column and apply sample to each group:

df.groupby('Name',as_index=False).apply(lambda x:x.sample()).reset_index(drop=True)
Output (the selected rows vary between runs because the draw is random):
    Distance   Name  Time  Order
0        31   John     9      1
1        15   Kate     7      1
2        32  Peter     2      0
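As a possible alternative, pandas 1.1 and later also expose sample directly on the groupby object, which avoids the lambda. The sketch below rebuilds the question's DataFrame and draws one row per Name. The random_state argument is only there to make the draw repeatable and can be dropped. The second part touches the "disproportionate" case from the title: the sizes dict is a hypothetical example of per-group sample sizes and is not taken from the original post.

```python
import pandas as pd

# The DataFrame from the question, including its original index labels.
df = pd.DataFrame(
    {
        "Distance": [16, 31, 23, 15, 32, 26],
        "Name": ["John", "John", "Kate", "Kate", "Peter", "Peter"],
        "Time": [5, 9, 3, 7, 2, 4],
        "Order": [0, 1, 0, 1, 0, 1],
    },
    index=[1, 4, 0, 3, 2, 5],
)

# One random row per Name (requires pandas >= 1.1).
# random_state only makes the result repeatable; omit it for a fresh draw.
picked = df.groupby("Name").sample(n=1, random_state=0).reset_index(drop=True)
print(picked)

# Disproportionate variant: a different number of rows per group.
# `sizes` is a hypothetical mapping chosen for illustration only.
sizes = {"John": 2, "Kate": 1, "Peter": 1}
uneven = pd.concat(
    [group.sample(n=sizes[name], random_state=0) for name, group in df.groupby("Name")]
).reset_index(drop=True)
print(uneven)
```

Each requested size must not exceed the number of rows in its group unless replace=True is passed to sample.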