com*_*ave 2 python sorting pandas
I have a DataFrame with daily product and volume data:
date product Volume
20160101 A 10
20160101 B 5
...
20160102 A 20
...
...
20160328 B 20
20160328 C 100
...
20160330 D 20
I have grouped it by month:
df['yearmonth'] = df.date.astype(str).str[:6]
grouped = df.groupby(['yearmonth','product'])['Volume'].sum()
This gives me a Series of the form:
yearmonth  product
201601     A          100
           B           90
           C           90
           D           85
           E          180
           F           50
...
201602     A          200
           C          120
           F          220
           G           40
           I           50
...
201603     B          120
           C          110
           D          110
...
I want to return the top n volume values per month, by product. For example, the top 3 values would return:
201601  A    100
        B     90
        C     90
        E    180
201602  A    200
        C    120
        F    220
201603  B    120
        C    110
        D    110
I found some answers that use pd.IndexSlice and select, but they seem to work only on the index. I can't figure out how to sort the values within each group.
You can use SeriesGroupBy.nlargest:
print (grouped.groupby(level='yearmonth').nlargest(3).reset_index(level=0, drop=True))
yearmonth  product
201601     E          180
           A          100
           B           90
201602     F          220
           A          200
           C          120
201603     B          120
           C          110
           D          110
Name: val, dtype: int64
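For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds only the rows that are visible in the question (the elided "..." rows are left out, so this is an assumption about the data, not the asker's full Series), and the name "Volume" is assumed from the column used in the groupby. It also shows the keep="all" option, which, as far as I know, reasonably recent pandas versions accept on nlargest. The question's expected output lists both B and C for 201601 because they tie at 90, and keep="all" is one way to keep such ties.

```python
import pandas as pd

# Only the rows shown in the question; the "..." rows are omitted (assumption).
index = pd.MultiIndex.from_tuples(
    [
        ("201601", "A"), ("201601", "B"), ("201601", "C"),
        ("201601", "D"), ("201601", "E"), ("201601", "F"),
        ("201602", "A"), ("201602", "C"), ("201602", "F"),
        ("201602", "G"), ("201602", "I"),
        ("201603", "B"), ("201603", "C"), ("201603", "D"),
    ],
    names=["yearmonth", "product"],
)
grouped = pd.Series(
    [100, 90, 90, 85, 180, 50, 200, 120, 220, 40, 50, 120, 110, 110],
    index=index,
    name="Volume",
)

# Top 3 per month. nlargest prepends the group key to the index, so the
# duplicated outer 'yearmonth' level is dropped again with reset_index.
top3 = (
    grouped.groupby(level="yearmonth")
    .nlargest(3)
    .reset_index(level=0, drop=True)
)
print(top3)

# Same idea, but keep every row tied with the n-th largest value,
# then restore the original (yearmonth, product) ordering.
top3_with_ties = (
    grouped.groupby(level="yearmonth")
    .nlargest(3, keep="all")
    .reset_index(level=0, drop=True)
    .sort_index()
)
print(top3_with_ties)
```

With the default keep="first", only the first of the tied rows (B) survives for 201601. With keep="all", a month can return more than n rows.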
You can also use to_datetime with to_period to convert the dates to year-month periods:
print (df)
        date product  Volume
0   20160101       A      10
1   20160101       B       5
2   20160101       C      10
3   20160101       D       5
4   20160102       A      20
5   20160102       A      10
6   20160102       B       5
7   20160102       C      10
8   20160102       D       5
9   20160328       A      20
10  20160328       C     100
11  20160328       B      20
12  20160328       D      20
13  20160330       D      20
grouped = df.groupby([pd.to_datetime(df.date, format='%Y%m%d').dt.to_period('M'),
                      'product'])['Volume'].sum()
print (grouped)
date     product
2016-01  A           40
         B           10
         C           20
         D           10
2016-03  A           20
         B           20
         C          100
         D           40
Name: Volume, dtype: int64
print (grouped.groupby(level='date').nlargest(3).reset_index(level=0, drop=True))
date     product
2016-01  A          40
         C          20
         B          10
2016-03  C         100
         D          40
         A          20
Name: Volume, dtype: int64
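Putting the to_period variant together end to end, a runnable sketch could look like the following. The DataFrame is rebuilt from the printed sample above. The astype(str) call before to_datetime is my own defensive addition (it is not in the code above), and the variable names month and top3 are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the printed df above.
df = pd.DataFrame({
    "date": [20160101] * 4 + [20160102] * 5 + [20160328] * 4 + [20160330],
    "product": list("ABCD") + list("AABCD") + list("ACBD") + ["D"],
    "Volume": [10, 5, 10, 5, 20, 10, 5, 10, 5, 20, 100, 20, 20, 20],
})

# Convert the integer dates to monthly periods. Going through str first is
# an extra precaution so that the format string is applied to text.
month = pd.to_datetime(df["date"].astype(str), format="%Y%m%d").dt.to_period("M")

# Monthly total per product; the first index level is named 'date'.
grouped = df.groupby([month, "product"])["Volume"].sum()

# Top 3 totals per month, dropping the duplicated outer 'date' level.
top3 = (
    grouped.groupby(level="date")
    .nlargest(3)
    .reset_index(level=0, drop=True)
)
print(top3)
```

Using a Period index instead of a sliced string keeps the month as a real date-like value, which can help if you later need to sort, resample, or compare months.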