I have a dataframe as shown below:
import pandas as pd

op1 = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    'date': ['1/1/2017', '1/1/2017', '1/1/2017', '1/2/2017', '1/2/2017', '1/2/2017',
             '1/3/2017', '1/3/2017', '1/3/2017', '1/4/2017', '1/4/2017', '1/4/2017',
             '1/5/2017', '1/5/2017', '1/5/2017', '1/6/2017', '1/6/2017', '1/6/2017'],
    'val': [5, 5, 11, 10, 5, 7, 16, 12, 11, 21, 23, 26, 6, 8, 5, 11, 10, 3]
})
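What I would like to do is get the min and max of val for each subject on each day.
Although my code below works, I feel it could be written in a better way: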
t1 = op1.groupby(['subject_id','date'])['val'].max().reset_index()
t2 = op1.groupby(['subject_id','date'])['val'].min().reset_index()
t1.merge(t2,on=['subject_id','date'],how='inner',suffixes=('_max', '_min'))
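The output should look like the one shown below. Although my code works, I feel it is not elegant. Is there any other way to compute max and min in a single line?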
Use GroupBy.agg with a list of tuples, where each tuple gives the new column name and the aggregation function:
df = (op1.groupby(['subject_id', 'date'])['val']
         .agg([('val_max', 'max'), ('val_min', 'min')])
         .reset_index())
print(df)
   subject_id      date  val_max  val_min
0           1  1/1/2017       11        5
1           1  1/2/2017       10        5
2           1  1/3/2017       16       11
3           2  1/4/2017       26       21
4           2  1/5/2017        8        5
5           2  1/6/2017       11        3
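As a quick sanity check, the two-step merge from the question and the single agg call can be compared side by side. This is only a sketch: the names keys, merged, single and same are introduced here for illustration and do not appear in the original post.

```python
import pandas as pd

op1 = pd.DataFrame({
    'subject_id': [1] * 9 + [2] * 9,
    'date': ['1/1/2017'] * 3 + ['1/2/2017'] * 3 + ['1/3/2017'] * 3
            + ['1/4/2017'] * 3 + ['1/5/2017'] * 3 + ['1/6/2017'] * 3,
    'val': [5, 5, 11, 10, 5, 7, 16, 12, 11, 21, 23, 26, 6, 8, 5, 11, 10, 3]
})

keys = ['subject_id', 'date']  # illustrative helper name

# Two-step approach from the question: separate max and min, then merge
t1 = op1.groupby(keys)['val'].max().reset_index()
t2 = op1.groupby(keys)['val'].min().reset_index()
merged = t1.merge(t2, on=keys, how='inner', suffixes=('_max', '_min'))

# Single agg call with (new_column_name, function) tuples
single = (op1.groupby(keys)['val']
             .agg([('val_max', 'max'), ('val_min', 'min')])
             .reset_index())

# Compare column names and cell values of both results
same = (list(merged.columns) == list(single.columns)
        and merged.values.tolist() == single.values.tolist())
print(same)
```

If both approaches agree, the agg version can replace the merge without changing downstream code that relies on the val_max and val_min column names.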
In pandas 0.25+ it is possible to use named aggregation:
df = (op1.groupby(['subject_id','date'])
.agg(val_min=pd.NamedAgg(column='val', aggfunc='min'),
val_max=pd.NamedAgg(column='val', aggfunc='max'))
.reset_index())
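Named aggregation also accepts plain (column, aggfunc) tuples in place of pd.NamedAgg, which some may find shorter to read. A minimal sketch, assuming pandas 0.25 or later and the same sample data as in the question:

```python
import pandas as pd

op1 = pd.DataFrame({
    'subject_id': [1] * 9 + [2] * 9,
    'date': ['1/1/2017'] * 3 + ['1/2/2017'] * 3 + ['1/3/2017'] * 3
            + ['1/4/2017'] * 3 + ['1/5/2017'] * 3 + ['1/6/2017'] * 3,
    'val': [5, 5, 11, 10, 5, 7, 16, 12, 11, 21, 23, 26, 6, 8, 5, 11, 10, 3]
})

# Keyword name = output column; tuple = (input column, aggregation function)
df = (op1.groupby(['subject_id', 'date'])
         .agg(val_min=('val', 'min'),
              val_max=('val', 'max'))
         .reset_index())
print(df)
```

The output columns appear in the order the keywords are given, so here val_min comes before val_max.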