I ran into this behavior while doing some basic data munging, as in the following example:
In [55]: import pandas as pd
In [56]: import numpy as np
In [57]: rng = pd.date_range('1/1/2000', periods=10, freq='4h')
In [58]: lvls = ['A','A','A','B','B','B','C','C','C','C']
In [59]: df = pd.DataFrame({'TS': rng, 'V' : np.random.randn(len(rng)), 'L' : lvls})
In [60]: df
Out[60]:
L TS V
0 A 2000-01-01 00:00:00 -1.152371
1 A 2000-01-01 04:00:00 -2.035737
2 A 2000-01-01 08:00:00 -0.493008
3 B 2000-01-01 12:00:00 -0.279055
4 B 2000-01-01 16:00:00 -0.132386
5 B 2000-01-01 20:00:00 0.584091
6 C 2000-01-02 00:00:00 -0.297270
7 …

Using a DataFrame (pandas as pd, numpy as np):
test = pd.DataFrame({'A' : [10,11,12,13,15,25,43,70],
                     'B' : [1,2,3,4,5,6,7,8],
                     'C' : [1,1,1,1,2,2,2,2]})
In [39]: test
Out[39]:
A B C
0 10 1 1
1 11 2 1
2 12 3 1
3 13 4 1
4 15 5 2
5 25 6 2
6 43 7 2
7 70 8 2
Grouping the DataFrame by 'C' and aggregating with np.mean (the same holds for sum, min, and max) produces a column-wise aggregation within each group:
In [40]: test_g = test.groupby('C')
In [41]: test_g.aggregate(np.mean)
Out[41]:
A B
C
1 11.50 2.5
2 38.25 6.5
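
For comparison, here is a minimal sketch of the same column-wise aggregation written with string aliases instead of NumPy functions. It assumes a reasonably recent pandas; the name `summary` is only illustrative and does not appear in the original question.

```python
import pandas as pd

test = pd.DataFrame({'A': [10, 11, 12, 13, 15, 25, 43, 70],
                     'B': [1, 2, 3, 4, 5, 6, 7, 8],
                     'C': [1, 1, 1, 1, 2, 2, 2, 2]})

# Each alias is applied to every non-grouping column separately,
# so the result has one (column, function) pair per output column.
summary = test.groupby('C').agg(['mean', 'sum', 'min', 'max'])
print(summary)
```

The `('A', 'mean')` and `('B', 'mean')` columns of `summary` should line up with the `aggregate(np.mean)` output shown above.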
However, aggregating with np.median appears to produce a DataFrame-wise aggregation within each group:
In [42]: test_g.aggregate(np.median)
Out[42]:
A B
C
1 7.0 …
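
The 7.0 reported for group 1 is what you get when the A and B values of that group are pooled into a single array before taking the median, which is consistent with the DataFrame-wise reading above. The sketch below checks that arithmetic and, as one possible way to get a per-column result, uses the groupby object's own `median()` method. It assumes a reasonably recent pandas; `per_column` and `pooled` are illustrative names, not part of the original question.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'A': [10, 11, 12, 13, 15, 25, 43, 70],
                     'B': [1, 2, 3, 4, 5, 6, 7, 8],
                     'C': [1, 1, 1, 1, 2, 2, 2, 2]})

# Column-wise median within each group, using the built-in reduction.
per_column = test.groupby('C').median()
print(per_column)

# Median over all A and B values of group 1 pooled together.
# np.median flattens its input when no axis is given.
group_1 = test.loc[test['C'] == 1, ['A', 'B']].to_numpy()
pooled = np.median(group_1)
print(pooled)  # 7.0: the midpoint of 4 and 10 in [1, 2, 3, 4, 10, 11, 12, 13]
```

Passing an explicit axis, for example `np.median(group_1, axis=0)`, gives the per-column values for that group instead of the pooled one.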