gra*_*per 0 python time-series resampling pandas
I have this DataFrame:
                      startTime     endTime  emails_received
index                                             
2014-01-24 14:00:00  1390568400  1390569600    684
2014-01-24 14:00:00  1390568400  1390569300    700
2014-01-24 14:05:00  1390568700  1390569300    438
2014-01-24 14:05:00  1390568700  1390569900    586
2014-01-24 16:00:00  1390575600  1390576500    752
2014-01-24 16:00:00  1390575600  1390576500    743
2014-01-24 16:00:00  1390575600  1390576500    672
2014-01-24 16:00:00  1390575600  1390576200    712
2014-01-24 16:00:00  1390575600  1390576800    708
I run resample("10min", how="median").dropna() and get:
                  startTime     endTime  emails_received
start                                             
2014-01-24 14:00:00  1390568550  1390569450    635
2014-01-24 16:00:00  1390575600  1390576500    712
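Which is correct. Is there an easy way to get the standard deviation from the mean with pandas?
You just need to call .std() on the DataFrame. Here is an illustrative example.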
Create a DatetimeIndex
In [38]: index = pd.DatetimeIndex(start='2000-1-1',freq='1T', periods=1000)
Create a DataFrame with 2 columns
In [45]: df = pd.DataFrame({'a':range(1000), 'b':range(1000,3000,2)}, index=index)
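The two steps above were written against an old pandas release. As far as I know, recent versions no longer accept the start=/freq=/periods= form of the DatetimeIndex constructor, so here is a minimal sketch of the same frame built with pd.date_range. The "min" frequency alias is my substitution for the '1T' used in the transcript; the column names and values are taken from it unchanged.

```python
import pandas as pd

# Same 1000 timestamps at one-minute spacing as In [38],
# built with pd.date_range instead of the DatetimeIndex constructor.
index = pd.date_range(start="2000-01-01", periods=1000, freq="min")

# Same two columns as In [45].
df = pd.DataFrame({"a": range(1000), "b": range(1000, 3000, 2)}, index=index)

print(df.head())
```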
Head, std and mean of the DataFrame
In [47]: df.head()
Out[47]: 
                     a     b
2000-01-01 00:00:00  0  1000
2000-01-01 00:01:00  1  1002
2000-01-01 00:02:00  2  1004
2000-01-01 00:03:00  3  1006
2000-01-01 00:04:00  4  1008
In [48]: df.std()
Out[48]: 
a    288.819436
b    577.638872
dtype: float64
In [49]: df.mean()
Out[49]: 
a     499.5
b    1999.0
dtype: float64
Downsample and compute the same statistics
In [54]: df = df.resample(rule="10T",how="median")
In [55]: df
Out[55]: 
DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100  non-null values
b    100  non-null values
dtypes: float64(1), int64(1)
In [56]: df.head()
Out[56]: 
                        a     b
2000-01-01 00:00:00   4.5  1009
2000-01-01 00:10:00  14.5  1029
2000-01-01 00:20:00  24.5  1049
2000-01-01 00:30:00  34.5  1069
2000-01-01 00:40:00  44.5  1089
In [57]: df.std()
Out[57]: 
a    290.11492
b    580.22984
dtype: float64
In [58]: df.mean()
Out[58]: 
a     499.5
b    1999.0
dtype: float64
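Note that the std() above measures the spread of the 100 per-bin medians, not the spread inside each bin. For newer pandas, where to my knowledge the how= argument of resample is no longer available, the same computation can be sketched by chaining the aggregation method onto the resampler. The variable name medians is mine, chosen so the original frame is not overwritten.

```python
import pandas as pd

index = pd.date_range(start="2000-01-01", periods=1000, freq="min")
df = pd.DataFrame({"a": range(1000), "b": range(1000, 3000, 2)}, index=index)

# Method-chaining counterpart of resample(rule="10T", how="median").
medians = df.resample("10min").median()

print(medians.head())
print(medians.std())   # spread of the per-bin medians across all bins
print(medians.mean())
```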
Downsample with std() as the aggregation
In [62]: df2 = df.resample(rule="10T", how=np.std)
In [63]: df2
Out[63]: 
DatetimeIndex: 100 entries, 2000-01-01 00:00:00 to 2000-01-01 16:30:00
Freq: 10T
Data columns (total 2 columns):
a    100  non-null values
b    100  non-null values
dtypes: float64(2)
In [64]: df2.head()
Out[64]: 
                           a         b
2000-01-01 00:00:00  3.02765  6.055301
2000-01-01 00:10:00  3.02765  6.055301
2000-01-01 00:20:00  3.02765  6.055301
2000-01-01 00:30:00  3.02765  6.055301
2000-01-01 00:40:00  3.02765  6.055301
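Applied to the data in the question, the per-bin standard deviation can be computed alongside the median in a single pass. This is only a sketch under a few assumptions: it rebuilds just the emails_received column by hand, uses the method-chaining resample API of newer pandas rather than how=, and the names timestamps, emails and stats are mine.

```python
import pandas as pd

# The emails_received column from the question, keyed by its timestamps.
timestamps = pd.to_datetime(
    ["2014-01-24 14:00:00"] * 2
    + ["2014-01-24 14:05:00"] * 2
    + ["2014-01-24 16:00:00"] * 5
)
emails = pd.Series(
    [684, 700, 438, 586, 752, 743, 672, 712, 708],
    index=timestamps,
    name="emails_received",
)

# Several statistics per 10-minute bin at once.
# Bins with no rows yield NaN and are removed by dropna().
stats = emails.resample("10min").agg(["median", "std", "mean"]).dropna()
print(stats)
```

The median column reproduces the 635 and 712 shown in the question, and the std column gives the sample standard deviation of the rows that fell into each bin.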
Here is the information from the docstring of the .std() method.
Return standard deviation over requested axis.
NA/null values are excluded
Parameters
----------
axis : {0, 1}
    0 for row-wise, 1 for column-wise
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA
level : int, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a DataFrame
Returns
-------
std : Series (or DataFrame if level specified)
        Normalized by N-1 (unbiased estimator).
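The "Normalized by N-1" note matters when comparing against NumPy, whose np.std divides by N by default when given a plain array. A small sketch of the difference, using the same values as column a above (the variable names are mine):

```python
import numpy as np
import pandas as pd

s = pd.Series(range(1000))

sample_std = s.std()              # pandas default: ddof=1, divide by N-1
population_std = s.std(ddof=0)    # divide by N instead
numpy_std = np.std(s.to_numpy())  # NumPy default on an array: ddof=0

print(sample_std, population_std, numpy_std)
```

Passing ddof=0 to the pandas method, or ddof=1 to np.std, makes the two agree.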