Counting daily activity in a pandas time series

fcc*_*lho 3 python pandas

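Hi, I have a time series and want to count how many events I have per day (that is, the number of rows in the table for each day). The command I want to use is: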

ts.resample('D', how='count')

But I think "count" is not a valid aggregation function for a time series.

Just to clarify, here is a sample of the dataframe:

0   2008-02-22 03:43:00
1   2008-02-22 03:43:00
2   2010-08-05 06:48:00
3   2006-02-07 06:40:00
4   2005-06-06 05:04:00
5   2008-04-17 02:11:00
6   2012-05-12 06:46:00
7   2004-05-17 08:42:00
8   2004-08-02 05:02:00
9   2008-03-26 03:53:00
Name: Data_Hora, dtype: datetime64[ns]

This is the error I get:

ts.resample('D').count()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-86643e21ce18> in <module>()
----> 1 ts.resample('D').count()

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    255     def resample(self, rule, how=None, axis=0, fill_method=None,
    256                  closed=None, label=None, convention='start',
--> 257                  kind=None, loffset=None, limit=None, base=0):
    258         """
    259         Convenience method for frequency conversion and resampling of regular

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
     98             return obj
     99         else:  # pragma: no cover
--> 100             raise TypeError('Only valid with DatetimeIndex or PeriodIndex')
    101 
    102         rs_axis = rs._get_axis(self.axis)

TypeError: Only valid with DatetimeIndex or PeriodIndex
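The traceback comes from `resample` grouping on the index rather than on the values, and here the timestamps are the values of a Series with a plain integer index. A minimal sketch that reproduces the same kind of failure, assuming a recent pandas (the exact wording of the message varies between versions) and using a few of the sample timestamps:

```python
import pandas as pd

# A Series whose VALUES are timestamps; its index is a default integer index
ts = pd.Series(pd.to_datetime([
    "2008-02-22 03:43:00",
    "2008-02-22 03:43:00",
    "2010-08-05 06:48:00",
]), name="Data_Hora")

error = None
try:
    ts.resample("D").count()
except TypeError as exc:
    # resample needs a datetime-like index, not datetime values
    error = str(exc)

print(error)
```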

This can be fixed by turning the datetime column into the index with set_index. However, after doing that, I still get the following error:

DataError: No numeric types to aggregate

because my DataFrame has no numeric columns.

But I just want to count rows! The equivalent of a simple "select count(*) group by ..." in SQL.

fcc*_*lho 6

To make it work, after dropping the rows whose index is NaT:

df2 = df[pd.notnull(df.index)].copy()  # in current pandas, != pd.NaT is always True, so use a null check
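One caveat on this step: in current pandas, `pd.NaT` behaves like NaN in comparisons, so a `!=` test against it is True for every row and filters nothing. A small sketch of the difference, assuming a recent pandas; the frame and its `dummy` column are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with one missing timestamp in the index
idx = pd.to_datetime(["2008-02-22 03:43:00", None, "2010-08-05 06:48:00"])
df = pd.DataFrame({"dummy": [0, 1, 2]}, index=idx)

# Comparing with NaT: every element, including the NaT itself, is "not equal"
mask_ne = df.index != pd.NaT

# A null check flags the NaT entry correctly
mask_ok = pd.notnull(df.index)

df2 = df[mask_ok].copy()  # copy so that adding columns later does not warn
print(mask_ne, mask_ok, len(df2))
```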

I had to add a column:

df2['n'] = 1

and then aggregate only that column:

df2.n.resample('D', how="sum")

I could then visualize the data with:

plot(df2.n.resample('D', how="sum"))
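Put together, the steps above might look like the following sketch on the ten sample timestamps from the question. It assumes a recent pandas, so it uses the method-call form `.resample('D').sum()` instead of the `how=` argument; the names `stamps`, `daily` and `active_days` are illustrative only.

```python
import pandas as pd

# Sample timestamps copied from the question
stamps = pd.to_datetime([
    "2008-02-22 03:43:00", "2008-02-22 03:43:00", "2010-08-05 06:48:00",
    "2006-02-07 06:40:00", "2005-06-06 05:04:00", "2008-04-17 02:11:00",
    "2012-05-12 06:46:00", "2004-05-17 08:42:00", "2004-08-02 05:02:00",
    "2008-03-26 03:53:00",
])

# Timestamps become the index; sort them so the daily bins are in order
df2 = pd.DataFrame(index=stamps).sort_index()

# Helper column of ones, so that summing per day counts the rows
df2["n"] = 1
daily = df2["n"].resample("D").sum()

# Most days have no events at all; keep only the days that do
active_days = daily[daily > 0]
print(active_days)
```

With matplotlib installed, `daily.plot()` should give roughly the same picture as the `plot(...)` call above.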

  • As of pandas 0.18.1, the `how` option is deprecated. [See here](http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#downsampling). The recommended approach is `df2.n.resample('D').sum()`. (2 upvotes)
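As an alternative to summing a column of ones, the resampler's `size()` method counts the rows in each daily bin directly. This is a different technique from the one in the answer, sketched here under the assumption of a recent pandas; the three timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical event timestamps, two on the same day
ts = pd.Series(pd.to_datetime([
    "2008-02-22 03:43:00",
    "2008-02-22 03:43:00",
    "2008-02-24 06:48:00",
]), name="Data_Hora")

# Move the timestamps into the index; the values themselves do not matter
events = pd.Series(1, index=pd.DatetimeIndex(ts))

# size() returns the number of rows per bin, with 0 for empty days
daily = events.resample("D").size()
print(daily)
```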