Filling in time data in pandas

Question

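I have data sampled every 15 seconds. However, some values are missing. They are not marked with NaN; the rows simply do not exist. How can I fill in these values?
I have already tried resampling, but that also changes my original data. So why doesn't this work: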

import pandas as pd

a = pd.Series([1., 3., 4., 3., 5.], ['2016-05-25 00:00:35', '2016-05-25 00:00:50', '2016-05-25 00:01:05', '2016-05-25 00:01:35', '2016-05-25 00:02:05'])
a.index = pd.to_datetime(a.index)
a.resample('15S').mean()

In [368]: a
Out[368]: 
2016-05-25 00:00:35    1.0
2016-05-25 00:00:50    3.0
2016-05-25 00:01:05    4.0
2016-05-25 00:01:35    3.0
2016-05-25 00:02:05    5.0
dtype: float64

It gives me this:

2016-05-25 00:00:30    1.0
2016-05-25 00:00:45    3.0
2016-05-25 00:01:00    4.0
2016-05-25 00:01:15    NaN
2016-05-25 00:01:30    3.0
2016-05-25 00:01:45    NaN
2016-05-25 00:02:00    5.0
Freq: 15S, dtype: float64

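So I no longer have values at 00:00:35 or 00:00:50.
With my original, larger dataset, I also end up with many NaN values at the end of the resampled data.
I want to resample my 15 s data to 15 s, so that whenever no data exists for a particular time, it is filled with the average of the surrounding values. Is there a way to do this?
Also, why does the time base change when I resample? My original data starts at 00:00:35, but after resampling it starts at 00:00:30, as if it had been shifted by 5 seconds.
For my sample data, all it should do is create additional entries at 00:01:20 and 00:01:50.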

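Edit

I realized that my data is a bit more complicated than I thought. The "base" actually changes partway through it. If I use the solution below, it works for part of the data, but then the values stop changing. For example: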

a = pd.Series([1.,3.,4.,3.,5.,6.,7.,8.], ['2016-05-25 00:00:35','2016-05-25 00:00:50','2016-05-25 00:01:05','2016-05-25 00:01:35','2016-05-25 00:02:05','2016-05-25 00:03:00','2016-05-25 00:04:00','2016-05-25 00:06:00'])                                   

In [79]: a
Out[79]: 
2016-05-25 00:00:35    1.0
2016-05-25 00:00:50    3.0
2016-05-25 00:01:05    4.0
2016-05-25 00:01:35    3.0
2016-05-25 00:02:05    5.0
2016-05-25 00:03:00    6.0
2016-05-25 00:04:00    7.0
2016-05-25 00:06:00    8.0
dtype: float64

In [80]: a.index = pd.to_datetime(a.index)

In [81]: a.resample('15S', base=5).interpolate()
Out[81]: 
2016-05-25 00:00:35    1.0
2016-05-25 00:00:50    3.0
2016-05-25 00:01:05    4.0
2016-05-25 00:01:20    3.5
2016-05-25 00:01:35    3.0
2016-05-25 00:01:50    4.0
2016-05-25 00:02:05    5.0
2016-05-25 00:02:20    5.0
2016-05-25 00:02:35    5.0
2016-05-25 00:02:50    5.0
2016-05-25 00:03:05    5.0
2016-05-25 00:03:20    5.0
2016-05-25 00:03:35    5.0
2016-05-25 00:03:50    5.0
2016-05-25 00:04:05    5.0
2016-05-25 00:04:20    5.0
2016-05-25 00:04:35    5.0
2016-05-25 00:04:50    5.0
2016-05-25 00:05:05    5.0
2016-05-25 00:05:20    5.0
2016-05-25 00:05:35    5.0
2016-05-25 00:05:50    5.0
Freq: 15S, dtype: float64

As you can see, it stops interpolating after 00:02:05 and seems to ignore the data at 00:03:00, 00:04:00 and 00:06:00.

Answer 1

Alb*_*oso 5

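@IanS and @piRSquared have both addressed the base offset. As for filling the NaNs: pandas has methods for forward filling (.ffill() / .pad()) and backward filling (.bfill() / .backfill()), but none for taking the mean. A quick way is to take the mean manually: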

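# base=5 shifts the bin edges by 5 s; base was removed in pandas 2.0, where offset='5s' is the equivalent
b = a.resample('15S', base=5)
# average of the previous and the next known value for each slot
(b.ffill() + b.bfill()) / 2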

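For readers on a current pandas version, here is a minimal sketch of the same idea. It assumes pandas 1.1 or later, where resample() accepts offset (and origin) in place of base. By default the bin edges are anchored to midnight of the first day (origin='start_day'), which is why 15-second bins land on :00/:15/:30/:45 unless they are shifted. The variable name filled is only illustrative.

```python
import pandas as pd

a = pd.Series(
    [1.0, 3.0, 4.0, 3.0, 5.0],
    index=pd.to_datetime([
        "2016-05-25 00:00:35",
        "2016-05-25 00:00:50",
        "2016-05-25 00:01:05",
        "2016-05-25 00:01:35",
        "2016-05-25 00:02:05",
    ]),
)

# offset="5s" shifts the bin edges by 5 seconds, standing in for base=5.
b = a.resample("15s", offset="5s")

# Each missing slot gets the average of the previous and the next known value;
# slots that already hold a value are left unchanged.
filled = (b.ffill() + b.bfill()) / 2
print(filled)
```

Note that when a gap spans more than one slot, every slot in that gap receives the same midpoint value, which is not the same as linear interpolation. Passing origin="start" instead of offset="5s" should also keep the grid aligned with the first timestamp.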

Edit: I stand corrected: there is a built-in method for this, .interpolate().

a.resample('15S', base=5).interpolate()

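Regarding the second example in the question: the timestamps 00:03:00, 00:04:00 and 00:06:00 do not fall on the :05/:20/:35/:50 grid. Judging by the output shown in the question, upsampling reindexes onto the grid first, so those points are dropped before interpolation and nothing is left to interpolate towards after 00:02:05. One possible workaround, offered as a sketch rather than as part of the original answer, is to interpolate over the union of the original timestamps and the target grid, and then keep only the grid points. The names grid, combined and regular are illustrative.

```python
import pandas as pd

a = pd.Series(
    [1.0, 3.0, 4.0, 3.0, 5.0, 6.0, 7.0, 8.0],
    index=pd.to_datetime([
        "2016-05-25 00:00:35",
        "2016-05-25 00:00:50",
        "2016-05-25 00:01:05",
        "2016-05-25 00:01:35",
        "2016-05-25 00:02:05",
        "2016-05-25 00:03:00",
        "2016-05-25 00:04:00",
        "2016-05-25 00:06:00",
    ]),
)

# Target grid: every 15 seconds, starting at the first observation.
grid = pd.date_range(a.index[0], a.index[-1], freq="15s")

# Add the grid timestamps as NaN rows next to the original observations,
# interpolate weighted by elapsed time, then keep only the grid points.
combined = a.reindex(a.index.union(grid))
regular = combined.interpolate(method="time").reindex(grid)
print(regular)
```

With method="time" the off-grid observations still influence the result, so the value at 00:02:20, for example, lies between 5.0 and 6.0 instead of staying at 5.0.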
