pandas: checking a time series for continuity

Question

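I have a DataFrame with a monthly index. I want to check whether the time index is continuous at monthly frequency and, if possible, find where it becomes discontinuous, i.e. where there are "gap months" between two adjacent entries of the index.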

Example: the following time series data

1964-07-31    100.00
1964-08-31     98.81
1964-09-30    101.21
1964-11-30    101.42
1964-12-31    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26

is missing 1964/10 and 1965/[1,2,7].

Answer 1

Lud*_*idt 7

I often do this by computing the gap between consecutive index values.

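# DatetimeIndex.shift needs a set frequency, which an index with gaps lacks,
# so take consecutive differences of the index as a Series instead.
times_gaps = df.index.to_series().diff()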

Then you can plot these:

times_gaps.plot()

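If there are gaps, you will quickly see where they are. If there are none, the line stays nearly flat (for month-end data the differences only vary between 28 and 31 days).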

You can also select the timestamps where a gap occurs:

# threshold is a Timedelta of your choosing, e.g. pd.Timedelta(days=31) for monthly data
times_gaps[times_gaps > threshold]
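
As a minimal sketch of the steps above, here is the approach applied to the sample data from the question. The column name `value`, the 31-day `threshold`, and the `is_continuous` flag are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# Month-end dates from the question, used as the index of a one-column DataFrame
dates = pd.to_datetime([
    "1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30", "1964-12-31",
    "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
])
values = [100.00, 98.81, 101.21, 101.42, 101.45, 91.49, 90.33, 85.23, 86.10, 84.26]
df = pd.DataFrame({"value": values}, index=dates)

# Difference between each index entry and the previous one (the first is NaT)
times_gaps = df.index.to_series().diff()

# Assumption: consecutive month-ends are never more than 31 days apart,
# so a larger difference means at least one month is missing before that entry
threshold = pd.Timedelta(days=31)
gaps = times_gaps[times_gaps > threshold]

# The index is continuous only if no difference exceeds the threshold
is_continuous = gaps.empty

print(gaps)
print(is_continuous)
```

Each row of `gaps` is labelled with the first timestamp after a gap, so the missing months lie just before it. A fixed day threshold only suits dates that fall on the same day of each month; for dates that drift within the month, comparing calendar months is likely more robust.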

Answer 2

jez*_*ael 5

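Use asfreq to add the missing datetimes at monthly frequency, filter the resulting NaN rows into a new Series, and, if needed, group by year to build lists of the missing months: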

# 'M' is the month-end frequency; on pandas 2.2+ use 'ME', since 'M' is deprecated there
s = s.asfreq('M')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0   1964-10-31
1   1965-01-31
2   1965-02-28
3   1965-07-31
Name: 0, dtype: datetime64[ns]

out = s1.dt.month.groupby(s1.dt.year).apply(list)
print (out)
0
1964         [10]
1965    [1, 2, 7]
Name: 0, dtype: object
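
The question also asks for a plain yes/no answer. One possible variation, sketched here rather than taken from the answer, is to build the full month-end range and compare it with the index. The names `full_index`, `missing` and `is_continuous` are illustrative; passing `pd.offsets.MonthEnd()` as `freq` is a choice made here to avoid the alias differences between pandas versions.

```python
import pandas as pd

# Sample series from the question, indexed by month-end dates
s = pd.Series(
    [100.00, 98.81, 101.21, 101.42, 101.45, 91.49, 90.33, 85.23, 86.10, 84.26],
    index=pd.to_datetime([
        "1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30", "1964-12-31",
        "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
    ]),
)

# Every month-end between the first and the last observation
full_index = pd.date_range(s.index.min(), s.index.max(), freq=pd.offsets.MonthEnd())

# Month-ends that were expected but are absent from the series
missing = full_index.difference(s.index)

# The series is continuous only if nothing is missing
is_continuous = missing.empty

print(missing)
print(is_continuous)
```

Unlike testing for NaN after `asfreq`, this does not confuse a month that is present with a NaN value for a month that is absent from the index.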

Setup:

s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0, 
               pd.Timestamp('1964-08-31 00:00:00'): 98.81, 
               pd.Timestamp('1964-09-30 00:00:00'): 101.21, 
               pd.Timestamp('1964-11-30 00:00:00'): 101.42, 
               pd.Timestamp('1964-12-31 00:00:00'): 101.45,
               pd.Timestamp('1965-03-31 00:00:00'): 91.49, 
               pd.Timestamp('1965-04-30 00:00:00'): 90.33, 
               pd.Timestamp('1965-05-31 00:00:00'): 85.23, 
               pd.Timestamp('1965-06-30 00:00:00'): 86.1, 
               pd.Timestamp('1965-08-31 00:00:00'): 84.26})

print (s)
1964-07-31    100.00
1964-08-31     98.81
1964-09-30    101.21
1964-11-30    101.42
1964-12-31    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26
dtype: float64

Edit:

If the datetimes are not always the last day of the month:

s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0, 
               pd.Timestamp('1964-08-31 00:00:00'): 98.81, 
               pd.Timestamp('1964-09-01 00:00:00'): 101.21, 
               pd.Timestamp('1964-11-02 00:00:00'): 101.42, 
               pd.Timestamp('1964-12-05 00:00:00'): 101.45,
               pd.Timestamp('1965-03-31 00:00:00'): 91.49, 
               pd.Timestamp('1965-04-30 00:00:00'): 90.33, 
               pd.Timestamp('1965-05-31 00:00:00'): 85.23, 
               pd.Timestamp('1965-06-30 00:00:00'): 86.1, 
               pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31    100.00
1964-08-31     98.81
1964-09-01    101.21
1964-11-02    101.42
1964-12-05    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26
dtype: float64

# convert every date to the first day of its month
s.index = s.index.to_period('M').to_timestamp()
# 'MS' is the month-start frequency
s = s.asfreq('MS')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0   1964-10-01
1   1965-01-01
2   1965-02-01
3   1965-07-01
dtype: datetime64[ns]
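
The same idea can be sketched with monthly periods instead of timestamps, so the day of the month no longer matters. This is an alternative technique, not the answer's own code; the helper `missing_months` and the `missing_by_year` dictionary are hypothetical names introduced for illustration.

```python
import pandas as pd

def missing_months(index):
    """Return the calendar months absent between the first and last entry of a DatetimeIndex."""
    # Reduce every timestamp to its calendar month
    months = index.to_period("M")
    # All months that a gap-free monthly index would contain
    expected = pd.period_range(months.min(), months.max(), freq="M")
    return expected.difference(months)

# Dates from the edited example, not all on the last day of the month
dates = pd.to_datetime([
    "1964-07-31", "1964-08-31", "1964-09-01", "1964-11-02", "1964-12-05",
    "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
])

missing = missing_months(dates)

# Group the missing month numbers by year, mirroring the groupby step above
missing_by_year = {}
for period in missing:
    missing_by_year.setdefault(period.year, []).append(period.month)

print(missing)
print(missing_by_year)
```

An empty result from `missing_months` means the index covers every month between its first and last entry.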

@Vim - I have an idea, but let me test it first. (2 upvotes)
