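I have a DataFrame with a monthly index. I want to check whether the time index is continuous at monthly frequency and, if possible, find where it becomes discontinuous, i.e. where there are "gap months" between two adjacent entries of the index.
Example: the following time series data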
1964-07-31 100.00
1964-08-31 98.81
1964-09-30 101.21
1964-11-30 101.42
1964-12-31 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
is missing 1964/10 and 1965/[1, 2, 7].
I usually do this by computing the gap between each pair of consecutive index values.
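# index.shift(1) needs a fixed freq, which an index with gaps does not have;
# diff() on the index values gives the distance to the previous timestamp.
times_gaps = df.index.to_series().diff()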
You can then plot these:
times_gaps.plot()
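If there are gaps, you will quickly see where they are. If there are no gaps, you will see a roughly flat horizontal line (month-end gaps only vary between 28 and 31 days).
You can also select the gaps that exceed a threshold:
times_gaps[times_gaps > threshold]  # threshold is a Timedelta, e.g. pd.Timedelta(days=31)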
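The gap-based approach above can be sketched end to end on the sample dates from the question. This is only a minimal illustration: the names idx, gaps and large_gaps are made up here, and the 31-day threshold is an assumption that fits month-end data.

```python
import pandas as pd

# Sample month-end dates from the question (values are not needed here)
idx = pd.DatetimeIndex([
    '1964-07-31', '1964-08-31', '1964-09-30', '1964-11-30', '1964-12-31',
    '1965-03-31', '1965-04-30', '1965-05-31', '1965-06-30', '1965-08-31',
])

# Distance from each timestamp to the previous one (first entry is NaT)
gaps = idx.to_series().diff()

# Assumed threshold: consecutive month ends are never more than 31 days apart
threshold = pd.Timedelta(days=31)
large_gaps = gaps[gaps > threshold]

# Each row is labelled with the first date *after* a gap
print(large_gaps)
```

The selected rows are the dates that follow a gap, so the missing months lie just before each of them.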
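Use asfreq to add the missing datetimes at monthly frequency, filter the resulting NaN rows into a new Series and, if needed, group by year to build lists of the missing months:
s = s.asfreq('M')  # month-end frequency; use 'ME' on pandas >= 2.2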
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0 1964-10-31
1 1965-01-31
2 1965-02-28
3 1965-07-31
Name: 0, dtype: datetime64[ns]
out = s1.dt.month.groupby(s1.dt.year).apply(list)
print (out)
0
1964 [10]
1965 [1, 2, 7]
Name: 0, dtype: object
Setup:
import pandas as pd
s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0,
pd.Timestamp('1964-08-31 00:00:00'): 98.81,
pd.Timestamp('1964-09-30 00:00:00'): 101.21,
pd.Timestamp('1964-11-30 00:00:00'): 101.42,
pd.Timestamp('1964-12-31 00:00:00'): 101.45,
pd.Timestamp('1965-03-31 00:00:00'): 91.49,
pd.Timestamp('1965-04-30 00:00:00'): 90.33,
pd.Timestamp('1965-05-31 00:00:00'): 85.23,
pd.Timestamp('1965-06-30 00:00:00'): 86.1,
pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31 100.00
1964-08-31 98.81
1964-09-30 101.21
1964-11-30 101.42
1964-12-31 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
dtype: float64
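For a plain yes/no answer to "is the index continuous?", one possible sketch compares the index with the full month-end range between its first and last date. The names idx, expected, is_continuous and missing are illustrative, and passing a pd.offsets.MonthEnd() object instead of an alias string is a choice made here to avoid the 'M' versus 'ME' difference between pandas versions.

```python
import pandas as pd

# Same month-end dates as in the setup above
idx = pd.DatetimeIndex([
    '1964-07-31', '1964-08-31', '1964-09-30', '1964-11-30', '1964-12-31',
    '1965-03-31', '1965-04-30', '1965-05-31', '1965-06-30', '1965-08-31',
])

# Every month end between the first and last observed date
expected = pd.date_range(idx.min(), idx.max(), freq=pd.offsets.MonthEnd())

# Continuous only if the observed index matches the expected one exactly
is_continuous = idx.equals(expected)

# Month ends that are expected but absent from the data
missing = expected.difference(idx)

print(is_continuous)
print(missing.strftime('%Y-%m').tolist())
```

This assumes every date in the index is a month end; dates on other days of the month would all be reported as mismatches.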
Edit:
If the datetimes are not always the last day of the month:
s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0,
pd.Timestamp('1964-08-31 00:00:00'): 98.81,
pd.Timestamp('1964-09-01 00:00:00'): 101.21,
pd.Timestamp('1964-11-02 00:00:00'): 101.42,
pd.Timestamp('1964-12-05 00:00:00'): 101.45,
pd.Timestamp('1965-03-31 00:00:00'): 91.49,
pd.Timestamp('1965-04-30 00:00:00'): 90.33,
pd.Timestamp('1965-05-31 00:00:00'): 85.23,
pd.Timestamp('1965-06-30 00:00:00'): 86.1,
pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31 100.00
1964-08-31 98.81
1964-09-01 101.21
1964-11-02 101.42
1964-12-05 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
dtype: float64
#convert all months to first day
s.index = s.index.to_period('M').to_timestamp()
#MS is start month frequency
s = s.asfreq('MS')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0 1964-10-01
1 1965-01-01
2 1965-02-01
3 1965-07-01
dtype: datetime64[ns]
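As a possible alternative when the day of the month varies, the comparison can be done directly on monthly periods instead of re-stamping the index. The sketch below is an assumption-laden illustration rather than part of the original answer; months, full, missing and by_year are names chosen here.

```python
import pandas as pd

# Dates from the edited example: not all of them are month ends
idx = pd.DatetimeIndex([
    '1964-07-31', '1964-08-31', '1964-09-01', '1964-11-02', '1964-12-05',
    '1965-03-31', '1965-04-30', '1965-05-31', '1965-06-30', '1965-08-31',
])

# Reduce each timestamp to its calendar month
months = idx.to_period('M')

# Every month between the first and last observed month
full = pd.period_range(months.min(), months.max(), freq='M')

# Months that never appear in the data
missing = full.difference(months)

# Group the missing month numbers by year
by_year = {}
for period in missing:
    by_year.setdefault(period.year, []).append(period.month)

print(missing)
print(by_year)
```

Because the set difference works on periods, two observations falling in the same month do not need any special handling here.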