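I have time-indexed data:
import pandas as pd
from datetime import date
# pd.to_datetime makes 'day' a datetime64 column, so the index becomes a DatetimeIndex
df2 = pd.DataFrame({'day': pd.to_datetime([date(2012, 1, 1), date(2012, 1, 3)]), 'b': [0.22, 0.3]})
df2 = df2.set_index('day')
df2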
b
day
2012-01-01 0.22
2012-01-03 0.30
What is the best way to extend this data frame so that it has one row for every day in January 2012 (say), with all columns (here only b) set to NaN wherever we have no data?
So the desired result is:
b
day
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
...
2012-01-31 NaN
Many thanks!
Mar*_*ark 27
Use this:
# pd.date_range replaces DatetimeIndex(start=..., end=...), which was removed in pandas 1.0
ix = pd.date_range(start=date(2012, 1, 1), end=date(2012, 1, 31), freq='D')
df2.reindex(ix)
This gives:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
2012-01-05 NaN
[...]
2012-01-29 NaN
2012-01-30 NaN
2012-01-31 NaN
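As a self-contained sketch of the reindex approach above: the snippet below rebuilds the two-row frame from the question and reindexes it onto every day of January 2012. Passing name='day' to pd.date_range is my own addition (the answer does not do this); it is only there so the result keeps the index label from the question.

```python
import pandas as pd

# Two known observations, indexed by a DatetimeIndex named 'day'
df2 = pd.DataFrame(
    {"b": [0.22, 0.3]},
    index=pd.to_datetime(["2012-01-01", "2012-01-03"]).rename("day"),
)

# One entry per calendar day in January 2012; name= keeps the 'day' label
ix = pd.date_range(start="2012-01-01", end="2012-01-31", freq="D", name="day")

# Rows for dates missing from df2 are created with NaN in every column
january = df2.reindex(ix)

print(len(january))                # number of rows: one per day of the month
print(january["b"].notna().sum())  # how many rows still carry real data
```

The target index decides which rows exist, so any range can be used here, not only a calendar month.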
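You can simply convert the frame to a daily frequency with asfreq. If you do not specify a fill method, the missing values are filled with NaN, as desired. Note that this only covers the span between the first and last existing dates: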
df3 = df2.asfreq('D')
df3
Out[16]:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
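A minimal sketch of how asfreq behaves on this data, assuming a recent pandas where the optional arguments are called method and fill_value. It shows that the result only spans the first to the last existing date, and what the two fill options do to the inserted rows. The variable names are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame(
    {"b": [0.22, 0.3]},
    index=pd.to_datetime(["2012-01-01", "2012-01-03"]).rename("day"),
)

# Default: inserted rows are NaN; range runs from the first to the last date only
daily = df2.asfreq("D")

# Forward-fill the inserted rows from the previous observation
padded = df2.asfreq("D", method="ffill")

# Use a constant for the inserted rows instead of NaN
zeros = df2.asfreq("D", fill_value=0.0)

print(daily)
print(padded)
print(zeros)
```

Because asfreq never extends past the existing endpoints, covering the whole month needs either explicit endpoint rows or a reindex onto a full date range.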
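To answer the second part of your question, I can't think of a more elegant way at the moment:
df3 = pd.DataFrame({'day': pd.to_datetime(['2012-01-04', '2012-01-31'])})
df3.set_index('day', inplace=True)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
merged = pd.concat([df2, df3])
# Make sure the combined index is datetime64 so asfreq can build the daily range
merged.index = pd.to_datetime(merged.index)
merged = merged.asfreq('D')
merged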
Out[46]:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
2012-01-05 NaN
2012-01-06 NaN
2012-01-07 NaN
2012-01-08 NaN
2012-01-09 NaN
2012-01-10 NaN
2012-01-11 NaN
2012-01-12 NaN
2012-01-13 NaN
2012-01-14 NaN
2012-01-15 NaN
2012-01-16 NaN
2012-01-17 NaN
2012-01-18 NaN
2012-01-19 NaN
2012-01-20 NaN
2012-01-21 NaN
2012-01-22 NaN
2012-01-23 NaN
2012-01-24 NaN
2012-01-25 NaN
2012-01-26 NaN
2012-01-27 NaN
2012-01-28 NaN
2012-01-29 NaN
2012-01-30 NaN
2012-01-31 NaN
This builds a second time series with no data columns, appends it to the first, and then calls asfreq('D') as before.
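One possible way to avoid hard-coding the extra endpoint rows is to derive the month boundaries from the data itself. The helper below is hypothetical (expand_to_month is my own name, not a pandas function) and assumes every row falls in the same calendar month as the earliest one; rows outside that month would be dropped by the reindex.

```python
import pandas as pd


def expand_to_month(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per day of the month of df's earliest date."""
    # Calendar month containing the earliest timestamp in the index
    month = df.index.min().to_period("M")
    # Daily range starting on the first of that month, one entry per day
    days = pd.date_range(
        start=month.start_time,
        periods=month.days_in_month,
        freq="D",
        name=df.index.name,
    )
    # Dates without data become NaN rows
    return df.reindex(days)


df2 = pd.DataFrame(
    {"b": [0.22, 0.3]},
    index=pd.to_datetime(["2012-01-01", "2012-01-03"]).rename("day"),
)
full_month = expand_to_month(df2)
print(full_month.head())
print(full_month.tail())
```

This is a reindex in disguise rather than the append-and-asfreq approach from the answer, so it is an alternative, not a restatement of it.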