Plotly: How to handle missing dates for a financial time series?

Question

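Financial time series are often fraught with missing data. Out of the box, plotly handles a series with missing timestamps visually by simply drawing a line across the gap. The challenge is that plotly interprets the timestamps as values on a continuous date axis, so every missing date still takes up space in the figure.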

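Most of the time, I find that the plot looks better when those dates are left out completely. An example from the plotly docs under https://plotly.com/python/time-series/#hiding-weekends-and-holidays shows how to handle missing dates for certain date categories, such as weekends or holidays, using: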

fig.update_xaxes(
    rangebreaks=[
        dict(bounds=["sat", "mon"]), #hide weekends
        dict(values=["2015-12-25", "2016-01-01"])  # hide Christmas and New Year's
    ]
)

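The downside is that your dataset may just as well be missing data for any other weekday. You would also have to specify the holiday dates for each country yourself. Are there any other approaches?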

Reproducible code:

import pandas as pd
import numpy as np
import plotly.graph_objects as go

# data
np.random.seed(1234)
n_obs = 15
frequency = 'D'
daterange = pd.date_range('2020', freq=frequency, periods=n_obs)
values = np.random.randint(low=-5, high=6, size=n_obs).tolist()
df = pd.DataFrame({'time':daterange, 'value':values})
df = df.set_index('time')
df.iloc[0]=100; df['value']=df.value.cumsum()

# Missing timestamps
df.iloc[2:5] = np.nan; df.iloc[8:13] = np.nan
df.dropna(inplace = True)

# plotly figure
fig=go.Figure(go.Scatter(x=df.index, y =df['value']))
fig.update_layout(template = 'plotly_dark')
fig.show()
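
To make the gaps in the sample concrete, here is a minimal sketch that rebuilds only the index of the example above (no random values, no figure) and lists the dates that dropna removes. The names full, keep, observed and gaps are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# Same 15 daily timestamps as the example, starting 2020-01-01.
full = pd.date_range("2020", freq="D", periods=15)

# Mask out the same positions that the example sets to NaN (2:5 and 8:13).
keep = np.ones(len(full), dtype=bool)
keep[2:5] = False
keep[8:13] = False
observed = full[keep]

# Dates on the full calendar that have no observation.
gaps = full.difference(observed)
print(len(observed), "observed,", len(gaps), "missing")
print(gaps.strftime("%Y-%m-%d").tolist())
```

Only 7 of the 15 days carry a value, yet a continuous date axis still reserves room for all 15.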

Answer 1

ves*_*and 7

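The key here is still the rangebreaks attribute. If you followed the approach in the linked example, though, you would have to list every missing date by hand. The solution to missing data in this case is actually more missing data. Here is why: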

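1. You can retrieve the timestamps at the beginning and the end of your series, and then

2. build a complete timeline for that period (which will possibly contain more dates than your data) using: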

dt_all = pd.date_range(start=df.index[0],
                       end=df.index[-1],
                       freq = 'D')

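3. Next, you can isolate the timestamps in that timeline that are not in df.index. Here dt_all_py and dt_obs_py are dt_all and df.index converted to Python datetimes, as defined in the complete code below: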

dt_breaks = [d for d in dt_all_py if d not in dt_obs_py]

4. Finally, you can pass those timestamps to rangebreaks like so:

fig.update_xaxes(
    rangebreaks=[dict(values=dt_breaks)]
)
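
Steps 1 to 3 can also be sketched more compactly with pandas set operations. This is an alternative to the list comprehension above, not the answer's own code: the helper missing_timestamps is a hypothetical name, and it relies on DatetimeIndex.difference instead of converting both sides to Python datetimes. The result is formatted as date strings, the same form the plotly docs example uses for values.

```python
import pandas as pd

def missing_timestamps(index, freq="D"):
    """Timestamps on a regular grid between the first and last
    observation that do not appear in `index` (hypothetical helper)."""
    full = pd.date_range(start=index.min(), end=index.max(), freq=freq)
    return full.difference(index)

# Small illustrative index with a three-day gap.
observed = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-06"])
dt_breaks = missing_timestamps(observed).strftime("%Y-%m-%d").tolist()
print(dt_breaks)

# With a figure at hand, the list would be passed on as in step 4:
# fig.update_xaxes(rangebreaks=[dict(values=dt_breaks)])
```

For long series this avoids the quadratic cost of testing membership in a Python list for every timestamp.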

Plot: [image not preserved; figure titled "Missing dates are excluded by rangebreaks"]

Complete code:

import pandas as pd
import numpy as np
import plotly.graph_objects as go

# data
np.random.seed(1234)
n_obs = 15
frequency = 'D'
daterange = pd.date_range('2020', freq=frequency, periods=n_obs)
values = np.random.randint(low=-5, high=6, size=n_obs).tolist()
df = pd.DataFrame({'time':daterange, 'value':values})
df = df.set_index('time')
df.iloc[0]=100; df['value']=df.value.cumsum()

# Missing timestamps
df.iloc[2:5] = np.nan; df.iloc[8:13] = np.nan
df.dropna(inplace = True)

# plotly figure
fig=go.Figure(go.Scatter(x=df.index, y =df['value']))
fig.update_layout(template = 'plotly_dark')

# complete timeline between first and last timestamps
dt_all = pd.date_range(start=df.index[0],
                       end=df.index[-1],
                       freq = frequency)
                        
# make sure input and synthetic time series are of the same types
dt_all_py = [d.to_pydatetime() for d in dt_all]
dt_obs_py = [d.to_pydatetime() for d in df.index]

# find which timestamps are missing in the complete timeline
dt_breaks = [d for d in dt_all_py if d not in dt_obs_py]

# remove missing timestamps from visualization
fig.update_xaxes(
    rangebreaks=[dict(values=dt_breaks)] # hide timestamps with no values
)
#fig.update_layout(title=dict(text="Some dates are missing, but still displayed"))
fig.update_layout(title=dict(text="Missing dates are excluded by rangebreaks"))
fig.update_xaxes(showgrid=False)
fig.show()
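
The complete code keeps frequency as a variable, which suggests reuse for other sampling intervals. One caveat, sketched below under assumptions: each entry in a rangebreaks values list covers dvalue milliseconds, and dvalue defaults to one day, so intraday data would likely need dvalue set to the sampling step. The helper build_rangebreaks and the hourly sample are hypothetical, and the rendered result should be checked against your plotly version.

```python
import pandas as pd

def build_rangebreaks(index, step):
    """Build a rangebreaks list for a series sampled every `step`
    (a pd.Timedelta). Hypothetical helper, not part of plotly."""
    full = pd.date_range(start=index.min(), end=index.max(), freq=step)
    missing = full.difference(index)
    return [dict(
        values=missing.strftime("%Y-%m-%d %H:%M:%S").tolist(),
        # Size of each break in milliseconds; plotly's default is one day.
        dvalue=int(step / pd.Timedelta(milliseconds=1)),
    )]

# Illustrative hourly index with the 02:00 and 03:00 observations removed.
step = pd.Timedelta(hours=1)
hourly = pd.date_range("2020-01-01", periods=6, freq=step).delete([2, 3])
breaks = build_rangebreaks(hourly, step)
print(breaks)

# With a figure at hand: fig.update_xaxes(rangebreaks=breaks)
```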
