Pau*_*aul 7 python bayesian pymc pymc3
我仍在学习PYMC3,但是我在文档中找不到以下问题的任何内容。考虑这个问题的贝叶斯结构时间序列(BSTS)模型,没有季节性。可以在PYMC3中对其进行建模,如下所示:
import pymc3, numpy, matplotlib.pyplot
# generate some test data
t = numpy.linspace(0,2*numpy.pi,100)
y_full = numpy.cos(5*t)
y_train = y_full[:90]
y_test = y_full[90:]
# specify the model
with pymc3.Model() as model:
grw = pymc3.GaussianRandomWalk('grw',mu=0,sd=1,shape=y_train.size)
y = pymc3.Normal('y',mu=grw,sd=1,observed=y_train)
trace = pymc3.sample(1000)
y_mean_pred = pymc3.sample_ppc(trace,samples=1000,model=model)['y'].mean(axis=0)
fig = matplotlib.pyplot.figure(dpi=100)
ax = fig.add_subplot(111)
ax.plot(t,y_full,c='b')
ax.plot(t[:90],y_mean_pred,c='r')
matplotlib.pyplot.show()
Run Code Online (Sandbox Code Playgroud)
现在,我想预测接下来10个时间步的行为,即y_test。我还想在此区域包括产生贝叶斯锥的可靠区域,例如,请参见此处。不幸的是,在上述连接中产生锥体的机构有点模糊。在更传统的AR模型中,人们可以学习平均回归系数并手动扩展平均曲线。但是,在此BSTS模型中,没有明显的方法来执行此操作。或者,如果存在回归变量,则可以使用theano.shared并使用更细/扩展的网格对其进行更新,以使用sample_ppc进行插补和推断,但这在此设置中并不是真正的选择。也许sample_ppc在这里是个红鲱鱼,但尚不清楚如何进行其他操作。任何帮助都将受到欢迎。
我认为以下工作。但是,它非常笨拙,要求我在训练之前就知道要预测多远(特别是它不包括流媒体使用或简单的EDA)。我怀疑有更好的方法,我宁愿接受具有更多Pymc3经验的人更好的解决方案
import numpy, pymc3, matplotlib.pyplot, seaborn
# generate some data
t = numpy.linspace(0,2*numpy.pi,100)
y_full = numpy.cos(5*t)
# mask the data that I want to predict (requires knowledge
# that one might not always have at training time).
cutoff_idx = 80
y_obs = numpy.ma.MaskedArray(y_full,numpy.arange(t.size)>cutoff_idx)
# specify and train the model, used the masked array to supply only
# the observed data
with pymc3.Model() as model:
grw = pymc3.GaussianRandomWalk('grw',mu=0,sd=1,shape=y_obs.size)
y = pymc3.Normal('y',mu=grw,sd=1,observed=y_obs)
trace = pymc3.sample(5000)
y_pred = pymc3.sample_ppc(trace,samples=20000,model=model)['y']
y_pred_mean = y_pred.mean(axis=0)
# compute percentiles
dfp = numpy.percentile(y_pred,[2.5,25,50,70,97.5],axis=0)
# plot actual data and summary posterior information
pal = seaborn.color_palette('Purples')
fig = matplotlib.pyplot.figure(dpi=100)
ax = fig.add_subplot(111)
ax.plot(t,y_full,c='g',label='true value',alpha=0.5)
ax.plot(t,y_pred_mean,c=pal[5],label='posterior mean',alpha=0.5)
ax.plot(t,dfp[2,:],alpha=0.75,color=pal[3],label='posterior median')
ax.fill_between(t,dfp[0,:],dfp[4,:],alpha=0.5,color=pal[1],label='CR 95%')
ax.fill_between(t,dfp[1,:],dfp[3,:],alpha=0.4,color=pal[2],label='CR 50%')
ax.axvline(x=t[cutoff_idx],linestyle='--',color='r',alpha=0.25)
ax.legend()
matplotlib.pyplot.show()
Run Code Online (Sandbox Code Playgroud)
这会输出以下内容,这似乎是一个非常糟糕的预测,但至少代码提供的样本值不足。