Bun*_*oss 3 python forecasting pandas statsmodels difference
I have a pandas Series of monthly data (df.sales). To fit a time series model I need to subtract the value from 12 months earlier, so I ran the following:
sales_new = df.sales.diff(periods=12)
I then fit an ARMA model and forecast into the future:
model = ARMA(sales_new, order=(2,0)).fit()
model.predict('2015-01-01', '2017-01-01')
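Because I differenced the sales data, the model forecasts the differences rather than the sales themselves. If this were a period-1 difference I would just use np.cumsum(), but with a period of 12 it is a bit trickier.
What is the best way to "undo" the differencing and convert the forecast back to the scale of the original data?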
I think you need to compute each future value from the value 12 months before it:
from datetime import date

import numpy as np
import pandas as pd

periods = 12
# freq='M' is month-end; pandas 2.2+ spells it 'ME'
df = pd.DataFrame(data={'value': np.random.random(size=24)}, index=pd.date_range(start=date(2014, 1, 1), freq='M', periods=24))
diffs = df.diff(periods=periods)
# keep the first 12 known values, then rebuild the rest from the differences
restored = df.copy()
restored.iloc[periods:] = np.nan
for d, val in diffs.iloc[periods:].iterrows():
    restored.loc[d] = restored.loc[d - pd.DateOffset(months=periods)].value + val
res = pd.concat([df, diffs, restored], axis=1)
res.columns = ['original', 'diffs', 'restored']
print(res)
            original      diffs  restored
2014-01-31  0.926367        NaN  0.926367
2014-02-28  0.688898        NaN  0.688898
2014-03-31  0.297025        NaN  0.297025
2014-04-30  0.139094        NaN  0.139094
2014-05-31  0.375082        NaN  0.375082
2014-06-30  0.490638        NaN  0.490638
2014-07-31  0.789683        NaN  0.789683
2014-08-31  0.236841        NaN  0.236841
2014-09-30  0.263245        NaN  0.263245
2014-10-31  0.547025        NaN  0.547025
2014-11-30  0.243444        NaN  0.243444
2014-12-31  0.385028        NaN  0.385028
2015-01-31  0.823224  -0.103142  0.823224
2015-02-28  0.828245   0.139347  0.828245
2015-03-31  0.753291   0.456266  0.753291
2015-04-30  0.447670   0.308576  0.447670
2015-05-31  0.936667   0.561584  0.936667
2015-06-30  0.223049  -0.267589  0.223049
2015-07-31  0.933942   0.144259  0.933942
2015-08-31  0.325726   0.088886  0.325726
2015-09-30  0.947526   0.684281  0.947526
2015-10-31  0.524749  -0.022276  0.524749
2015-11-30  0.431671   0.188227  0.431671
2015-12-31  0.234028  -0.151000  0.234028
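For the forecasting case in the question, where the differences extend past the end of the observed data, the same idea can probably be written without any date arithmetic. What follows is only a sketch: `undo_seasonal_diff`, `history` and `diff_forecast` are hypothetical names, and it assumes the forecast differences begin at the period immediately after the last observed value.

```python
import numpy as np
import pandas as pd


def undo_seasonal_diff(history, diff_forecast, periods=12):
    """Rebuild levels from forecast seasonal differences (illustrative helper).

    history: observed values on the original scale, at least `periods` long
    diff_forecast: forecast of y[t] - y[t - periods], starting right after history
    """
    # Seed with the last `periods` observed levels.
    levels = list(history.iloc[-periods:])
    for d in diff_forecast:
        # Each new level is the forecast difference plus the level
        # `periods` steps earlier (observed or already rebuilt).
        levels.append(levels[-periods] + d)
    # Drop the seed values and keep the forecast's own index.
    return pd.Series(levels[periods:], index=diff_forecast.index)


# Toy data: a series rising by 1 per step, so every 12-step difference is 12.
history = pd.Series(np.arange(24, dtype=float))
diff_forecast = pd.Series([12.0] * 15, index=range(24, 39))
restored = undo_seasonal_diff(history, diff_forecast)
print(restored)
```

Because each rebuilt value is appended to the same list, forecasts longer than 12 steps reuse earlier rebuilt values automatically.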
This should do it:
def rebuild_diffed(series, first_element_original):
    # undo a period-1 diff: cumulative sum plus the first original value
    cumsum = series.cumsum()
    return cumsum.fillna(0) + first_element_original
Step-by-step version:
import pandas as pd

# making some data
a = pd.Series([2, 6, 4, 6, 2])
print(a)
a_diff = a.diff()
print(a_diff)
# Rebuilding: cumulative sum of the differences, plus the first original value (2)
a_diff_cumsum = a_diff.cumsum()
print(a_diff_cumsum)
rebuilt = a_diff_cumsum.fillna(0) + 2
print(rebuilt)
print(rebuilt == a)
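Note that the plain cumsum above only inverts a period-1 diff. One possible extension to diff(periods=12), assuming you still have the first 12 original values, is to take the cumulative sum separately within each position of the 12-step cycle. This is a sketch rather than a tested recipe; `rebuild_seasonal_diffed` and `first_values` are hypothetical names.

```python
import numpy as np
import pandas as pd


def rebuild_seasonal_diffed(diffs, first_values):
    """Invert Series.diff(periods=p), where p = len(first_values)."""
    periods = len(first_values)
    # Position of every row within the seasonal cycle (0 .. periods-1).
    phase = np.arange(len(diffs)) % periods
    # Cumulative sum within each phase; the leading NaNs count as zero.
    cum = diffs.fillna(0).groupby(phase).cumsum()
    # Add back the starting level that belongs to each phase.
    return cum + np.asarray(first_values)[phase]


# Round trip on random data: difference, rebuild, compare.
a = pd.Series(np.random.default_rng(0).random(36))
rebuilt = rebuild_seasonal_diffed(a.diff(periods=12), a.iloc[:12])
print(np.allclose(rebuilt, a))
```

The comparison uses np.allclose rather than == because the floating-point sums may differ from the originals in the last digits.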