Create a non-standard pandas frequency ("dekads" = 3 periods per month)

Tom*_*ees 6 python datetime time-series pandas statsmodels

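Because the statsmodels.tseries models need an index with a given frequency in order to forecast, I need my data to carry a non-standard frequency.

So I want to create a new frequency to assign to a pandas.DatetimeIndex. This is the dekad frequency, which has 36 periods in a year, three per month. The first always falls on the 10th of the month, the second on the 20th, and the last on the last day of the month.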

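The difficulty is the last day of the month:

  1. In February the date differs depending on whether it is a leap year (the 28th or the 29th)
  2. It depends on the number of days in the month (28, 29, 30, or 31)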

In the end, however, it is a fixed frequency (3 periods per month, 36 periods per year).
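As a minimal sketch of that rule, using only the standard library: calendar.monthrange reports the number of days in each month, which takes care of the leap-year case. The helper name dekad_dates is hypothetical and is not part of the setup described here.

```python
import calendar
from datetime import date


def dekad_dates(year):
    """Return the 36 dekad dates of `year` (hypothetical helper)."""
    dates = []
    for month in range(1, 13):
        # monthrange returns (weekday of the 1st, number of days in the month),
        # so a leap-year February yields 29 here without special handling.
        last_day = calendar.monthrange(year, month)[1]
        dates.extend(date(year, month, day) for day in (10, 20, last_day))
    return dates


leap = dekad_dates(2000)
common = dekad_dates(2001)
print(len(leap), len(common))  # 36 36
print(leap[5], common[5])      # 2000-02-29 2001-02-28
```

The list always has 36 entries, but the gap between consecutive entries varies between 8 and 11 days, which is why no built-in fixed-step frequency matches it.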

The reason is that the statsmodels.tsa.holtwinters models need an index with a given frequency in order to forecast. When I try to run a holtwinters forecast, I get the following warning message:

/home/tommy/miniconda3/envs/ml/lib/python3.8/site-packages/statsmodels/tsa/base/tsa_model.py:216: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.

This is what the dekad time steps look like:

import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

dates = pd.date_range("2000-01-01", "2003-01-01")
_dekads = [d for d in dates if d.day in [10, 20]]  # the 10th and 20th of each month
_month_ends = [d + MonthEnd(1) for d in dates if d.day == 10]  # one month end per month
dekads = sorted(np.concatenate([_dekads, _month_ends]))

df = pd.DataFrame({"y": np.random.random(len(dekads))}, index=dekads)

df.head()

Out[]:
            y
2000-01-10  0.013236
2000-01-20  0.430563
2000-01-31  0.028183
2000-02-10  0.050080
2000-02-20  0.092100

I want to be able to assign a dekad frequency to the index:

df.index.freq = "dekad"

Out[]:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets._get_offset()

KeyError: 'DEKAD'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets._get_offset()

ValueError: Invalid frequency: DEKAD

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-155-aa7b4737fd5a> in <module>
      7 
      8 df = pd.DataFrame({"y": np.random.random(len(dekads))}, index=dekads)
----> 9 df.index.freq = "dekad"

~/miniconda3/envs/ml/lib/python3.8/site-packages/pandas/core/indexes/extension.py in fset(self, value)
     62 
     63             def fset(self, value):
---> 64                 setattr(self._data, name, value)
     65 
     66             fget.__name__ = name

~/miniconda3/envs/ml/lib/python3.8/site-packages/pandas/core/arrays/datetimelike.py in freq(self, value)
   1090     def freq(self, value):
   1091         if value is not None:
-> 1092             value = to_offset(value)
   1093             self._validate_frequency(self, value)
   1094 

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()

ValueError: Invalid frequency: dekad

I would like to be able to assign a "dekad" frequency to this object. How can I create my own dekad frequency?

# How to create a new freq object in pandas

The purpose of this exercise:

Train-test split of the dekad data

Model fitting and forecasting (on the test data)

This forecast is clearly poor and does not reflect the known seasonality. I think the problem is that no frequency has been assigned to the datetime index.

If there are other ways of achieving these goals, I would be very keen to explore them.

Gio*_*Gio 3

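You could edit the pandas source code and add a rule that defines the frequency, but you probably don't want to do that.

A simple implementation is to use the existing custom business day frequency: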

pd.offsets.CustomBusinessDay(
    holidays=my_holidays,
    weekmask=my_weekdays,
)

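and define your holiday calendar as every day except the 10th, the 20th, and the days where is_month_end is true (see the offset aliases documentation).

I guess you want your working days to run from Monday through Sunday, to make sure you never miss a 10th, a 20th, or a month end: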

import pandas as pd

# The holiday calendar must cover every date you later ask date_range to generate.
start_date = "2023-01-01"
end_date = "2023-12-31"
# pd.date_range reference: https://pandas.pydata.org/docs/reference/api/pandas.date_range.html
full_calendar = pd.date_range(start=start_date, end=end_date)

my_holidays = full_calendar[full_calendar.day != 10]  # not the 10th of the month
my_holidays = my_holidays[my_holidays.day != 20]  # not the 20th of the month
my_holidays = my_holidays[~my_holidays.is_month_end]  # not the last day of the month
my_weekdays = "Sun Mon Tue Wed Thu Fri Sat"  # every day of the week counts as a working day
dekad = pd.offsets.CustomBusinessDay(
    holidays=my_holidays,
    weekmask=my_weekdays,
)

Now you can use it as the freq, as long as the requested range stays inside the period covered by the holiday calendar defined above:

my_dekad_dates = pd.date_range(start_date, end_date, freq=dekad)
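To check that an offset built this way really lands only on dekad days, a minimal sketch along the following lines may help. It assumes a 2023 calendar like the snippet above, builds the holiday mask in a single step (a restructuring for brevity, not the exact code above), and compares the generated index with the dekad days selected directly from the full calendar. Whether statsmodels accepts this offset when forecasting has not been verified here.

```python
import pandas as pd

start_date, end_date = "2023-01-01", "2023-12-31"
full_calendar = pd.date_range(start=start_date, end=end_date)

# Dekad days: the 10th, the 20th, and the last day of each month.
is_dekad_day = (
    (full_calendar.day == 10)
    | (full_calendar.day == 20)
    | full_calendar.is_month_end
)

# Every other day is declared a "holiday" so that the offset skips it.
dekad = pd.offsets.CustomBusinessDay(
    holidays=full_calendar[~is_dekad_day],
    weekmask="Sun Mon Tue Wed Thu Fri Sat",
)

generated = pd.date_range(start_date, end_date, freq=dekad)

# Compare against the dekad days picked directly from the calendar.
expected = full_calendar[is_dekad_day]
print(len(generated))  # 36 for a full calendar year
print(generated.equals(expected))

# An index produced by date_range keeps the offset as its freq,
# so a DataFrame built on it has frequency information attached.
df = pd.DataFrame({"y": range(len(generated))}, index=generated)
print(df.index.freq is not None)
```

The main limitation of this design is that the offset only knows about the dates listed in its holiday calendar: outside that span every day counts as a working day, so the calendar has to be extended to cover the forecast horizon as well as the training data.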