Tom*_*ees 6 python datetime time-series pandas statsmodels
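Because the `statsmodels.tseries` models need an index with a defined frequency in order to forecast, I need my data to carry a non-standard frequency. I therefore want to create a new frequency to assign to a `pandas.DatetimeIndex`.

This is the dekad frequency, which has 36 periods in a year, three per month. The first always falls on the 10th of the month, the second on the 20th, and the last on the final day of the month.

The difficulty lies in the last day of the month, which is not a fixed day number.

In the end, however, it is a fixed frequency (3 per month, 36 periods per year).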
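To make the definition concrete, here is a minimal sketch that builds the dekad dates for one year directly from the rule above (10th, 20th, last day of the month). The helper name `dekad_dates` and the year 2001 are illustrative assumptions, not part of the original question. It shows that there are always 36 periods per year even though the spacing in days is uneven.

```python
import pandas as pd


def dekad_dates(start, end):
    """Return the 10th, 20th and last day of every month between start and end.

    Hypothetical helper, written only to illustrate the dekad definition.
    """
    days = pd.date_range(start, end, freq="D")
    mask = (days.day == 10) | (days.day == 20) | days.is_month_end
    return days[mask]


# One non-leap year of dekads.
dekads_2001 = dekad_dates("2001-01-01", "2001-12-31")
print(len(dekads_2001))  # 36 periods: 3 per month, 12 months

# Number of days between consecutive dekads. The gap is not constant,
# which is why no fixed-step frequency can describe this index.
gaps = dekads_2001.to_series().diff().dropna().dt.days
print(gaps.value_counts().sort_index())
```

The gap after the 20th depends on the month length, so the step between dekads varies while the count per year stays fixed.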
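
The reason is that the `statsmodels.tsa.holtwinters` models need an index with a defined frequency in order to forecast. When I try to run a `holtwinters` forecast, I get the following warning message: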
/home/tommy/miniconda3/envs/ml/lib/python3.8/site-packages/statsmodels/tsa/base/tsa_model.py:216: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.

# How to create a new `freq` object in pandas

I would like to be able to assign a dekad frequency to the index:

import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

dates = pd.date_range("2000-01-01", "2003-01-01")
# the 10th and 20th of every month
_dekads = [d for d in dates if d.day in [10, 20]]
# the last day of every month, found by rolling each 10th forward to the month end
_month_ends = [d + MonthEnd(1) for d in dates if d.day == 10]
dekads = sorted(np.concatenate([_dekads, _month_ends]))

df = pd.DataFrame({"y": np.random.random(len(dekads))}, index=dekads)

df.head()

Out[]:
                   y
2000-01-10  0.013236
2000-01-20  0.430563
2000-01-31  0.028183
2000-02-10  0.050080
2000-02-20  0.092100

I would like to be able to assign a "dekad" frequency to this object. How can I create my own dekad frequency?

df.index.freq = "dekad"

Out[]:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets._get_offset()

KeyError: 'DEKAD'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets._get_offset()

ValueError: Invalid frequency: DEKAD

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-155-aa7b4737fd5a> in <module>
      7 
      8 df = pd.DataFrame({"y": np.random.random(len(dekads))}, index=dekads)
----> 9 df.index.freq = "dekad"

~/miniconda3/envs/ml/lib/python3.8/site-packages/pandas/core/indexes/extension.py in fset(self, value)
     62 
     63     def fset(self, value):
---> 64         setattr(self._data, name, value)
     65 
     66     fget.__name__ = name

~/miniconda3/envs/ml/lib/python3.8/site-packages/pandas/core/arrays/datetimelike.py in freq(self, value)
   1090     def freq(self, value):
   1091         if value is not None:
-> 1092             value = to_offset(value)
   1093             self._validate_frequency(self, value)
   1094 

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()

ValueError: Invalid frequency: dekad

This forecast is clearly poor and does not reflect the known seasonality. I think the problem is that no frequency has been assigned to the datetime index.

If there are other ways to achieve these goals, I would be very keen to explore them.

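You could edit the source code and add a rule that defines the frequency, but you probably don't want to do that.

A simple implementation is to use the existing custom business day frequency: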
pd.offsets.CustomBusinessDay(
    holidays=my_holidays,
    weekmask=my_weekdays,
)
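and define your holiday calendar as every day except the 10th, the 20th and the days where `is_month_end` is true (see the offset aliases documentation).

I assume you want your business days to run Monday through Sunday, so that you never miss a 10th, a 20th or an `is_month_end` day.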
import pandas as pd

# The holiday calendar must cover every date the offset is used on,
# so it spans the same range as the index built from it.
start_date = "2000-01-01"
end_date = "2003-01-01"
# pd.date_range: https://pandas.pydata.org/docs/reference/api/pandas.date_range.html
full_calendar = pd.date_range(start=start_date, end=end_date)
my_holidays = full_calendar[full_calendar.day != 10]  # not the 10th of the month
my_holidays = my_holidays[my_holidays.day != 20]  # not the 20th of the month
my_holidays = my_holidays[~my_holidays.is_month_end]  # not the last day of the month
my_weekdays = "Sun Mon Tue Wed Thu Fri Sat"  # every weekday counts as a business day
dekad = pd.offsets.CustomBusinessDay(
    holidays=my_holidays,
    weekmask=my_weekdays,
)
Now you can use it as `freq`:
my_dekad_dates = pd.date_range("2000-01-01", "2003-01-01", freq=dekad)
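
As a sanity check, the sketch below rebuilds the offset in a self-contained way and compares the resulting index against dekads constructed directly from the definition in the question. The coverage window (through 2004) and the variable names are assumptions made for illustration. One caveat it tries to make visible: any date outside the holiday list counts as a business day, so the calendar probably needs to extend past whatever horizon you intend to forecast. Whether a given statsmodels version accepts this offset for forecasting is not checked here.

```python
import pandas as pd

# Assumed coverage window: wider than the index so that stepping past the
# last observation still lands on a dekad date.
calendar_days = pd.date_range("2000-01-01", "2004-12-31", freq="D")
is_dekad = (
    (calendar_days.day == 10)
    | (calendar_days.day == 20)
    | calendar_days.is_month_end
)

# Every non-dekad day is declared a holiday; all seven weekdays are "business" days.
dekad = pd.offsets.CustomBusinessDay(
    holidays=list(calendar_days[~is_dekad]),
    weekmask="Sun Mon Tue Wed Thu Fri Sat",
)

index = pd.date_range("2000-01-01", "2003-01-01", freq=dekad)

# Reference construction taken straight from the dekad definition.
expected = calendar_days[is_dekad]
expected = expected[expected <= "2003-01-01"]

print(index.equals(expected))  # do both constructions give the same dates?
print(index.freq)              # the offset object attached to the index
print(index[-1] + dekad)       # one step beyond the last date in the index
```

Because the index is created by `pd.date_range` with `freq=dekad`, it carries the offset as its `freq`, which is the attribute the warning in the question complains about.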