I want to cross-validate my time series data, splitting it by the year of the timestamp.
I have the following data in a pandas DataFrame:
mock_data
timestamp counts
'2015-01-01 03:45:14' 4
.
.
.
'2016-01-01 13:02:14' 12
.
.
.
'2017-01-01 09:56:54' 6
.
.
.
'2018-01-01 13:02:14' 8
.
.
.
'2019-01-01 11:39:40' 24
.
.
.
'2020-01-01 04:02:03' 30
mock_data.dtypes
timestamp object
counts int64
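Looking at scikit-learn's TimeSeriesSplit(), it does not seem possible to make n_splits divide the data by year. Is there another way to create consecutive training sets that would produce a train-test split like the following?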
tscv = newTimeSeriesSplit(n_splits=5, by='year')
>>> print(tscv)
newTimeSeriesSplit(max_train_size=None, n_splits=5, by='year')
>>> for train_index, test_index in tscv.split(mock_data):
... print("TRAIN:", train_index, "TEST:", test_index)
... X_train, X_test = X[train_index], X[test_index]
... y_train, y_test = y[train_index], y[test_index] …
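
For reference, here is a minimal sketch of the behaviour I am after, written as a plain generator rather than a real scikit-learn class. The name year_based_splits and its time_col parameter are hypothetical, not part of any library. It assumes an expanding window: for each year after the first, train on all rows from earlier years and test on that year. The six rows are the ones visible in the sample above; the elided rows are left out.

```python
import numpy as np
import pandas as pd


def year_based_splits(df, time_col="timestamp"):
    """Yield (train_idx, test_idx) positional index arrays, one pair per test year.

    Hypothetical helper: trains on every row from earlier years and
    tests on the rows of the next year (expanding window).
    """
    # The column is stored as object (strings), so parse it before taking the year.
    years = pd.to_datetime(df[time_col]).dt.year.to_numpy()
    unique_years = np.unique(years)  # sorted ascending
    for test_year in unique_years[1:]:
        train_idx = np.flatnonzero(years < test_year)
        test_idx = np.flatnonzero(years == test_year)
        yield train_idx, test_idx


# Only the rows shown in the question; the real frame has more rows per year.
mock_data = pd.DataFrame({
    "timestamp": [
        "2015-01-01 03:45:14",
        "2016-01-01 13:02:14",
        "2017-01-01 09:56:54",
        "2018-01-01 13:02:14",
        "2019-01-01 11:39:40",
        "2020-01-01 04:02:03",
    ],
    "counts": [4, 12, 6, 8, 24, 30],
})

splits = list(year_based_splits(mock_data))
for train_index, test_index in splits:
    print("TRAIN:", train_index, "TEST:", test_index)
```

With six distinct years this gives five splits, which matches the n_splits=5 in my example. The indices are positional, so with a DataFrame they would be used as X.iloc[train_index]. Since scikit-learn's cv argument also accepts an iterable of (train, test) index arrays, a list like splits could presumably be passed to cross_val_score directly.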