use*_*548 (5) · tags: python, time-series, scikit-learn
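I'm fitting a time series, so I'm trying to cross-validate with `TimeSeriesSplit`. I believe the simplest way to apply it is through `cross_val_score`, by passing it via the `cv` parameter.

My question is simple: am I passing the `cv` argument correctly? Should I use `split(scaled_train)`, or should I use `split(X_train)` or `split(input_data)`? Or should I cross-validate some other way?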
Here is the code I'm writing:
```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score, train_test_split
from sklearn.preprocessing import MinMaxScaler

# test_sizes (a list of test-set fractions) is assumed to be defined elsewhere in the script

def fit_model1(data: pd.DataFrame):
    df = data
    scores_fit_model1 = []
    for sizes in test_sizes:
        # Generate Test Design
        input_data = df.drop('next_count', axis=1)
        output_data = df[['next_count']]
        X_train, X_test, y_train, y_test = train_test_split(input_data, output_data, test_size=sizes, random_state=0, shuffle=False)
        # Scaling
        scaler = MinMaxScaler()
        scaled_train = scaler.fit_transform(X_train)
        scaled_test = scaler.transform(X_test)
        # Build Model
        lr = LinearRegression()
        lr.fit(scaled_train, y_train.values.ravel())
        predictions = lr.predict(scaled_test)
        # Cross Validation Definition
        time_split = TimeSeriesSplit(n_splits=10)
        # Performance metrics
        r2 = cross_val_score(lr, scaled_train, y_train.values.ravel(), cv=time_split.split(scaled_train), scoring='r2', n_jobs=1).mean()
        scores_fit_model1.append(r2)
    return scores_fit_model1
```
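`TimeSeriesSplit` is a cross-validator whose `split` method yields train/test index pairs for successive folds: each training window is a contiguous block that starts at the first sample and grows from fold to fold, and each test window is the block immediately after it. So you can pass the `TimeSeriesSplit` object to `cv` as-is, or pass `time_split.split(scaled_train)`; both amount to the same thing, namely splitting an array with the same number of rows as the training data (the second positional argument of `cross_val_score`). It doesn't matter to `TimeSeriesSplit` whether you give it the scaled or the raw data, because `split` only looks at the number of samples; what matters is that `cross_val_score` receives the scaled data. `split(input_data)`, on the other hand, is not equivalent: `input_data` has more rows than `X_train`, so its fold indices would not line up with the training data.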
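The points above can be checked on a small example. The sketch below uses made-up data (20 time-ordered rows with 2 features; the arrays are hypothetical stand-ins for `X_train`/`y_train`) and shows that raw and scaled inputs produce identical folds, that the training window grows from index 0, that splitting a longer array gives indices past the end of the training data, and that `cv=time_split` and `cv=time_split.split(...)` give the same scores.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 20 time-ordered samples, 2 features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 2)) * 100
y_train = X_train @ np.array([0.5, -0.2]) + rng.normal(size=20)

scaled_train = MinMaxScaler().fit_transform(X_train)
time_split = TimeSeriesSplit(n_splits=4)

# split() only uses the number of rows, so raw and scaled inputs give identical folds
raw_folds = list(time_split.split(X_train))
scaled_folds = list(time_split.split(scaled_train))
same = all(
    np.array_equal(a_tr, b_tr) and np.array_equal(a_te, b_te)
    for (a_tr, a_te), (b_tr, b_te) in zip(raw_folds, scaled_folds)
)

# Each training window starts at index 0 and grows; each test window follows it
for train_idx, test_idx in scaled_folds:
    print(f"train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")

# Splitting a longer array (like the full input_data, here 25 rows) yields indices
# that do not match X_train; the largest test index is 24, but X_train has only 20 rows
full_folds = list(time_split.split(np.zeros((25, 2))))

# Passing the splitter object or its split() generator gives the same scores
lr = LinearRegression()
s1 = cross_val_score(lr, scaled_train, y_train, cv=time_split, scoring="r2")
s2 = cross_val_score(lr, scaled_train, y_train, cv=time_split.split(scaled_train), scoring="r2")
```

A generator passed to `cv` is consumed once, so passing the splitter object itself is the more reusable choice.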
I also made a few small simplifications to your code: the scaling now happens before `train_test_split`, and the output data is a Series (so you don't need `values.ravel()`):
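```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score, train_test_split
from sklearn.preprocessing import MinMaxScaler

# test_sizes (a list of test-set fractions) is assumed to be defined elsewhere, as in the question

def fit_model1(data: pd.DataFrame):
    df = data
    scores_fit_model1 = []
    for sizes in test_sizes:
        # Generate Test Design
        input_data = df.drop('next_count', axis=1)
        output_data = df['next_count']  # a Series, so .values.ravel() is not needed
        scaler = MinMaxScaler()
        scaled_input = scaler.fit_transform(input_data)
        X_train, X_test, y_train, y_test = train_test_split(scaled_input, output_data, test_size=sizes, random_state=0, shuffle=False)
        # Build Model
        lr = LinearRegression()
        lr.fit(X_train, y_train)
        predictions = lr.predict(X_test)
        # Cross Validation Definition
        time_split = TimeSeriesSplit(n_splits=10)
        # Performance metrics: the splitter object is passed to cv directly
        r2 = cross_val_score(lr, X_train, y_train, cv=time_split, scoring='r2', n_jobs=1).mean()
        scores_fit_model1.append(r2)
    return scores_fit_model1
```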
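One caveat about scaling before the split: `MinMaxScaler` is then fit on all of `input_data`, so the minimum and maximum it learns include the test rows and, inside `cross_val_score`, the validation folds. A common way to avoid this (a swapped-in technique, not part of the answer above) is to put the scaler and the model in a scikit-learn pipeline, so `cross_val_score` refits the scaler on each training window only. This is a sketch assuming the same `next_count` target column; the function name `fit_model1_pipeline`, its `test_sizes` parameter, the feature columns, and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def fit_model1_pipeline(data: pd.DataFrame, test_sizes) -> list:
    """Hypothetical variant of fit_model1: the scaler lives inside the pipeline,
    so it is fit only on the rows each model is trained on."""
    input_data = data.drop('next_count', axis=1)
    output_data = data['next_count']  # a Series, so no .values.ravel() is needed
    scores = []
    for size in test_sizes:
        # Chronological hold-out split (no shuffling for time series)
        X_train, X_test, y_train, y_test = train_test_split(
            input_data, output_data, test_size=size, shuffle=False)
        model = make_pipeline(MinMaxScaler(), LinearRegression())
        # cross_val_score clones and refits the whole pipeline on each training window
        r2 = cross_val_score(model, X_train, y_train,
                             cv=TimeSeriesSplit(n_splits=10), scoring='r2').mean()
        scores.append(r2)
    return scores


# Usage with made-up data (the feature column names are hypothetical)
rng = np.random.default_rng(1)
demo = pd.DataFrame({'feature_a': rng.normal(size=200), 'feature_b': rng.normal(size=200)})
demo['next_count'] = 3 * demo['feature_a'] - demo['feature_b'] + rng.normal(scale=0.1, size=200)
scores = fit_model1_pipeline(demo, test_sizes=[0.2, 0.3])
print(scores)
```

The same pipeline object can later be fit on all of `X_train` and scored on `X_test`, which keeps the hold-out evaluation free of test-set statistics as well.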