Den*_*se 5 python time-series lstm keras tensorflow
我是机器学习新手,正在使用 Keras 中的 LSTM 执行多元时间序列预测。我有一个每月时间序列数据集,其中包含 4 个输入变量(温度、降水、露水和 Wind_spreed)和 1 个输出变量(污染)。使用这些数据,我提出了一个预测问题,根据前几个月的天气状况和污染情况,我预测下个月的污染情况。下面是我的代码
X = df[['Temperature', 'Precipitation', 'Dew', 'Wind_speed' ,'Pollution (t_1)']].values
y = df['Pollution (t)'].values
y = y.reshape(-1,1)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(X)
#dataset has 359 samples in total
train_X, train_y = X[:278], y[:278]
test_X, test_y = X[278:], y[278:]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
model = Sequential()
model.add(LSTM(100, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.2))
# model.add(LSTM(70))
# model.add(Dropout(0.3))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(train_X, train_y, epochs=700, batch_size=70, validation_data=(test_X, test_y), verbose=2, shuffle=False)
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.show()
Run Code Online (Sandbox Code Playgroud)
为了进行预测,我使用以下代码
from sklearn.metrics import mean_squared_error,r2_score
yhat = model.predict(test_X)
mse = mean_squared_error(test_y, yhat)
rmse = np.sqrt(mse)
r2 = r2_score(test_y, yhat)
print("test set performance")
print("--------------------")
print("MSE:",mse)
print("RMSE:",rmse)
print("R^2: ",r2)
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(range(len(test_y)), test_y, '-b',label='Actual')
ax.plot(range(len(yhat)), yhat, 'r', label='Predicted')
plt.legend()
plt.show()
Run Code Online (Sandbox Code Playgroud)
运行这段代码我遇到了以下问题:
图表结果:
我的代码中是否做错了什么导致这些问题的原因?我是 python 新手,所以任何帮助回答上述两个问题的人都将不胜感激。
小智 2
首先,我认为输入“1”作为Timesteps值是不合适的,因为LSTM模型是处理时间序列或序列数据的模型。我认为以下数据挖掘脚本会很好用
def lstm_data(df,timestamps):
array_data=df.values
sc=MinMaxScaler()
array_data_=sc.fit_transform(array_data)
array=np.empty((0,array_data_.shape[1]))
range_=array_data_.shape[0]-(timestamps-1)
for t in range(range_):
array_data_p=array_data_[t:t+sequenth_length,:]
array=np.vstack((array,array_data_p))
array_=array.reshape(-1,timestamps, array.shape[1])
return array_
#timestamps depend on your objection, but not '1'
x_data=lstm_data(x, timestamps=4)
y_data=lstm_data(y, timestamps=4)
y_data=y_data.reshape(-1,1)
#Divide each data into train and test
#Input the divided data into your LSTM model
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
2092 次 |
最近记录: |