I have been developing feedforward neural networks (FNNs) and recurrent neural networks (RNNs) in Keras with structured data of shape [instances, time, features], and the performance of the FNNs and RNNs has been identical (except that the RNNs require more computation time).
I have also simulated tabular data (code below) where I expected the RNN to outperform the FNN, because the next value in the series depends on the previous value in the series; however, both architectures predict correctly.
With NLP data I have seen RNNs outperform FNNs, but not with tabular data. In general, when would one expect an RNN to outperform an FNN with tabular data? Specifically, could someone post simulation code with tabular data demonstrating an RNN outperforming an FNN?
Thank you! If my simulation code is not ideal for my question, please adapt it or share more ideal simulation code!
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib.pyplot as plt
Two features were simulated over 10 time steps, where the value of the second feature depends on the values of both features in the previous time step.
## Simulate data.
np.random.seed(20180825)
X = np.random.randint(50, 70, size = (11000, 1)) / 100
X = np.concatenate((X, X), axis = 1)
for i in range(10):
    X_next = np.random.randint(50, 70, size = (11000, 1)) / 100
    X = np.concatenate((X, X_next,
                        (0.50 * X[:, -1].reshape(len(X), 1))
                        + (0.50 * X[:, -2].reshape(len(X), 1))), axis = 1)
print(X.shape)
## Training and validation data.
split = 10000
Y_train = X[:split, -1:].reshape(split, 1)
Y_valid = X[split:, -1:].reshape(len(X) - split, 1)
X_train = X[:split, :-2]
X_valid = X[split:, :-2]
print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)
FNN:
## FNN model.
# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))
# Compile model.
network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
validation_data = (X_valid, Y_valid))
plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
LSTM:
## LSTM model.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1] // 2, 2)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1] // 2, 2)
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
validation_data = (X_lstm_valid, Y_valid))
plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
In practice, even in NLP you see that RNNs and CNNs are often competitive. Here is a 2017 review paper that shows this in more detail. In theory, RNNs can handle the full complexity and sequential nature of language better, but in practice the bigger obstacle is usually training the network properly, and RNNs are finicky.
Another problem that might have a chance of working is the balanced-parentheses problem (either with only parentheses in the string, or with parentheses plus other distractor symbols). This requires processing the input sequentially and tracking some state, and may be easier to learn with an LSTM than with an FFN. A data-generation sketch follows below.
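Here is a minimal sketch of how such a dataset could be generated, assuming fixed-length strings over '(', ')' and a distractor 'x', one-hot encoded as [instances, time, features] for an LSTM. The helpers is_balanced and make_dataset are illustrative names, not from the original answer.
import numpy as np

def is_balanced(s):
    # Return 1 if the parentheses in s are balanced, ignoring other symbols.
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                return 0
    return int(depth == 0)

def make_dataset(n_samples = 10000, length = 20, seed = 0):
    # Fixed-length random strings over '(', ')' and 'x',
    # one-hot encoded as [instances, time, features].
    rng = np.random.RandomState(seed)
    vocab = ['(', ')', 'x']
    X = np.zeros((n_samples, length, len(vocab)))
    y = np.zeros((n_samples, 1))
    for i in range(n_samples):
        chars = rng.choice(vocab, size = length)
        for t, ch in enumerate(chars):
            X[i, t, vocab.index(ch)] = 1.0
        y[i, 0] = is_balanced(chars)
    return X, y

X_paren, y_paren = make_dataset()
print(X_paren.shape, y_paren.mean())
Note that uniformly random strings are rarely balanced, so in practice you would want to oversample balanced examples (for instance by constructing some strings to be balanced) to keep the classes reasonably even.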
Update: Some data that looks sequential may not actually need to be processed sequentially. For example, even if you pass in a sequence of numbers to be added, since addition is commutative an FFN will do just as well as an RNN. The same can be true of many health problems, where the dominant information is not sequential in nature. Suppose you measure a patient's smoking habits every year. From a behavioral standpoint that trajectory matters, but if you are predicting whether the patient will develop lung cancer, the prediction will be dominated simply by the number of years the patient smoked (perhaps restricted to the last 10 years for the FFN).
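A minimal sketch of that order-invariant case, assuming the target is simply the sum of the inputs, so permuting the columns changes nothing and an FFN sees exactly the same problem as an RNN:
import numpy as np

np.random.seed(0)
rows, cols = 10000, 10
X_sum = np.random.uniform(0, 1, size = (rows, cols))
Y_sum = X_sum.sum(axis = 1, keepdims = True)  # addition is commutative: column order is irrelevant

# Shuffling the "time" axis leaves the target unchanged.
X_shuffled = X_sum[:, np.random.permutation(cols)]
assert np.allclose(X_shuffled.sum(axis = 1, keepdims = True), Y_sum)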
So you would want to make your toy problem more complex and require that the ordering of the data be taken into account. Perhaps some kind of simulated time series where you want to predict whether there was a spike in the data, but you don't care about the absolute values, only about the relative nature of the spike.
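A minimal sketch of such a spike simulation, assuming a "spike" means a value roughly 3x the series' own level; each series gets its own random scale so absolute values are uninformative (the threshold and variable names are illustrative, not from the answer):
import numpy as np

np.random.seed(1)
rows, steps = 10000, 30

# Each series has its own scale, so only the relative size of a spike matters.
scale = np.random.uniform(1, 100, size = (rows, 1))
X_spike = scale * np.random.uniform(0.8, 1.2, size = (rows, steps))

# Inject a spike (3x the local level) into roughly half of the series.
has_spike = np.random.rand(rows) < 0.5
spike_pos = np.random.randint(0, steps, size = rows)
X_spike[has_spike, spike_pos[has_spike]] *= 3.0

Y_spike = has_spike.astype(float).reshape(rows, 1)
X_spike_lstm = X_spike.reshape(rows, steps, 1)  # [instances, time, features] for the LSTM
print(X_spike_lstm.shape, Y_spike.mean())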
UPDATE2
I modified your code to show a case where the RNN performs better. The trick was to use more complex conditional logic that is modeled more naturally in an LSTM than in an FFN. The code is below. For 8 columns, we see that the FFN trains in 1 minute and reaches a validation loss of 6.3. The LSTM takes 3x longer to train, but its final validation loss is 6x lower, at 1.06.
As we increase the number of columns, the LSTM has a larger and larger advantage, especially if we add more complicated conditions. For 16 columns, the FFN's validation loss is 19 (and you can see the training curve more clearly, since the model is not able to fit the data instantly). In comparison, the LSTM takes 11 times longer to train, but ends with a validation loss of 0.31, 30 times smaller than the FFN's! You can play around with even larger matrices to see how far this trend extends.
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib
matplotlib.use('Agg')  # select the backend before importing pyplot; Agg is non-interactive, so plt.show() will not open a window
import matplotlib.pyplot as plt
import time
np.random.seed(20180908)
rows = 20500
cols = 10
# Randomly generate Z
Z = 100*np.random.uniform(0.05, 1.0, size = (rows, cols))
larger = np.max(Z[:, :cols // 2], axis=1).reshape((rows, 1))   # integer division for Python 3 indexing
larger2 = np.max(Z[:, cols // 2:], axis=1).reshape((rows, 1))
smaller = np.min((larger, larger2), axis=0)
# Target: the max of the first half of each row.
Z = np.append(Z, larger, axis=1)
# A harder alternative target: the min of the two half-maxes.
# Z = np.append(Z, smaller, axis=1)
# Shuffle rows before splitting into training and validation sets.
np.random.shuffle(Z)
## Training and validation data.
split = 10000
X_train = Z[:split, :-1]
X_valid = Z[split:, :-1]
Y_train = Z[:split, -1:].reshape(split, 1)
Y_valid = Z[split:, -1:].reshape(rows - split, 1)
print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)
print("Now setting up the FNN")
## FNN model.
tick = time.time()
# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(32, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))
# Compile model.
network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
validation_data = (X_valid, Y_valid))
tock = time.time()
print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')
print("Now evaluating the FNN")
loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)
print("train loss: ", loss_fnn[-1])
print("validation loss: ", val_loss_fnn[-1])
plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()
plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training points')
plt.show()
plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('valid points')
plt.show()
print("LSTM")
## LSTM model.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)
tick = time.time()
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(32, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
validation_data = (X_lstm_valid, Y_valid))
tock = time.time()
print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')
print("now eval")
loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)
print("train loss: ", loss_lstm[-1])
print("validation loss: ", val_loss_lstm[-1])
plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()
plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training')
plt.show()
plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title("validation")
plt.show()