Training sentence sequences with Keras

Question


Mic*_*rge 5 python machine-learning neural-network keras

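I'm working on a project in which I have to feed a combination of numeric and text data into a neural network to predict system availability for the next hour. Rather than training separate neural networks and then doing something strange or unclear (to me) to produce the desired output, I decided to try Keras's merge layers with two networks: one for the numeric data and one for the text. My idea is to give the model the sequence of performance metrics for the previous 6 hours, in the shape (batch_size, 6hrs, num_features). In addition to that input for the numeric network, I give the second network another sequence of shape (batch_size, max_alerts_per_sequence, max_sentence_length).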

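Any sequence of numeric data within a time window can have a variable number of events (text data) associated with it. For simplicity, I allow at most 50 events to accompany one sequence of performance data. Each event is hash-encoded word by word and padded. I tried using a Flatten layer to reduce the input shape from (50, 30) to (1500) so that the model can train on every event in these "sequences" (to clarify: for each sequence of performance data, I pass the model 50 sentences with 30 encoded elements each).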
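To make the (50, 30) layout concrete, here is a minimal NumPy sketch of hash-encoding and padding the events of one window. The vocabulary size, the use of zlib.crc32 as the hash, and the helper names are assumptions for illustration; the question does not say which hashing scheme was used.

```python
import zlib

import numpy as np

MAX_EVENTS = 50      # events per performance window (from the question)
MAX_WORDS = 30       # words per event sentence (from the question)
VOCAB_SIZE = 10_000  # hypothetical hash-space size; index 0 is reserved for padding


def hash_word(word: str, vocab_size: int = VOCAB_SIZE) -> int:
    """Map a word to an index in [1, vocab_size - 1] with a stable hash."""
    return zlib.crc32(word.lower().encode("utf-8")) % (vocab_size - 1) + 1


def encode_events(events: list[str]) -> np.ndarray:
    """Encode up to MAX_EVENTS sentences into a zero-padded (MAX_EVENTS, MAX_WORDS) array."""
    out = np.zeros((MAX_EVENTS, MAX_WORDS), dtype=np.int32)
    for i, sentence in enumerate(events[:MAX_EVENTS]):
        ids = [hash_word(w) for w in sentence.split()][:MAX_WORDS]
        out[i, : len(ids)] = ids  # words first, zeros (padding) after
    return out


events = ["disk usage high on node 3", "service restarted"]
x = encode_events(events)
print(x.shape)              # (50, 30)
print(x.reshape(-1).shape)  # (1500,) which is what Flatten() does per sample
```

A stable hash such as crc32 is used instead of Python's built-in hash() because the latter is salted per process for strings, so the same word would get different indices between training and inference.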

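My question is: since the NN needs to look at all the events for a given sequence of performance metrics, how can I make the network that trains on the text data operate on a sequence of sentences?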

My model:

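from tensorflow import keras
from tensorflow.keras.layers import (Input, LSTM, Bidirectional, Dropout, Flatten,
                                     Embedding, Dense, Concatenate)

# shape, lstm_layer_count, vocabsize and the custom R^2 metric coeff_determination
# are defined elsewhere in the project

# LSTM module for performance metrics: (6 hours, num_features) per sample
num_input = Input(shape=(shape[1], shape[2]))
lstm1 = Bidirectional(LSTM(units=lstm_layer_count, activation='tanh', return_sequences=True))(num_input)
dropout1 = Dropout(rate=0.2)(lstm1)
lstm2 = Bidirectional(LSTM(units=lstm_layer_count, activation='tanh', return_sequences=False))(dropout1)
dropout2 = Dropout(rate=0.2)(lstm2)

# LSTM module for text-based data: 50 events x 30 hashed words per sample
tInput = Input(shape=(50, 30))
flatten = Flatten()(tInput)  # (50, 30) -> (1500,): all events become one long token stream
embed = Embedding(input_dim=vocabsize + 1, output_dim= 50 * 30, input_length=30*50)(flatten)
magic = Bidirectional(LSTM(100))(embed)
tOut = Dense(1, activation='relu')(magic)

# Merge the layers
concat = Concatenate()([dropout2, tOut])
output = Dense(units=1, activation='sigmoid')(concat)

nn = keras.models.Model(inputs=[num_input, tInput], outputs = output)

# Current Keras equivalent of the old SGD(lr=0.1, decay=0.001) arguments
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.1, decay_steps=1, decay_rate=0.001)
opt = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.8, nesterov=True)
nn.compile(optimizer=opt, loss='mse', metrics=['accuracy', coeff_determination])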


Answer 1

ixe*_*ion 2

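As I understand it, you have a sequence of up to 50 events for which you want to make a prediction. These events have text data attached, which can be treated as another sequence of word embeddings. Here is an article about a similar architecture.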

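I would propose a solution that uses an LSTM for the text part and a 1D convolution for the "real" sequence part (the sequence of events). Every LSTM output is concatenated with the numeric data. This amounts to 50 LSTM layers (one per event), which can be very time-consuming to train even with shared weights. It is also possible to use only convolutional layers for the text part, which is faster but does not model long-term dependencies. (In my experience, these long-term dependencies are usually not that important in text mining.)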
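The "shared weights" idea can be illustrated without Keras: one set of parameters encodes every event of the window, just as TimeDistributed reuses a single inner layer for each slice along the event axis. This NumPy sketch is only an illustration under stated assumptions: the masked mean-of-embeddings encoder is a stand-in swapped in for the LSTM, and all sizes and the numeric feature vector are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, EMB_DIM, ENC_DIM = 10_000, 16, 8  # hypothetical sizes

# One set of parameters, reused for every event: this is what "shared weights" means.
embedding = rng.normal(size=(VOCAB_SIZE, EMB_DIM))
embedding[0] = 0.0  # padding index contributes nothing
projection = rng.normal(size=(EMB_DIM, ENC_DIM))


def encode_event(token_ids: np.ndarray) -> np.ndarray:
    """Stand-in sentence encoder: masked mean of embeddings, then a tanh projection.
    A real model would use an LSTM or Conv1D here."""
    mask = token_ids != 0
    if not mask.any():
        return np.zeros(ENC_DIM)  # fully padded (empty) event
    mean_emb = embedding[token_ids[mask]].mean(axis=0)
    return np.tanh(mean_emb @ projection)


def encode_window(events: np.ndarray) -> np.ndarray:
    """Apply the same encoder to each row of a (num_events, seq_len) array."""
    return np.stack([encode_event(row) for row in events])


events = rng.integers(0, VOCAB_SIZE, size=(50, 30))
encoded = encode_window(events)  # (50, 8): one vector per event

# Concatenate every event vector with the window's numeric features (hypothetical, 5 values).
numeric = rng.normal(size=(5,))
per_event = np.concatenate([encoded, np.tile(numeric, (encoded.shape[0], 1))], axis=1)
print(per_event.shape)  # (50, 13)
```

Because the parameters are shared, the number of weights does not grow with the number of events; only the compute does, which is why the answer warns about training time.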

Text -> LSTM or 1DConv -> concat with numeric data -> 1DConv -> Output

Here is some example code that shows how to use shared weights:

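from tensorflow.keras import regularizers
from tensorflow.keras.layers import (Input, Embedding, TimeDistributed, Bidirectional, LSTM,
                                     Conv1D, GlobalMaxPooling1D, Dense, concatenate)
from tensorflow.keras.models import Model

# num_features (vocabulary size), embedding_size, seq_length, number_of_messages,
# filter_size, kernel1..kernel3 and kernel_reg are hyperparameters defined elsewhere
numeric_input = Input(shape=(x_numeric_train.values.shape[1],), name='numeric_input')
nlp_seq = Input(shape=(number_of_messages ,seq_length,), name='nlp_input')

# shared layers: the same Embedding + BiLSTM weights encode every message
emb = TimeDistributed(Embedding(input_dim=num_features, output_dim=embedding_size,
                input_length=seq_length, mask_zero=True,
                input_shape=(seq_length, )))(nlp_seq)
# one 64-dim vector (2 x 32) per message -> (batch, number_of_messages, 64)
x = TimeDistributed(Bidirectional(LSTM(32, dropout=0.3, recurrent_dropout=0.3, kernel_regularizer=regularizers.l2(0.01))))(emb)

# 1D convolutions over the sequence of messages, with three kernel sizes
c1 = Conv1D(filter_size, kernel1, padding='valid', activation='relu', strides=1, kernel_regularizer=regularizers.l2(kernel_reg))(x)
p1 = GlobalMaxPooling1D()(c1)
c2 = Conv1D(filter_size, kernel2, padding='valid', activation='relu', strides=1, kernel_regularizer=regularizers.l2(kernel_reg))(x)
p2 = GlobalMaxPooling1D()(c2)
c3 = Conv1D(filter_size, kernel3, padding='valid', activation='relu', strides=1, kernel_regularizer=regularizers.l2(kernel_reg))(x)
p3 = GlobalMaxPooling1D()(c3)

x = concatenate([p1, p2, p3, numeric_input])
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[nlp_seq, numeric_input], outputs=[x])
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])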
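Each Conv1D/GlobalMaxPooling1D branch above turns the (number_of_messages, 64) sequence into one fixed-length vector of filter_size values, whatever its kernel size. Here is a NumPy sketch of that arithmetic for a single sample, assuming hypothetical kernel sizes 2, 3 and 4, filter_size = 32, random weights and a 10-dimensional numeric input; the final sigmoid unit is likewise a random stand-in for the trained Dense layer.

```python
import numpy as np


def conv1d_valid(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """'valid' 1D convolution (cross-correlation, as in Keras), stride 1, ReLU.
    x: (steps, in_ch), w: (kernel, in_ch, filters), b: (filters,)."""
    k = w.shape[0]
    steps_out = x.shape[0] - k + 1
    out = np.empty((steps_out, w.shape[2]))
    for t in range(steps_out):
        out[t] = np.tensordot(x[t : t + k], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)


def global_max_pool(x: np.ndarray) -> np.ndarray:
    """Max over the time axis: (steps, filters) -> (filters,)."""
    return x.max(axis=0)


rng = np.random.default_rng(0)
num_messages, enc_dim, filter_size = 50, 64, 32  # 64 = 2 * 32 from Bidirectional(LSTM(32))
x = rng.normal(size=(num_messages, enc_dim))     # per-message encodings for one sample
numeric = rng.normal(size=(10,))                 # hypothetical numeric features

pooled = []
for kernel in (2, 3, 4):  # hypothetical stand-ins for kernel1..kernel3
    w = rng.normal(size=(kernel, enc_dim, filter_size)) * 0.1
    b = np.zeros(filter_size)
    c = conv1d_valid(x, w, b)          # (num_messages - kernel + 1, filter_size)
    pooled.append(global_max_pool(c))  # (filter_size,)

features = np.concatenate(pooled + [numeric])  # 3 * 32 + 10 = 106 values
w_out = rng.normal(size=(features.shape[0],)) * 0.01
prob = 1.0 / (1.0 + np.exp(-(features @ w_out)))  # sigmoid output
print(features.shape, prob)
```

With 'valid' padding each kernel size must not exceed number_of_messages, otherwise the convolution has no output steps to pool over.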


And the training:

model.fit([x_train, x_numeric_train], y_train)
# where x_train is an array of shape (num_samples, num_messages, seq_length)
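For reference, here are dummy arrays with the layout that fit() expects for this model. All sizes are hypothetical. Embedding's input_dim (num_features in the code above) must be larger than the largest token index, and x_numeric_train is a plain NumPy array here rather than the pandas DataFrame implied by .values in the answer's code, so with this data numeric_input would be built from x_numeric_train.shape[1].

```python
import numpy as np

rng = np.random.default_rng(42)
num_samples, num_messages, seq_length = 256, 50, 30  # hypothetical sizes
vocab_size, num_numeric = 10_000, 12                 # hypothetical sizes

# Token ids in [0, vocab_size); 0 is the padding index masked by mask_zero=True.
x_train = rng.integers(0, vocab_size, size=(num_samples, num_messages, seq_length))
x_numeric_train = rng.normal(size=(num_samples, num_numeric)).astype("float32")
y_train = rng.integers(0, 2, size=(num_samples, 1)).astype("float32")  # binary target

# All inputs and the target must share the same first (sample) dimension.
assert x_train.shape[0] == x_numeric_train.shape[0] == y_train.shape[0]
# model.fit([x_train, x_numeric_train], y_train, batch_size=32, epochs=10)
```

The order of the arrays in the fit() list has to match the order of inputs=[nlp_seq, numeric_input] in the Model definition.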


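A complex model like this needs a lot of data to converge. With less data, a simpler solution is to aggregate the events so that there is only one sequence. For example, the text data of all events can be treated as a single text (with a separator token) instead of several texts, while the numeric data can be summed, averaged, or even combined into a fixed-length list. But this depends on your data.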
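A minimal sketch of that simpler aggregation, assuming whitespace tokenization, a hypothetical "<sep>" separator token, a hypothetical max_len of 300, and per-feature mean plus sum as the numeric summary (one of the options mentioned above):

```python
import numpy as np

SEP = "<sep>"  # hypothetical separator token


def flatten_events(events: list[str], max_len: int = 300) -> list[str]:
    """Join all events of one window into a single token sequence with separators."""
    tokens: list[str] = []
    for i, sentence in enumerate(events):
        if i > 0:
            tokens.append(SEP)
        tokens.extend(sentence.split())
    return tokens[:max_len]  # truncate to a fixed maximum length


def aggregate_numeric(window: np.ndarray) -> np.ndarray:
    """Summarise a (timesteps, features) window as a fixed-length vector: mean, then sum."""
    return np.concatenate([window.mean(axis=0), window.sum(axis=0)])


tokens = flatten_events(["disk usage high", "service restarted"])
print(tokens)  # ['disk', 'usage', 'high', '<sep>', 'service', 'restarted']

window = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
print(aggregate_numeric(window))  # mean [3, 20] followed by sum [9, 60]
```

The resulting single token sequence can then go through an ordinary Embedding + LSTM (or Conv1D) branch, with no TimeDistributed wrapper needed.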

Since I'm working on something similar, I will update this answer with code later.

Archived:	7 years, 4 months ago
Views:	376
Last activity:	6 years, 2 months ago