How to reshape text data to be suitable for an LSTM model in Keras

Question

sar*_*iii 6 python autoencoder lstm keras tensorflow

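Update 1:

The code I am referring to is the code from this book, which you can find here.

The only thing is that I do not want to have embed_size in the decoder part. That is why I think I do not need an embedding layer at all: if I add one, I need embed_size in the decoder part (please correct me if I'm wrong).

Overall, I am trying to adapt the same code without the embedding layer, because I need to have vocab_size in the decoder part.

I think the suggestion given in the comments (using one_hot_encoding) could be right, but I ran into an error with it.

When I do the one_hot_encoding like this: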

tf.keras.backend.one_hot(indices=sent_wids, num_classes=vocab_size)

I get this error:

in check_num_samples
you should specify the + steps_name + argument
ValueError: If your data is in the form of symbolic tensors, you should specify the steps_per_epoch argument (instead of the batch_size argument, because symbolic tensors are expected to produce batches of input data)

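This is how I prepare the data:

sent_lens has shape (87716, 200), and I want to reshape it so that I can feed it into the LSTM. Here 200 is the sequence length and 87716 is the number of samples I have.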
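The reshaping described above amounts to turning a 2-D array of integer word ids into a 3-D one-hot array of shape (samples, sequence length, vocabulary size). The following is a minimal NumPy sketch of that step, not the book's code. The toy sizes and the random ids are assumptions for illustration only; the array name sent_wids is borrowed from the one_hot call in the question.

```python
import numpy as np

# Hypothetical toy sizes; the question has 87716 samples of length 200.
num_samples, sequence_len, vocab_size = 4, 6, 10

rng = np.random.default_rng(0)
# Integer word ids, shape (num_samples, sequence_len)
sent_wids = rng.integers(0, vocab_size, size=(num_samples, sequence_len))

# Indexing an identity matrix with the ids turns every id into a
# one-hot row of length vocab_size and appends a third axis.
one_hot = np.eye(vocab_size, dtype=np.float32)[sent_wids]

print(sent_wids.shape)  # (4, 6)
print(one_hot.shape)    # (4, 6, 10)
```

The result is an ordinary NumPy array rather than a symbolic tensor, which is the kind of input that fit accepts together with batch_size.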

Below is the code for the LSTM autoencoder:

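from tensorflow.keras.layers import Input, LSTM, Bidirectional, RepeatVector
from tensorflow.keras.models import Model

# One-hot encoded sequences: (SEQUENCE_LEN, VOCAB_SIZE) per sample
inputs = Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
# Encoder: compress each sequence into a single LATENT_SIZE vector
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
# Repeat the latent vector once per time step for the decoder
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
# Decoder: emit a VOCAB_SIZE vector at every time step
decoded = LSTM(VOCAB_SIZE, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss="mse")
autoencoder.summary()
# Xtrain must be a NumPy array of shape (samples, SEQUENCE_LEN, VOCAB_SIZE)
history = autoencoder.fit(Xtrain, Xtrain, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS)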

Do I need to do anything extra, and if not, why can't I get this to work?

Please let me know which part is unclear and I will explain.

Thanks for your help :)

Answer 1

sar*_*iii -1

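As said in the comments, it turned out that I just needed to do one_hot_encoding.

When I used tf.keras.backend for the one-hot encoding, it threw the error that I added to the question.

Then I tried to_categorical(sent_wids, num_classes=VOCAB_SIZE) and that fixed it (although I then ran into a memory error :D, which is a different story)!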
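The memory error mentioned above is plausible because a dense one-hot array grows with samples × sequence length × vocabulary size. Below is a rough sketch of that arithmetic and of one possible workaround: encoding one batch at a time. The vocabulary size of 5000 and both helper names (dense_one_hot_bytes, one_hot_batches) are assumptions for illustration; the question does not state VOCAB_SIZE, and this is not what the answer's author did.

```python
import numpy as np

def dense_one_hot_bytes(num_samples, sequence_len, vocab_size, bytes_per_value=4):
    """Size in bytes of a fully materialised one-hot array (float32 by default)."""
    return num_samples * sequence_len * vocab_size * bytes_per_value

def one_hot_batches(word_ids, vocab_size, batch_size):
    """Yield (inputs, targets) one-hot batches in a single pass over word_ids."""
    eye = np.eye(vocab_size, dtype=np.float32)
    for start in range(0, len(word_ids), batch_size):
        batch = eye[word_ids[start:start + batch_size]]
        yield batch, batch  # autoencoder: the target is the input itself

ASSUMED_VOCAB_SIZE = 5000  # hypothetical; not given in the question
full_bytes = dense_one_hot_bytes(87716, 200, ASSUMED_VOCAB_SIZE)
print(f"full one-hot array: {full_bytes / 1e9:.1f} GB")

# Toy data to show the batch shapes
toy_ids = np.random.default_rng(0).integers(0, 8, size=(10, 6))
batch_shapes = [x.shape for x, _ in one_hot_batches(toy_ids, vocab_size=8, batch_size=4)]
print(batch_shapes)
```

With this approach only one batch of one-hot data exists in memory at a time. Keras can train from a generator when steps_per_epoch is given, but it expects the generator to keep yielding across epochs, so a single-pass generator like this one would need to be wrapped in an endless loop first.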

I should also mention that I tried sparse_categorical_crossentropy instead of one_hot_encoding, but it did not work!
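For context on the sparse loss: sparse categorical cross-entropy takes integer class ids as targets and per-class probabilities as predictions, and gives the same value as ordinary categorical cross-entropy on one-hot targets. The NumPy sketch below illustrates that equivalence on random toy data (all sizes and values are assumptions). It does not explain why the attempt above failed; note only that the model in the question is compiled with mse and has no softmax output.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, sequence_len, vocab_size = 2, 3, 5  # hypothetical toy sizes

# Integer targets, shape (num_samples, sequence_len)
targets = rng.integers(0, vocab_size, size=(num_samples, sequence_len))
# Stand-in for model outputs before softmax
logits = rng.normal(size=(num_samples, sequence_len, vocab_size))

# Softmax over the vocabulary axis
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# Categorical cross-entropy: needs one-hot targets
one_hot = np.eye(vocab_size)[targets]
categorical = -(one_hot * np.log(probs)).sum(axis=-1)

# Sparse categorical cross-entropy: picks the true-class probability by index
picked = np.take_along_axis(probs, targets[..., None], axis=-1)[..., 0]
sparse = -np.log(picked)

print(np.allclose(categorical, sparse))
```

The sparse form only removes the need for one-hot targets. Without an embedding layer, the inputs to the LSTM would still have to be one-hot encoded.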

Thanks for your help :)

Archived:	6 years, 6 months ago
Viewed:	274 times
Last activity:	6 years, 6 months ago