I'm trying to train a language-translation model and am copy-pasting seq2seq code from a Kaggle notebook into Google Colab. The code runs fine on CPU and GPU, but it throws an error when training on a TPU. The same question has been asked here before.
Here is my code:
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    model = create_model()
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

model.fit_generator(generator=generate_batch(X_train, y_train, batch_size=batch_size),
                    steps_per_epoch=train_samples // batch_size,
                    epochs=epochs,
                    validation_data=generate_batch(X_test, y_test, batch_size=batch_size),
                    validation_steps=val_samples // batch_size)
Traceback:
Epoch 1/2
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-60-940fe0ee3c8b> in <module>()
3 epochs = epochs,
4 validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
----> 5 validation_steps = val_samples // batch_size)
10 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
992 except Exception as e: # pylint:disable=broad-except
993 if hasattr(e, "ag_error_metadata"):
--> 994 raise e.ag_error_metadata.to_exception(e)
995 else:
996 raise
ValueError: in user code:
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:853 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:842 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
...
ValueError: None values not supported.
I can't pinpoint the error, but I think it comes from this generate_batch function:
X, y = lines['english_sentence'], lines['hindi_sentence']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=34)

def generate_batch(X=X_train, y=y_train, batch_size=128):
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_length_src), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_length_tar), dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_length_tar, num_decoder_tokens), dtype='float32')
            for i, (input_text, target_text) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word in enumerate(input_text.split()):
                    encoder_input_data[i, t] = input_token_index[word]
                for t, word in enumerate(target_text.split()):
                    if t < len(target_text.split()) - 1:
                        decoder_input_data[i, t] = target_token_index[word]
                    if t > 0:
                        decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            yield ([encoder_input_data, decoder_input_data], decoder_target_data)
My Colab notebook - here
Kaggle dataset - here
TensorFlow version - 2.6
Edit - Please don't tell me to downgrade TensorFlow/Keras to 1.x. I can downgrade to TensorFlow 2.0, 2.1, or 2.3, but not to 1.x. I don't understand TensorFlow 1.x, and there is no point in using a version that is three years old.
As mentioned in the answer referenced in the link you provided, the tensorflow.data API works better with TPUs. To adapt it to your case, try using return instead of yield in the generate_batch function:
def generate_batch(X=X_train, y=y_train, batch_size=128):
    ...
    return encoder_input_data, decoder_input_data, decoder_target_data

encoder_input_data, decoder_input_data, decoder_target_data = generate_batch(X_train, y_train, batch_size=128)
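Concretely, the modified function builds the arrays over the whole split in one pass instead of one batch at a time. The sketch below is runnable only because the token indices and length constants are replaced by tiny placeholder values; in your notebook the real `input_token_index`, `target_token_index`, `max_length_src`, `max_length_tar`, and `num_decoder_tokens` already exist:

```python
import numpy as np

# Placeholder vocabularies and constants, for illustration only.
input_token_index = {"hello": 1, "world": 2}
target_token_index = {"START_": 1, "नमस्ते": 2, "_END": 3}
max_length_src, max_length_tar = 5, 6
num_decoder_tokens = len(target_token_index) + 1

def generate_batch(X, y):
    """Vectorise the full split and return plain NumPy arrays."""
    n = len(X)
    encoder_input_data = np.zeros((n, max_length_src), dtype='float32')
    decoder_input_data = np.zeros((n, max_length_tar), dtype='float32')
    decoder_target_data = np.zeros((n, max_length_tar, num_decoder_tokens), dtype='float32')
    for i, (input_text, target_text) in enumerate(zip(X, y)):
        for t, word in enumerate(input_text.split()):
            encoder_input_data[i, t] = input_token_index[word]
        for t, word in enumerate(target_text.split()):
            if t < len(target_text.split()) - 1:
                decoder_input_data[i, t] = target_token_index[word]
            if t > 0:
                # Teacher forcing: target is the decoder input shifted by one.
                decoder_target_data[i, t - 1, target_token_index[word]] = 1.0
    return encoder_input_data, decoder_input_data, decoder_target_data

enc, dec_in, dec_tgt = generate_batch(["hello world"], ["START_ नमस्ते _END"])
```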
Then build the dataset with tensorflow.data:
from tensorflow.data import Dataset
encoder_input_data = Dataset.from_tensor_slices(encoder_input_data)
decoder_input_data = Dataset.from_tensor_slices(decoder_input_data)
decoder_target_data = Dataset.from_tensor_slices(decoder_target_data)
ds = Dataset.zip((encoder_input_data, decoder_input_data, decoder_target_data)).map(map_fn).batch(1024)
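As a side note, the zip-and-map step can be collapsed into a single from_tensor_slices call by passing a nested tuple, which directly yields the ((encoder, decoder_in), target) structure Keras expects. A minimal sketch with dummy placeholder shapes:

```python
import numpy as np
import tensorflow as tf

# Dummy stand-ins for the arrays returned by the modified generate_batch;
# the sample count and sequence lengths are placeholders.
encoder_input_data = np.zeros((256, 20), dtype="float32")
decoder_input_data = np.zeros((256, 25), dtype="float32")
decoder_target_data = np.zeros((256, 25, 100), dtype="float32")

# Nesting the tuple gives the ((inputs), target) structure without a map_fn.
ds = tf.data.Dataset.from_tensor_slices(
    ((encoder_input_data, decoder_input_data), decoder_target_data)
).batch(64)

for (enc, dec), tgt in ds.take(1):
    print(enc.shape, dec.shape, tgt.shape)  # (64, 20) (64, 25) (64, 25, 100)
```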
where map_fn is defined as:
def map_fn(encoder_input, decoder_input, decoder_target):
    return (encoder_input, decoder_input), decoder_target
Finally, use Model.fit instead of Model.fit_generator:
model.fit(x=ds, epochs=epochs)
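Putting the pieces together, here is a minimal end-to-end sketch of the suggested pipeline. The data and the tiny two-input model are placeholders standing in for your real arrays and seq2seq network (all sizes are made up); the point is only to show that the dataset structure feeds Model.fit cleanly, and under TPUStrategy the model construction and compile would simply move inside strategy.scope():

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Placeholder sizes; the real values come from your dataset.
num_decoder_tokens = 50
max_length_src, max_length_tar = 10, 12
n_samples = 128

# Dummy arrays in the shapes the modified generate_batch would return.
enc_in = np.random.randint(0, num_decoder_tokens, (n_samples, max_length_src)).astype("float32")
dec_in = np.random.randint(0, num_decoder_tokens, (n_samples, max_length_tar)).astype("float32")
dec_tgt = np.zeros((n_samples, max_length_tar, num_decoder_tokens), dtype="float32")
dec_tgt[..., 0] = 1.0  # trivial one-hot targets

def map_fn(encoder_input, decoder_input, decoder_target):
    return (encoder_input, decoder_input), decoder_target

ds = (
    tf.data.Dataset.from_tensor_slices((enc_in, dec_in, dec_tgt))
    .map(map_fn)
    .batch(32)
)

# Toy two-input model with the same input/output structure as the seq2seq model.
enc_inputs = layers.Input(shape=(max_length_src,))
dec_inputs = layers.Input(shape=(max_length_tar,))
context = layers.GlobalAveragePooling1D()(layers.Embedding(num_decoder_tokens, 16)(enc_inputs))
x = layers.Embedding(num_decoder_tokens, 16)(dec_inputs)
x = layers.Concatenate()([x, layers.RepeatVector(max_length_tar)(context)])
outputs = layers.Dense(num_decoder_tokens, activation="softmax")(x)
model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")

history = model.fit(ds, epochs=1, verbose=0)
```

Because all the data reaches the model through tf.data rather than a Python generator, nothing in the training loop depends on host-side Python state, which is what the TPU runtime requires.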