ValueError: None values not supported. Code works fine on CPU/GPU but not on TPU

Ada*_*ase 5 python machine-learning deep-learning tensorflow tpu

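I am trying to train a seq2seq model for language translation, and I am copy-pasting code from this Kaggle Notebook into Google Colab. The code works fine on CPU and GPU, but it raises an error when training on a TPU. The same question has already been asked here.

Here is my code: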

    strategy = tf.distribute.experimental.TPUStrategy(resolver)
    
    with strategy.scope():
      model = create_model()
      model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy')
    
    model.fit_generator(generator = generate_batch(X_train, y_train, batch_size = batch_size),
                        steps_per_epoch = train_samples // batch_size,
                        epochs = epochs,
                        validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
                        validation_steps = val_samples // batch_size)

Traceback:

Epoch 1/2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-60-940fe0ee3c8b> in <module>()
      3                     epochs = epochs,
      4                     validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
----> 5                     validation_steps = val_samples // batch_size)

10 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    992           except Exception as e:  # pylint:disable=broad-except
    993             if hasattr(e, "ag_error_metadata"):
--> 994               raise e.ag_error_metadata.to_exception(e)
    995             else:
    996               raise

ValueError: in user code:
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:853 train_function  *
    return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:842 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
...
ValueError: None values not supported.

I am unable to figure out the error. I think it is caused by this generate_batch function:

X, y = lines['english_sentence'], lines['hindi_sentence']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 34)

def generate_batch(X = X_train, y = y_train, batch_size = 128):
    while True:
        for j in range(0, len(X), batch_size):
 
            encoder_input_data = np.zeros((batch_size, max_length_src), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_length_tar), dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_length_tar, num_decoder_tokens), dtype='float32')
            
            for i, (input_text, target_text) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word in enumerate(input_text.split()):
                    encoder_input_data[i, t] = input_token_index[word]
                for t, word in enumerate(target_text.split()):
                    if t<len(target_text.split())-1:
                        decoder_input_data[i, t] = target_token_index[word]
                    if t>0:
                        decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)

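My Colab notebook - here
Kaggle dataset - here
TensorFlow version - 2.6

Edit - Please don't tell me to downgrade TensorFlow/Keras to 1.x. I can downgrade to TensorFlow 2.0, 2.1, or 2.3, but not to 1.x. I don't understand TensorFlow 1.x. Also, there is no point in using a 3-year-old version.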

R. *_*ahy 1

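As stated in the answer referenced in the link you provided, the tensorflow.data API works better with TPUs. To adapt it to your case, try using return instead of yield in the generate_batch function: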

def generate_batch(X = X_train, y = y_train, batch_size = 128):
    ...
    return encoder_input_data, decoder_input_data, decoder_target_data

encoder_input_data, decoder_input_data, decoder_target_data = generate_batch(X_train, y_train, batch_size=128)
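The function body is elided above, so here is a minimal sketch of what a return-based version could look like, assuming the same encoding logic as the question's generator but applied to the whole dataset at once. The name build_arrays, the explicit vocabulary arguments, and the toy sentences are hypothetical and only for illustration. It also assumes the fully encoded dataset fits in memory, which may not hold for a large vocabulary because of the one-hot target array.

```python
import numpy as np

def build_arrays(X, y, input_token_index, target_token_index,
                 max_length_src, max_length_tar, num_decoder_tokens):
    """Encode every sentence pair at once and return three full arrays."""
    n = len(X)
    encoder_input_data = np.zeros((n, max_length_src), dtype="float32")
    decoder_input_data = np.zeros((n, max_length_tar), dtype="float32")
    decoder_target_data = np.zeros((n, max_length_tar, num_decoder_tokens), dtype="float32")

    for i, (input_text, target_text) in enumerate(zip(X, y)):
        for t, word in enumerate(input_text.split()):
            encoder_input_data[i, t] = input_token_index[word]
        target_words = target_text.split()
        for t, word in enumerate(target_words):
            if t < len(target_words) - 1:
                # Decoder input holds every token except the last one.
                decoder_input_data[i, t] = target_token_index[word]
            if t > 0:
                # Decoder target is shifted one step ahead and one-hot encoded.
                decoder_target_data[i, t - 1, target_token_index[word]] = 1.0
    return encoder_input_data, decoder_input_data, decoder_target_data

# Toy vocabulary and sentences (hypothetical, not from the Kaggle dataset).
input_token_index = {"hello": 1, "world": 2}
target_token_index = {"START_": 1, "namaste": 2, "duniya": 3, "_END": 4}
X_toy = ["hello world", "hello"]
y_toy = ["START_ namaste duniya _END", "START_ namaste _END"]

enc, dec_in, dec_tgt = build_arrays(
    X_toy, y_toy, input_token_index, target_token_index,
    max_length_src=3, max_length_tar=4, num_decoder_tokens=5,
)
print(enc.shape, dec_in.shape, dec_tgt.shape)
```

Unlike the generator, this returns once with arrays whose first dimension is the number of samples, which is the shape Dataset.from_tensor_slices expects to slice along.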

Then use tensorflow.data to build the dataset:

from tensorflow.data import Dataset

encoder_input_data = Dataset.from_tensor_slices(encoder_input_data)
decoder_input_data = Dataset.from_tensor_slices(decoder_input_data)
decoder_target_data = Dataset.from_tensor_slices(decoder_target_data)
ds = Dataset.zip((encoder_input_data, decoder_input_data, decoder_target_data)).map(map_fn).batch(1024)

where map_fn is defined as:

def map_fn(encoder_input, decoder_input, decoder_target):
    return (encoder_input, decoder_input), decoder_target

Finally, use Model.fit instead of Model.fit_generator:

model.fit(x=ds, epochs=epochs)
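As a possible variation on the pipeline above, Dataset.from_tensor_slices accepts a nested tuple, so the ((encoder_input, decoder_input), decoder_target) structure can be built in one call without zip and map. The sketch below uses placeholder zero arrays and an arbitrary batch size in place of the real encoded data. It also passes drop_remainder=True, on the assumption that the TPU needs every batch to have the same static shape. That flag is not part of the original answer.

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for the encoded dataset (sizes are arbitrary).
num_samples, max_length_src, max_length_tar, num_decoder_tokens = 10, 3, 4, 5
encoder_input_data = np.zeros((num_samples, max_length_src), dtype="float32")
decoder_input_data = np.zeros((num_samples, max_length_tar), dtype="float32")
decoder_target_data = np.zeros(
    (num_samples, max_length_tar, num_decoder_tokens), dtype="float32"
)

batch_size = 4
ds = (
    tf.data.Dataset.from_tensor_slices(
        # Nested tuple: ((model inputs), targets), the layout Model.fit expects.
        ((encoder_input_data, decoder_input_data), decoder_target_data)
    )
    # Drop the final partial batch so every batch has shape (batch_size, ...).
    .batch(batch_size, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE)
)

print(ds.element_spec)
num_batches = sum(1 for _ in ds)
print(num_batches)
```

The resulting ds can be passed to model.fit(x=ds, epochs=epochs) in the same way as the zip-and-map version.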