Converting an Estimator to a TPUEstimator

Question


atp*_*atp 3 python machine-learning keras tensorflow tensorflow-estimator

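Is it possible to convert a TensorFlow Estimator into a TPUEstimator without spending a lot of effort rewriting its functionality? I have a model in well-formed Estimator form that works fine on CPU, but I don't know a convenient way to turn it into a TPUEstimator without rewriting model_fn and input_fn.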

The reason this takes so much manual work is that I build the model with Keras and then create the Estimator with the following helper function:

   my_keras_model.compile(
                optimizer=tf.keras.optimizers.SGD(lr=0.0001, momentum=0.9),
                loss='categorical_crossentropy',
                metrics=['accuracy'])
   estimator = tf.keras.estimator.model_to_estimator(keras_model=my_keras_model)


It would be great if I could do something like estimator.to_TPU_estimator(). Maybe someone knows a solution?

Answer 1

Max*_*xim 6

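Such a function can't exist, because the two estimators expect different model_fn specifications. Some of the differences run quite deep, for example this one (from the TPU tutorial):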

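When training on a Cloud TPU, you must wrap the optimizer in a tf.contrib.tpu.CrossShardOptimizer, which uses an allreduce to aggregate gradients and broadcast the result to each shard (each TPU core).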

This means patching the internals of the Keras optimizer and its update ops.
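To make the quoted requirement concrete, here is a pure-Python sketch of the idea behind CrossShardOptimizer. It is not TensorFlow code: PlainSGD and CrossShardWrapper are hypothetical names, and the "model" is a 1-D linear fit y ≈ w * x. Each shard computes a gradient on its own slice of the batch, the gradients are combined, and every shard applies the same update. The sketch averages the per-shard gradients; the real optimizer performs a cross-replica sum, so the effective scaling depends on how the loss is defined.

```python
class PlainSGD:
    """Hypothetical single-device optimizer for y ~ w * x with squared error."""

    def __init__(self, lr):
        self.lr = lr

    def compute_gradients(self, w, batch):
        # d/dw of mean((w*x - y)^2) over this batch
        return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

    def apply_gradients(self, w, grad):
        return w - self.lr * grad


class CrossShardWrapper:
    """Hypothetical wrapper: keeps the inner optimizer, adds an all-reduce."""

    def __init__(self, optimizer):
        # Analogous to model.optimizer.optimizer in the answer's code.
        self.optimizer = optimizer

    def step(self, w, shards):
        # Each shard (TPU core) computes a gradient on its own slice.
        local = [self.optimizer.compute_gradients(w, s) for s in shards]
        # All-reduce: combine the per-shard gradients (averaged in this sketch)...
        reduced = sum(local) / len(local)
        # ...and broadcast the result, so every shard applies the same update.
        return [self.optimizer.apply_gradients(w, reduced) for _ in shards]


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.5), (4.0, 7.5)]
shards = [batch[:2], batch[2:]]
opt = PlainSGD(lr=0.01)
per_shard = CrossShardWrapper(opt).step(0.0, shards)
full = opt.apply_gradients(0.0, opt.compute_gradients(0.0, batch))
```

With equally sized shards, every shard ends up with the same weight as a single-device step on the full batch, which is why replicas stay in sync.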

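The recommended approach is to use separate model_fn wrappers for the GPU and TPU models, which also seems to be the quickest route. In your case, that means rewriting Keras's model_to_estimator function for the TPU estimator.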
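The "separate wrappers" structure can be sketched in plain Python, assuming the model-building code is shared and only the thin model_fn layer differs. EstimatorSpec and TPUEstimatorSpec below are hypothetical namedtuple stand-ins (not the TensorFlow classes), and the train ops are placeholder strings:

```python
from collections import namedtuple

# Hypothetical stand-ins; only the fields this sketch needs.
EstimatorSpec = namedtuple("EstimatorSpec", ["mode", "loss", "train_op", "eval_metric_ops"])
TPUEstimatorSpec = namedtuple("TPUEstimatorSpec", ["mode", "loss", "train_op", "eval_metrics"])


def build_core(features, labels):
    """Hypothetical shared model code that both wrappers reuse unchanged."""
    loss = sum((f - y) ** 2 for f, y in zip(features, labels)) / len(labels)
    return loss, {"mse": loss}


def cpu_gpu_model_fn(features, labels, mode):
    loss, metrics = build_core(features, labels)
    # Plain Estimator: ordinary optimizer, metrics returned directly.
    return EstimatorSpec(mode, loss, "sgd_update", metrics)


def tpu_model_fn(features, labels, mode, params):
    # TPUEstimator also passes `params` (e.g. params["batch_size"]).
    loss, metrics = build_core(features, labels)
    # TPU: cross-shard optimizer, metrics as a (metric_fn, tensors) pair.
    return TPUEstimatorSpec(mode, loss, "cross_shard(sgd_update)",
                            (lambda **t: dict(t), metrics))


def pick_model_fn(use_tpu):
    """Choose the wrapper once, when the estimator is constructed."""
    return tpu_model_fn if use_tpu else cpu_gpu_model_fn


gpu_spec = pick_model_fn(False)([1.0, 2.0], [1.0, 4.0], "train")
tpu_spec = pick_model_fn(True)([1.0, 2.0], [1.0, 4.0], "train", {"batch_size": 2})
```

Keeping the core shared means only the spec-building layer has to track the TPU-specific differences.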

A first, simplest approximation is:

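def model_to_estimator(keras_model=None,
                       keras_model_path=None,
                       custom_objects=None,
                       model_dir=None,
                       config=None,
                       **tpu_kwargs):
  # config must be a tf.contrib.tpu.RunConfig; tpu_kwargs (e.g. use_tpu,
  # train_batch_size) are forwarded to TPUEstimator, which requires
  # train_batch_size when running on TPU.
  keras_weights = keras_model.get_weights()
  keras_model_fn = _create_keras_tpu_model_fn(keras_model, custom_objects)
  est = tf.contrib.tpu.TPUEstimator(keras_model_fn, model_dir=model_dir,
                                    config=config, **tpu_kwargs)
  _save_first_checkpoint(keras_model, est, custom_objects, keras_weights)
  return est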


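Here the _save_first_checkpoint call is actually optional, but if you want to keep it, import it from tensorflow.python.keras._impl.keras.estimator. It writes the Keras model's current weights into model_dir as the initial checkpoint (only when no checkpoint exists there yet); without it, training starts from freshly initialized variables instead of the Keras weights.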
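The "save only if there is no checkpoint yet" behaviour, and why the call is optional, can be sketched with plain files. This is a hypothetical model of the idea, not the real helper: save_first_checkpoint and initial_weights are illustrative names, and JSON stands in for a TensorFlow checkpoint.

```python
import json
import os
import tempfile

CKPT = "first_checkpoint.json"  # hypothetical file name


def save_first_checkpoint(model_dir, keras_weights):
    """Write the Keras weights as the initial checkpoint, unless one exists."""
    path = os.path.join(model_dir, CKPT)
    if os.path.exists(path):
        return False  # keep existing training progress untouched
    with open(path, "w") as f:
        json.dump(keras_weights, f)
    return True


def initial_weights(model_dir, init_fn):
    """What training starts from: the checkpoint if present, else fresh init."""
    path = os.path.join(model_dir, CKPT)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return init_fn()


with tempfile.TemporaryDirectory() as d:
    fresh = initial_weights(d, lambda: [0.0, 0.0])     # no checkpoint yet
    first = save_first_checkpoint(d, [0.5, -1.25])      # seeds model_dir
    restored = initial_weights(d, lambda: [0.0, 0.0])   # now uses Keras weights
    again = save_first_checkpoint(d, [9.9, 9.9])        # does not overwrite
```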

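The real work happens in the _create_keras_tpu_model_fn function, which replaces _create_keras_model_fn. The changes are:

the inner TensorFlow optimizer must be wrapped in CrossShardOptimizer, as mentioned above (this assumes the model was compiled with a tf.train optimizer such as tf.train.MomentumOptimizer, which Keras wraps in a TFOptimizer; a Keras-native optimizer like tf.keras.optimizers.SGD has no inner optimizer to wrap);
the inner function must return a TPUEstimatorSpec, which means accepting a params argument and passing metrics as eval_metrics=(metric_fn, tensors), since TPUEstimatorSpec has no eval_metric_ops field.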
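The eval_metrics pair works differently from eval_metric_ops: the tensors are collected from every shard, concatenated along the first (batch) dimension on the CPU host, and only then passed to metric_fn, as keyword arguments when tensors is a dict. A pure-Python model of that host-side step, with hypothetical names (metric_fn here stands in for one returning tf.metrics.mean):

```python
def metric_fn(acc):
    """Hypothetical metric_fn: runs on the CPU host with gathered values."""
    # Stands in for returning {"acc": tf.metrics.mean(acc)}.
    return {"acc": sum(acc) / len(acc)}


def gather_and_run(metric_fn, per_shard_tensors):
    """Hypothetical model of the host side of eval_metrics=(metric_fn, tensors).

    Values with the same name are concatenated along the first dimension,
    then passed to metric_fn as keyword arguments.
    """
    gathered = {}
    for shard in per_shard_tensors:
        for name, values in shard.items():
            gathered.setdefault(name, []).extend(values)
    return metric_fn(**gathered)


# Two shards, each reporting a scalar accuracy reshaped to shape [1]
# (a scalar has no batch dimension to concatenate along).
result = gather_and_run(metric_fn, [{"acc": [0.75]}, {"acc": [0.25]}])
```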

A few more lines may need patching, but it looks fine to me. The full version is below:

import tensorflow as tf
from tensorflow.python.keras._impl.keras.estimator import _save_first_checkpoint, _clone_and_build_model

def model_to_estimator(keras_model=None,
                       keras_model_path=None,
                       custom_objects=None,
                       model_dir=None,
                       config=None,
                       **tpu_kwargs):
  # keras_model_path is kept for signature compatibility but not handled here.
  # config must be a tf.contrib.tpu.RunConfig; tpu_kwargs (e.g. use_tpu,
  # train_batch_size, eval_batch_size) are forwarded to TPUEstimator.
  keras_weights = keras_model.get_weights()
  keras_model_fn = _create_keras_tpu_model_fn(keras_model, custom_objects)
  est = tf.contrib.tpu.TPUEstimator(keras_model_fn, model_dir=model_dir,
                                    config=config, **tpu_kwargs)
  # Seed model_dir with the Keras weights as the first checkpoint.
  _save_first_checkpoint(keras_model, est, custom_objects, keras_weights)
  return est


def _create_keras_tpu_model_fn(keras_model, custom_objects=None):

  def model_fn(features, labels, mode, params):
    """model_fn for a Keras-based TPUEstimator.

    TPUEstimator requires a `params` argument (it passes params['batch_size']).
    """
    model = _clone_and_build_model(mode, keras_model, custom_objects, features,
                                   labels)
    predictions = dict(zip(model.output_names, model.outputs))

    loss = None
    train_op = None
    eval_metrics = None

    # Set loss and metrics only during train and evaluate.
    if mode != tf.estimator.ModeKeys.PREDICT:
      # Assumes a tf.train optimizer wrapped by Keras in a TFOptimizer, which
      # exposes `.optimizer`. Avoid wrapping twice if the object is shared.
      if not isinstance(model.optimizer.optimizer,
                        tf.contrib.tpu.CrossShardOptimizer):
        model.optimizer.optimizer = tf.contrib.tpu.CrossShardOptimizer(
            model.optimizer.optimizer)

      model._make_train_function()  # pylint: disable=protected-access
      loss = model.total_loss

      if model.metrics:
        metric_tensors = {}
        # When each metric maps to an output
        if isinstance(model.metrics, dict):
          for i, output_name in enumerate(model.metrics.keys()):
            metric_name = model.metrics[output_name]
            if callable(metric_name):
              metric_name = metric_name.__name__
            # When some outputs use the same metric
            if list(model.metrics.values()).count(metric_name) > 1:
              metric_name += '_' + output_name
            metric_tensors[metric_name] = model.metrics_tensors[
                i - len(model.metrics)]
        else:
          for i, metric_name in enumerate(model.metrics):
            if callable(metric_name):
              metric_name = metric_name.__name__
            metric_tensors[metric_name] = model.metrics_tensors[i]

        # TPUEstimatorSpec takes eval_metrics=(metric_fn, tensors): tensors are
        # concatenated across shards along dim 0, then metric_fn runs on the
        # CPU host. Scalars are reshaped to [1] so they can be concatenated.
        def metric_fn(**tensors):
          return {name: tf.metrics.mean(t) for name, t in tensors.items()}

        eval_metrics = (metric_fn,
                        {name: tf.reshape(t, [1])
                         for name, t in metric_tensors.items()})

    if mode == tf.estimator.ModeKeys.TRAIN:
      train_op = model.train_function.updates_op

    return tf.contrib.tpu.TPUEstimatorSpec(
        mode=mode,
        predictions=predictions,
        loss=loss,
        train_op=train_op,
        eval_metrics=eval_metrics)

  return model_fn


Archived:	7 years, 6 months ago
Views:	1334
Last activity:	7 years, 6 months ago