src*_*nas 11 python tensorflow
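As this question states:
The TensorFlow documentation gives no example of how to periodically evaluate a model on an evaluation set.
The accepted answer suggests using Experiment (which is deprecated according to this README).
Everything I found online points to the train_and_evaluate method. However, I still don't see how to switch between the two processes (training and evaluation). I tried the following: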
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    params=hparams,
    model_dir=model_dir,
    config=tf.estimator.RunConfig(
        save_checkpoints_steps=2000,
        save_summary_steps=100,
        keep_checkpoint_max=5
    )
)
train_input_fn = lambda: input_fn(
    train_file,  # a .tfrecords file
    train=True,
    batch_size=70,
    num_epochs=100
)
eval_input_fn = lambda: input_fn(
    val_file,  # another .tfrecords file
    train=False,
    batch_size=70,
    num_epochs=1
)
train_spec = tf.estimator.TrainSpec(
    train_input_fn,
    max_steps=125
)
eval_spec = tf.estimator.EvalSpec(
    eval_input_fn,
    steps=30,
    name='validation',
    start_delay_secs=150,
    throttle_secs=200
)
tf.logging.info("start experiment...")
tf.estimator.train_and_evaluate(
    estimator,
    train_spec,
    eval_spec
)
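Here is what I think my code should do:
Train the model for 100 epochs with a batch size of 70; save a checkpoint every 2000 batches; save summaries every 100 batches; keep at most 5 checkpoints; after 150 batches on the training set, compute the validation error using 30 batches of validation data.
However, I get the following log: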
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /output/model.ckpt.
INFO:tensorflow:loss = 39.55082, step = 1
INFO:tensorflow:global_step/sec: 178.622
INFO:tensorflow:loss = 1.0455043, step = 101 (0.560 sec)
INFO:tensorflow:Saving checkpoints for 150 into /output/model.ckpt.
INFO:tensorflow:Loss for final step: 0.8327793.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-04-02-22:49:15
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /projects/MNIST-GCP/output/model.ckpt-150
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [3/30]
INFO:tensorflow:Evaluation [6/30]
INFO:tensorflow:Evaluation [9/30]
INFO:tensorflow:Evaluation [12/30]
INFO:tensorflow:Evaluation [15/30]
INFO:tensorflow:Evaluation [18/30]
INFO:tensorflow:Evaluation [21/30]
INFO:tensorflow:Evaluation [24/30]
INFO:tensorflow:Evaluation [27/30]
INFO:tensorflow:Evaluation [30/30]
INFO:tensorflow:Finished evaluation at 2018-04-02-22:49:15
INFO:tensorflow:Saving dict for global step 150: accuracy = 0.8552381, global_step =150, loss = 0.95031387
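From the log, it looks like training stops after the first evaluation step. What am I missing in the documentation? Could you explain how I should implement what I think my code is doing?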
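Additional info: I am running everything on the MNIST dataset, which has 50,000 images in the training set, so (I think) the model should run for num_epochs * 50,000 / batch_size ≈ 71,000 steps.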
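As a quick check of that estimate, the step count implied by the input pipeline can be computed directly. This is only a sketch: it assumes the figures quoted in the question (50,000 training images, batch_size=70, num_epochs=100) and follows the question's own formula; the function name is made up for illustration.

```python
import math


def steps_from_epochs(num_examples, batch_size, num_epochs):
    """Batches needed to consume num_epochs passes over the data.

    Follows the formula num_epochs * num_examples / batch_size, rounded up
    so that a final partial batch still counts as one step.
    """
    return math.ceil(num_epochs * num_examples / batch_size)


total_steps = steps_from_epochs(num_examples=50_000, batch_size=70, num_epochs=100)
print(total_steps)
```

With these values the data alone would allow 71,429 steps, far more than the max_steps=125 passed to TrainSpec.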
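I sincerely appreciate your help!
Edit: After running experiments, I realized that max_steps controls the number of steps of the whole training procedure, not just the number of steps taken before the metrics are computed on the test set. Reading tf.estimator.Estimator.train, I see that it has a steps argument, which works incrementally and is bounded by max_steps; however, tf.estimator.TrainSpec has no steps argument, which means I cannot control the number of steps to take before computing the metrics on the validation set.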
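One way to get step-based control is to skip train_and_evaluate and alternate the estimator's own train(steps=...) and evaluate(...) calls in a plain loop. The sketch below shows only the control flow. train_with_periodic_eval and FakeEstimator are hypothetical names invented here; the fake class exists solely so the loop runs without TensorFlow, and it mimics just the train(input_fn, steps) and evaluate(input_fn, steps) methods of tf.estimator.Estimator. It also assumes the input function always yields enough data for each chunk.

```python
def train_with_periodic_eval(estimator, train_input_fn, eval_input_fn,
                             max_steps, eval_every, eval_steps=None):
    """Alternate training and evaluation until max_steps have been run.

    `estimator` is any object exposing train(input_fn, steps=...) and
    evaluate(input_fn, steps=...), as tf.estimator.Estimator does.
    """
    history = []
    done = 0
    while done < max_steps:
        # Train for one chunk, never going past max_steps in total.
        chunk = min(eval_every, max_steps - done)
        estimator.train(input_fn=train_input_fn, steps=chunk)
        done += chunk
        # Evaluate right after the chunk and record the metrics.
        metrics = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)
        history.append((done, metrics))
    return history


class FakeEstimator:
    """Stand-in (not part of TensorFlow) so the sketch runs on its own."""

    def __init__(self):
        self.global_step = 0

    def train(self, input_fn, steps=None):
        self.global_step += steps
        return self

    def evaluate(self, input_fn, steps=None):
        return {"global_step": self.global_step}


history = train_with_periodic_eval(
    FakeEstimator(),
    train_input_fn=lambda: None,
    eval_input_fn=lambda: None,
    max_steps=125,
    eval_every=50,
    eval_steps=30,
)
for step, metrics in history:
    print(step, metrics)
```

With a real estimator, each train call rebuilds the graph and restarts the input function from the latest checkpoint, so this pattern trades some speed for exact control over when evaluation happens.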
小智 2
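In fact, the estimator switches from the training phase to the evaluation phase every 200 seconds (throttle_secs) or when your training finishes.
However, we can see in your code that the 125 steps complete before the first evaluation starts, which means your training has already finished. max_steps is the total number of training steps before stopping; it has no link with the number of epochs (since the epoch count is not used by tf.estimator.train_and_evaluate). During training, the evaluation of your metrics happens every throttle_secs (here = 200).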
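To make the interaction between max_steps and the epoch count concrete, here is a small sketch of which limit is reached first. It assumes training stops either at max_steps or when the input function runs out of data, whichever comes first; the function name and the 50,000-image figure (taken from the question) are illustrative assumptions.

```python
import math


def effective_training_steps(max_steps, num_examples, batch_size, num_epochs):
    """Steps actually run: the smaller of max_steps and the data-bound limit."""
    data_bound = math.ceil(num_epochs * num_examples / batch_size)
    if max_steps is None:
        # No explicit cap: training ends when the input is exhausted.
        return data_bound
    return min(max_steps, data_bound)


capped = effective_training_steps(
    max_steps=125, num_examples=50_000, batch_size=70, num_epochs=100)
uncapped = effective_training_steps(
    max_steps=None, num_examples=50_000, batch_size=70, num_epochs=100)
print(capped, uncapped)
```

With the values from the question, max_steps is by far the tighter limit, so raising it (or setting it to None) is what lets training continue across several evaluation rounds.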
Regarding metrics, you can add them to your model with the following:
predict = tf.nn.softmax(logits, name="softmax_tensor")
classes = tf.cast(tf.argmax(predict, 1), tf.uint8)

def conv_model_eval_metrics(classes, labels, mode):
    # Metrics are only built for the TRAIN and EVAL modes.
    if mode == tf.estimator.ModeKeys.TRAIN or mode == tf.estimator.ModeKeys.EVAL:
        # tf.metrics.* take (labels, predictions), in that order.
        return {
            'accuracy': tf.metrics.accuracy(labels=labels, predictions=classes),
            'precision': tf.metrics.precision(labels=labels, predictions=classes),
            'recall': tf.metrics.recall(labels=labels, predictions=classes),
        }
    else:
        return None

eval_metrics = conv_model_eval_metrics(classes, labels, mode)
with tf.variable_scope("performance_metrics"):
    # Each metric is a (value, update_op) pair; [1] selects the update op.
    # Accuracy is the most intuitive performance measure: the ratio of
    # correctly predicted observations to the total observations.
    tf.summary.scalar('accuracy', eval_metrics['accuracy'][1])
    # How many selected items are relevant?
    # Precision is the ratio of correctly predicted positive observations to
    # the total predicted positive observations.
    tf.summary.scalar('precision', eval_metrics['precision'][1])
    # How many relevant items are selected?
    # Recall is the ratio of correctly predicted positive observations to
    # all observations in the actual class.
    tf.summary.scalar('recall', eval_metrics['recall'][1])
Tracking precision, recall, and accuracy in TensorBoard this way works very well during both training and evaluation.
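For reference, the three definitions given in the code comments can be checked on a tiny hand-made example. This is a plain-Python sketch for binary labels only, with made-up data; it does not reproduce the streaming behaviour of tf.metrics, which, as far as I know, cast labels and predictions to booleans for precision and recall, so on a multi-class problem such as MNIST those two numbers need careful interpretation.

```python
def binary_metrics(labels, predictions):
    """Accuracy, precision and recall for 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up data: 3 true positives, 2 false positives, 1 false negative.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 1, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(labels, predictions)
print(metrics)
```

Swapping the two arguments leaves accuracy unchanged but exchanges precision and recall, which is why the argument order passed to the metric functions matters.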
PS: Sorry, this is my first answer, so it may be a bit rough to read ^^