How to save and restore partitioned variables in TensorFlow

yi *_*ang 8 python machine-learning deep-learning tensorflow

I have a large matrix.

I create it as a partitioned variable, split into a fixed number of shards, like this:

softmax_w = tf.get_variable("softmax_w", [hps.vocab_size, hps.projected_size],
                            partitioner=tf.fixed_size_partitioner(hps.num_shards, 0))

Creation log:

model/softmax_w/part_0:0 (99184, 512) /cpu:0
model/softmax_w/part_1:0 (99184, 512) /cpu:0
model/softmax_w/part_2:0 (99184, 512) /cpu:0
model/softmax_w/part_3:0 (99184, 512) /cpu:0
model/softmax_w/part_4:0 (99184, 512) /cpu:0
model/softmax_w/part_5:0 (99184, 512) /cpu:0
model/softmax_w/part_6:0 (99183, 512) /cpu:0
model/softmax_w/part_7:0 (99183, 512) /cpu:0
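The shard shapes in the log can be reproduced with a few lines of plain Python. This is only a sketch of the split that the log implies (the first `dim % num_shards` shards get one extra row), not TensorFlow's own code. The helper name `shard_sizes` is hypothetical, and the row count 793470 is not stated in the question; it is inferred by summing the eight shard sizes in the log.

```python
def shard_sizes(dim_size, num_shards):
    """Split dim_size rows into num_shards near-equal shards.

    Hypothetical helper: mirrors the split visible in the log, where the
    first (dim_size % num_shards) shards receive one extra row.
    """
    base, remainder = divmod(dim_size, num_shards)
    return [base + 1 if i < remainder else base for i in range(num_shards)]


# 793470 is inferred from the log: 6 * 99184 + 2 * 99183.
sizes = shard_sizes(793470, 8)
for i, rows in enumerate(sizes):
    print("model/softmax_w/part_%d:0 (%d, 512)" % (i, rows))
```

The printed shapes match part_0 through part_7 above: six shards of 99184 rows followed by two of 99183.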

I can train the model and save it successfully. But when I try to restore the model, I get this error:

W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_7 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_6 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_5 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_4 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_3 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_2 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_1 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_0 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_7 not found in checkpoint

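I found that TensorFlow saves the variable as a single piece. The checkpoint contains only one softmax_w entry; it is no longer stored as a partitioned variable.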

Max*_*xim 1

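This happens in TensorFlow 0.12 but not in 1.3 (the latest release as of October 2017). There is a GitHub issue for it, filed by the same author, which has since been fixed. So if you see this error, simply upgrade TensorFlow.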
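To check whether a given installation is affected, a small save/restore round trip can be run. The following is a minimal sketch, assuming a TensorFlow version that provides the `tf.compat.v1` module (1.13 or later, including 2.x). The tiny shapes and the names `build_model` and `save_and_restore` are made up for illustration; only the variable name `model/softmax_w` and the partitioner come from the question.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # use graph mode, as in the question


def build_model(vocab_size, projected_size, num_shards):
    """Build a graph holding one partitioned variable (hypothetical helper)."""
    graph = tf.Graph()
    with graph.as_default():
        with tf1.variable_scope("model"):
            softmax_w = tf1.get_variable(
                "softmax_w", [vocab_size, projected_size],
                partitioner=tf1.fixed_size_partitioner(num_shards, axis=0))
        shards = list(softmax_w)           # the part_0, part_1, ... variables
        full = tf.concat(shards, axis=0)   # reassembled full matrix
        saver = tf1.train.Saver()
        init = tf1.global_variables_initializer()
    return graph, shards, full, saver, init


def save_and_restore(vocab_size=10, projected_size=4, num_shards=3):
    """Save the partitioned variable, then restore it into a fresh graph."""
    ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")

    graph, shards, full, saver, init = build_model(
        vocab_size, projected_size, num_shards)
    with tf1.Session(graph=graph) as sess:
        sess.run(init)
        saved_value = sess.run(full)
        saver.save(sess, ckpt_prefix)

    # Map each checkpoint key to its stored shape.
    checkpoint_keys = dict(tf.train.list_variables(ckpt_prefix))

    # A second graph with the same partitioning, restored without init.
    graph2, _, full2, saver2, _ = build_model(
        vocab_size, projected_size, num_shards)
    with tf1.Session(graph=graph2) as sess:
        saver2.restore(sess, ckpt_prefix)
        restored_value = sess.run(full2)

    shard_shapes = [s.shape.as_list() for s in shards]
    return saved_value, restored_value, checkpoint_keys, shard_shapes


saved, restored, keys, shard_shapes = save_and_restore()
print(shard_shapes)
print(keys)
print(np.array_equal(saved, restored))
```

If the restore step raises "Key model/softmax_w/part_N not found in checkpoint", the installed version still has the problem described in the question. Note that a single `model/softmax_w` key in the checkpoint is expected: the shards are stored as slices of one full tensor, and a fixed version restores each shard from its slice.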