TensorFlow: How do I copy conv layer weights to another variable for use in reinforcement learning?

and*_*rew 2 python reinforcement-learning conv-neural-network tensorflow

I'm not sure whether this is even possible in TensorFlow, and I'm worried I might have to switch to PyTorch.

Basically, I have this layer:

self.policy_conv1 = tf.layers.conv2d(
    inputs=self.policy_s, filters=16, kernel_size=(8, 8), strides=(4, 4),
    padding='valid', activation=tf.nn.relu,
    kernel_initializer=tf.glorot_uniform_initializer,
    bias_initializer=tf.glorot_uniform_initializer)

Every ~100 training iterations, I'm trying to copy it into this other layer:

self.eval_conv1 = tf.layers.conv2d(
    inputs=self.s, filters=16, kernel_size=(8, 8), strides=(4, 4),
    padding='valid', activation=tf.nn.relu,
    kernel_initializer=tf.glorot_uniform_initializer,
    bias_initializer=tf.glorot_uniform_initializer)

tf.assign doesn't seem to be the right tool, and the following doesn't work either:

self.policy_conv1 = tf.stop_gradient(tf.identity(self.eval_conv1))

Essentially, I want the eval conv layer's weights copied over to the policy conv layer, without the two being tied together every time the graph runs one variable or the other (which is what happens with the identity snippet above). If someone could point me to the code this needs, I'd be grateful.
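To make the tying concrete, here is a minimal demonstration of what I mean: tf.identity is just a graph op that re-reads its input on every run, so it never takes a snapshot of the weights.

import tensorflow as tf

v = tf.Variable(1.0)
tied = tf.identity(v)  # re-reads v on every sess.run; no copy is made

sess = tf.Session()
sess.run(tf.global_variables_initializer())

sess.run(tf.assign(v, 5.0))
print(sess.run(tied))  # 5.0 -- "tied" follows v instead of holding the old value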

squ*_*ick 6

import numpy as np
import tensorflow as tf

# I'm using placeholders, but it'll work for other inputs as well
ph1 = tf.placeholder(tf.float32, [None, 32, 32, 3])
ph2 = tf.placeholder(tf.float32, [None, 32, 32, 3])

l1 = tf.layers.conv2d(inputs=ph1, filters=16, kernel_size=(8, 8), strides=(4, 4),
                      padding='valid', activation=tf.nn.relu,
                      kernel_initializer=tf.glorot_uniform_initializer,
                      bias_initializer=tf.glorot_uniform_initializer, name="layer_1")
l2 = tf.layers.conv2d(inputs=ph2, filters=16, kernel_size=(8, 8), strides=(4, 4),
                      padding='valid', activation=tf.nn.relu,
                      kernel_initializer=tf.glorot_uniform_initializer,
                      bias_initializer=tf.glorot_uniform_initializer, name="layer_2")

sess = tf.Session()
sess.run(tf.global_variables_initializer())

w1 = tf.get_default_graph().get_tensor_by_name("layer_1/kernel:0")
w2 = tf.get_default_graph().get_tensor_by_name("layer_2/kernel:0")

w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # non-zero

sess.run(tf.assign(w2, w1))
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # 0

w1 = w1 * 2 + 1  # rebinds the Python name w1 to a new tensor; the variable itself is untouched
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # non-zero

layer_1/bias:0 should work for getting the bias term.
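For completeness, a minimal sketch of copying the bias the same way as the kernel in the example above (same session and graph; the tensor names come from the name= arguments):

b1 = tf.get_default_graph().get_tensor_by_name("layer_1/bias:0")
b2 = tf.get_default_graph().get_tensor_by_name("layer_2/bias:0")

sess.run(tf.assign(b2, b1))  # copy layer_1's bias into layer_2
print(np.sum(sess.run(b1) - sess.run(b2)))  # 0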

Update:

I found an easier way:

update_weights = [tf.assign(new, old) for (new, old) in
                  zip(tf.trainable_variables('new_scope'), tf.trainable_variables('old_scope'))]

Doing a sess.run on update_weights should copy the weights from one network to the other. Just remember to build the two networks under separate name scopes; a rough end-to-end sketch follows.
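A minimal sketch of how this might be wired up (the scope names, input shape, and single conv layer here are placeholder assumptions; the two networks must be built identically so that zip pairs the variables correctly):

import tensorflow as tf

ph = tf.placeholder(tf.float32, [None, 84, 84, 4])  # placeholder input shape

# Build two structurally identical networks under separate scopes
with tf.variable_scope('old_scope'):
    old_out = tf.layers.conv2d(ph, filters=16, kernel_size=(8, 8), strides=(4, 4))
with tf.variable_scope('new_scope'):
    new_out = tf.layers.conv2d(ph, filters=16, kernel_size=(8, 8), strides=(4, 4))

# zip pairs variables by creation order, so the architectures must match
update_weights = [tf.assign(new, old) for (new, old) in
                  zip(tf.trainable_variables('new_scope'),
                      tf.trainable_variables('old_scope'))]

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(update_weights)  # run every ~100 training iterations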