I'm not sure whether this is possible in TensorFlow, and I'm worried I might have to switch to PyTorch.
Basically, I have this layer:
self.policy_conv1 = tf.layers.conv2d(inputs=self.policy_s, filters=16, kernel_size=(8, 8), strides=(4, 4), padding='valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer=tf.glorot_uniform_initializer)
Every ~100 training iterations, I'm trying to copy its weights into this other layer:
self.eval_conv1 = tf.layers.conv2d(inputs=self.s, filters=16, kernel_size=(8, 8), strides=(4, 4), padding='valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer=tf.glorot_uniform_initializer)
tf.assign doesn't seem to be the right tool, and the following doesn't work either:
self.policy_conv1 = tf.stop_gradient(tf.identity(self.eval_conv1))
Essentially, I want the weights to be copied over from the eval conv layer into the policy conv layer, without the two being tied together every time the graph runs one variable or the other (which is what happens with the identity snippet above). If someone could point me to the code that's needed, I'd appreciate it.
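Here is a self-contained demo: look up each layer's kernel variable by name and copy it once with tf.assign. The last block shows that the copy does not tie the two variables together afterwards.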
import numpy as np
import tensorflow as tf
# I'm using placeholders, but it'll work for other inputs as well
ph1 = tf.placeholder(tf.float32, [None, 32, 32, 3])
ph2 = tf.placeholder(tf.float32, [None, 32, 32, 3])
l1 = tf.layers.conv2d(inputs=ph1, filters=16, kernel_size=(8, 8), strides=(4, 4), padding='valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer=tf.glorot_uniform_initializer, name="layer_1")
l2 = tf.layers.conv2d(inputs=ph2, filters=16, kernel_size=(8, 8), strides=(4, 4), padding='valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer=tf.glorot_uniform_initializer, name="layer_2")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# fetch the kernel variables that tf.layers.conv2d created under each layer's name
w1 = tf.get_default_graph().get_tensor_by_name("layer_1/kernel:0")
w2 = tf.get_default_graph().get_tensor_by_name("layer_2/kernel:0")
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # non-zero
sess.run(tf.assign(w2, w1))  # one-time copy of layer_1's kernel into layer_2's
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # 0
w1 = w1 * 2 + 1  # builds a new tensor from w1; the variables themselves stay decoupled
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # non-zero
layer_1/bias:0 should work the same way for getting the bias term.
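To mirror the bias as well, the same lookup works; a short sketch continuing the demo above:
b1 = tf.get_default_graph().get_tensor_by_name("layer_1/bias:0")
b2 = tf.get_default_graph().get_tensor_by_name("layer_2/bias:0")
sess.run(tf.assign(b2, b1))  # one-time copy of the bias, just like the kernel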
Update:
I found an easier way:
update_weights = [tf.assign(new, old) for (new, old) in
                  zip(tf.trainable_variables('new_scope'), tf.trainable_variables('old_scope'))]
Running sess.run(update_weights) should copy the weights from one network to the other. Just remember to build the two networks under separate variable scopes, so that tf.trainable_variables(scope) can tell them apart.
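For completeness, a minimal sketch of that pattern (old_scope and new_scope are hypothetical names; any two variable scopes work):
import tensorflow as tf
ph = tf.placeholder(tf.float32, [None, 32, 32, 3])
with tf.variable_scope('old_scope'):
    old_out = tf.layers.conv2d(ph, filters=16, kernel_size=(8, 8), strides=(4, 4))
with tf.variable_scope('new_scope'):
    new_out = tf.layers.conv2d(ph, filters=16, kernel_size=(8, 8), strides=(4, 4))
update_weights = [tf.assign(new, old) for (new, old) in
                  zip(tf.trainable_variables('new_scope'), tf.trainable_variables('old_scope'))]
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(update_weights)  # copies kernel and bias from old_scope into new_scope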