
Why is the data in a tf-agents replay buffer in random order?

TL;DR: why don't the first two actions/observations I took match the first two items in the replay buffer?

Does the tf-agents replay buffer shuffle the data automatically?

By adding these print statements, I can see what my first two steps look like:

print("just addding this as traj num = "+str(num))
print(" next time step  = "+str(next_time_step))
replay_buffer.add_batch(traj)


This produces:

just addding this as traj num = 0
 next time step  = TimeStep(
{'discount': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>,
 'observation': <tf.Tensor: shape=(1, 1, 5, 5), dtype=float32, numpy=
array([[[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 1., 0.]]]], dtype=float32)>,
 'reward': <tf.Tensor: shape=(1,), dtype=float32, …
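One thing that may be worth checking, sketched here as a toy model and not as TF-Agents code: if the buffer is a uniform replay buffer and is read back through a sampling call, items are drawn at random, so the first items read need not be the first items added. The post does not say which buffer class or read method is used, so that is an assumption. The names `ToyUniformBuffer`, `sample`, and `in_order` below are hypothetical and exist only for illustration.

```python
import random


class ToyUniformBuffer:
    """Hypothetical stand-in for a uniform replay buffer (not the TF-Agents API)."""

    def __init__(self, seed=None):
        self._items = []
        self._rng = random.Random(seed)

    def add(self, item):
        # Items are stored in the order they arrive.
        self._items.append(item)

    def sample(self, n):
        # Uniform sampling with replacement: the result order is
        # unrelated to the insertion order.
        return [self._rng.choice(self._items) for _ in range(n)]

    def in_order(self):
        # Deterministic pass: items come back exactly as they were added.
        return list(self._items)


buffer = ToyUniformBuffer(seed=0)
for step in range(5):
    buffer.add(f"traj-{step}")

ordered = buffer.in_order()
sampled = buffer.sample(2)

print("first two added:  ", ordered[:2])
print("first two sampled:", sampled)
```

In TF-Agents itself, to the best of my knowledge `as_dataset` on a uniform replay buffer samples at random by default, and passing `single_deterministic_pass=True` reads the stored items once in order. Please verify this against the documentation for your installed version.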


python buffer tensorflow tf-agent

Asked by tgm*_*ack on 2022-09-15 · 6 votes · 1 answer · 251 views
