Unable to load a saved policy (TF-Agents)

Question


roh*_*raj 3 reinforcement-learning deep-learning tensorflow

I saved a trained policy with the policy saver, as follows:

  tf_policy_saver = policy_saver.PolicySaver(agent.policy)
  tf_policy_saver.save(policy_dir)


I want to continue training from the saved policy, so I tried to initialize training with it, which produced an error.

agent = dqn_agent.DqnAgent(
    tf_env.time_step_spec(),
    tf_env.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)

agent.initialize()

agent.policy=tf.compat.v2.saved_model.load(policy_dir)


Error:

  File "C:/Users/Rohit/PycharmProjects/pythonProject/waypoint.py", line 172, in <module>
agent.policy=tf.compat.v2.saved_model.load('waypoints\\Two_rewards')


File "C:\Users\Rohit\anaconda3\envs\btp36\lib\site-packages\tensorflow\python\training\tracking\tracking.py", line 92, in __setattr__
    super(AutoTrackable, self).__setattr__(name, value)
AttributeError: can't set attribute


I just want to avoid retraining from scratch every time. How can I load the saved policy and continue training?

Thanks in advance.

Answer 1

Fed*_*rba 5

Yes, as already mentioned, you should use a checkpointer for this. Take a look at the example code below.

from tf_agents.utils import common

agent = ... # Agent Definition
policy = agent.policy
# Policy --> Y
policy_checkpointer = common.Checkpointer(ckpt_dir='path/to/dir',
                                          policy=policy)

... # Train the agent

# Policy --> X
policy_checkpointer.save(global_step=epoch_counter.numpy())


When you later want to reload the policy, simply run the same initialization code.

agent = ... # Agent Definition
policy = agent.policy
# Policy --> Y1, possibly Y1==Y depending on agent class you are using, if it's DQN
#               then they are different because of random initialization of network weights
policy_checkpointer = common.Checkpointer(ckpt_dir='path/to/dir',
                                          policy=policy)
# Policy --> X


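Once created, policy_checkpointer automatically detects whether a checkpoint already exists in ckpt_dir. If one does, it restores the values of the variables it started tracking at creation time.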
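To make the restore-on-creation behaviour concrete, here is a minimal pure-Python sketch. ToyCheckpointer is a hypothetical stand-in written for illustration only. It is not the TF-Agents Checkpointer and makes no claim about how that class is implemented. It only mimics the pattern described above: the constructor looks for an existing checkpoint and, if it finds one, overwrites the tracked values in place.

```python
import json
import os
import tempfile


class ToyCheckpointer:
    """Hypothetical stand-in that restores tracked values on construction."""

    def __init__(self, ckpt_dir, **tracked):
        self._path = os.path.join(ckpt_dir, "ckpt.json")
        self._tracked = tracked  # name -> dict of "variables"
        self.checkpoint_exists = os.path.exists(self._path)
        if self.checkpoint_exists:
            with open(self._path) as f:
                saved = json.load(f)
            for name, values in saved.items():
                # Update in place so existing references see restored values.
                self._tracked[name].clear()
                self._tracked[name].update(values)

    def save(self):
        with open(self._path, "w") as f:
            json.dump(self._tracked, f)


ckpt_dir = tempfile.mkdtemp()

# First run: fresh weights, "train", then save.
policy = {"w": 0.0}
checkpointer = ToyCheckpointer(ckpt_dir, policy=policy)
policy["w"] = 1.5  # pretend training changed the weight
checkpointer.save()

# Later run: same initialization code, different initial weights.
new_policy = {"w": -0.3}
new_checkpointer = ToyCheckpointer(ckpt_dir, policy=new_policy)
print(new_policy["w"])  # the saved value replaces the fresh initialization
```

The second construction uses exactly the same code as the first, and the only difference is that a checkpoint file is now present in the directory.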

A few things to note:

Checkpointing can save much more than just the policy, and I do recommend doing so. The TF-Agents Checkpointer object is very flexible, for example:

train_checkpointer = common.Checkpointer(ckpt_dir='first/dir',
                                         agent=tf_agent,               # tf_agent.TFAgent
                                         train_step=train_step,        # tf.Variable
                                         epoch_counter=epoch_counter,  # tf.Variable
                                         metrics=metric_utils.MetricsGroup(
                                                 train_metrics, 'train_metrics'))

policy_checkpointer = common.Checkpointer(ckpt_dir='second/dir',
                                          policy=agent.policy)

rb_checkpointer = common.Checkpointer(ckpt_dir='third/dir',
                                      max_to_keep=1,
                                      replay_buffer=replay_buffer  # TFUniformReplayBuffer
                                      )


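Note that in the case of a DqnAgent, agent.policy and agent.collect_policy are essentially wrappers around the QNetwork. What this implies is shown in the code below (look at the comments on the state of the policy variable).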

agent = DqnAgent(...)
policy = agent.policy      # Random initial policy ---> X

dataset = replay_buffer.as_dataset(...)
for data in dataset:
   experience, _ = data
   loss_agent_info = agent.train(experience=experience)

# policy variable stores a trained Policy object ---> Y


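This happens because tensors in TF are shared at runtime. So when you update the weights of the agent's QNetwork with agent.train, those same weights are implicitly updated in the QNetwork held by the policy variable as well. Strictly speaking, the policy's tensors are not "updated" at all: they simply are the same tensors as the ones in your agent.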
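The sharing described here can be sketched in plain Python, without TensorFlow. ToyQNetwork, ToyPolicy and ToyAgent are hypothetical names invented for this illustration and are not TF-Agents classes. The sketch also assumes that the agent exposes policy as a read-only property, which would be consistent with the "can't set attribute" error in the question.

```python
class ToyQNetwork:
    """Hypothetical network with a single scalar weight."""

    def __init__(self, weight):
        self.weight = weight


class ToyPolicy:
    """Hypothetical policy that wraps a network by reference, not by copy."""

    def __init__(self, q_network):
        self.q_network = q_network

    def action_value(self, observation):
        return self.q_network.weight * observation


class ToyAgent:
    def __init__(self, q_network):
        self._q_network = q_network
        self._policy = ToyPolicy(q_network)

    @property
    def policy(self):
        # Read-only: no setter is defined (an assumption about the real agent).
        return self._policy

    def train(self, delta):
        # Mutates the one network object that the policy also points to.
        self._q_network.weight += delta


agent = ToyAgent(ToyQNetwork(weight=1.0))
policy = agent.policy

before = policy.action_value(2.0)
agent.train(delta=0.5)
after = policy.action_value(2.0)  # changed, although policy was never reassigned

# Assigning to a property without a setter raises AttributeError.
try:
    agent.policy = ToyPolicy(ToyQNetwork(weight=9.0))
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(before, after, assignment_failed)
```

Because the policy keeps a reference to the same network object the agent trains, there is nothing to copy back after training, and restoring the underlying variables through a checkpointer is enough to bring the policy back.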
