Tag: tensorflow-agents

py_environment "time_step" doesn't match "time_step_spec"

I created a custom py environment with TF-Agents. However, I can't validate the environment or take steps in it using py_policy.action, and I'm confused about what the time_step_spec expects.

I tried converting it to a TF environment with tf_py_environment.TFPyEnvironment and was able to take actions with a tf_policy, but I'm still confused about the difference.

import abc
import numpy as np
from tf_agents.environments import py_environment
from tf_agents.environments import tf_environment
from tf_agents.environments import tf_py_environment
from tf_agents.environments import utils
from tf_agents.specs import array_spec
from tf_agents.environments import wrappers
from tf_agents.trajectories import time_step as ts
from tf_agents.policies import random_tf_policy
import tensorflow as tf
import tf_agents

class TicTacToe(py_environment.PyEnvironment):
   def __init__(self,n):
    super(TicTacToe,self).__init__()
    self.n = n
    self.winner = None
    self._episode_ended = False
    self.inital_state = np.zeros((n,n))
    self._state …
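
One common cause of this kind of mismatch (an assumption here, since the rest of the environment is cut off above) is that the array returned as the observation has a different dtype or shape than the spec declares. Note that `np.zeros((n, n))` defaults to `float64`. The sketch below uses plain NumPy rather than the TF-Agents API; `find_spec_mismatches` is a hypothetical helper that mirrors the kind of shape and dtype comparison a spec check performs, and the `int32` spec is assumed for illustration.

```python
import numpy as np


def find_spec_mismatches(value, expected_shape, expected_dtype):
    """Return a list of human-readable differences between an array and a spec.

    Hypothetical helper for illustration; not part of TF-Agents.
    """
    arr = np.asarray(value)
    problems = []
    if arr.shape != tuple(expected_shape):
        problems.append(f"shape {arr.shape} != expected {tuple(expected_shape)}")
    if arr.dtype != np.dtype(expected_dtype):
        problems.append(f"dtype {arr.dtype} != expected {np.dtype(expected_dtype)}")
    return problems


n = 3
board = np.zeros((n, n))                      # float64 by default
int_board = np.zeros((n, n), dtype=np.int32)  # dtype stated explicitly

# Assumed spec for illustration: an n x n board of int32 values.
default_problems = find_spec_mismatches(board, (n, n), np.int32)
fixed_problems = find_spec_mismatches(int_board, (n, n), np.int32)
print(default_problems)
print(fixed_problems)
```

If the real environment declares a different dtype in its observation spec, the same idea applies: create the state array with that dtype explicitly instead of relying on the NumPy default.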

tensorflow-agents

Boy*_*ang

lucky-day

6
votes

1
answer

2507
views

ValueError: Could not find matching function to call loaded from the SavedModel

I am trying to load a tf-agents policy that I saved in the following way:

try:
    PolicySaver(collect_policy).save(model_dir + 'collect_policy')
except TypeError:
    tf.saved_model.save(collect_policy, model_dir + 'collect_policy')

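A quick explanation of the try/except block: when the policy is first created, I can save it with PolicySaver, but when I load it again for another training run it is a SavedModel, so it can no longer be saved with PolicySaver.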

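This seems to work fine, but now I want to use this policy for self-play, so I load it in my AIPlayer class with self.policy = tf.saved_model.load(policy_path). However, when I try to use it for prediction, it doesn't work. Here is the (test) code: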

def decide(self, table):
    state = table.getState()
    timestep = ts.restart(np.array([table.getState()], dtype=np.float))
    prediction = self.policy.action(timestep)
    print(prediction)

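The table passed to the function contains the state of the game, and the ts.restart() call is copied from my custom pyEnvironment, so the time step is constructed in exactly the same way as it would be in the environment. However, I get the following error message for the line prediction = self.policy.action(timestep):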

ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * TimeStep(step_type=<tf.Tensor 'time_step:0' shape=() dtype=int32>, reward=<tf.Tensor 'time_step_1:0' shape=() dtype=float32>, discount=<tf.Tensor 'time_step_2:0' shape=() …
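
The `shape=()` entries in the error suggest the time step was built without a batch dimension, whereas a saved policy generally expects batched inputs. A dtype difference can cause the same failure, since `np.float` was an alias for the 64-bit Python `float`. This is a guess from the truncated message, not a confirmed diagnosis. The NumPy-only sketch below shows what a batched first time step looks like; `batched_restart` and the local `TimeStep` tuple are hypothetical stand-ins, and the `float32` observation dtype and the example state are assumptions. In TF-Agents itself, `ts.restart` accepts a `batch_size` argument for this purpose.

```python
from collections import namedtuple

import numpy as np

# Stand-in for the real TimeStep tuple, using the same four field names.
TimeStep = namedtuple("TimeStep", ["step_type", "reward", "discount", "observation"])


def batched_restart(observation, dtype=np.float32):
    """Build a first-step tuple whose fields all carry a leading batch axis.

    Hypothetical helper for illustration; not part of TF-Agents.
    """
    obs = np.asarray(observation, dtype=dtype)
    batch_size = obs.shape[0]
    return TimeStep(
        step_type=np.zeros((batch_size,), dtype=np.int32),  # 0 marks the first step
        reward=np.zeros((batch_size,), dtype=np.float32),
        discount=np.ones((batch_size,), dtype=np.float32),
        observation=obs,
    )


state = [0, 1, 0, 2]             # hypothetical flat game state
step = batched_restart([state])  # wrapping in a list adds the batch axis
print(step.step_type.shape, step.observation.shape, step.observation.dtype)
```

Comparing these shapes and dtypes with the signature printed in the error message is usually the quickest way to see which field differs.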

python tensorflow tensorflow-agents

Tax*_*xel

2019-08-22

6
votes

1
answer

3037
views

Can a tf.agent policy return a probability vector for all actions?

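I am trying to train a reinforcement learning agent with TF-Agents, following the TF-Agents DQN Tutorial. In my application, I have 1 action with 9 possible discrete values (labeled 0 to 8). Below is the output of env.action_spec():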

BoundedTensorSpec(shape=(), dtype=tf.int64, name='action', minimum=array(0, dtype=int64), maximum=array(8, dtype=int64))

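I would like to get the vector of probabilities over all actions as computed by the trained policy, and process it further in other application environments. However, the policy returns only a single log_probability value rather than a vector over all actions. Is there any way to get the probability vector?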

from tf_agents.networks import q_network
from tf_agents.agents.dqn import dqn_agent
from tf_agents.policies import policy_saver
from tf_agents.utils import common

q_net = q_network.QNetwork(
            env.observation_spec(),
            env.action_spec(),
            fc_layer_params=(32,)
        )

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001)

my_agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    epsilon_greedy=epsilon,
    optimizer=optimizer,
    emit_log_probability=True,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=global_step)

my_agent.initialize()

...  # training

tf_policy_saver = policy_saver.PolicySaver(my_agent.policy)
tf_policy_saver.save('./policy_dir/')

# making decision using the trained policy
action_step = my_agent.policy.action(time_step)

In dqn_agent.DqnAgent() (DQNAgent), I set emit_log_probability=True, which is supposed to define Whether policies …
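
A DQN network outputs one Q-value per action rather than a distribution, so any probability vector has to be derived from those values. As a sketch only (this is not the TF-Agents API), the code below shows two common conversions for the 9 actions above: the distribution implied by epsilon-greedy selection, and a softmax over the Q-values. The Q-values, `epsilon`, and `temperature` are made-up illustration inputs.

```python
import math


def epsilon_greedy_probs(q_values, epsilon):
    """Probability of each action under epsilon-greedy selection."""
    n = len(q_values)
    best = max(range(n), key=lambda i: q_values[i])
    probs = [epsilon / n] * n     # exploration mass spread uniformly
    probs[best] += 1.0 - epsilon  # remaining mass goes to the greedy action
    return probs


def softmax_probs(q_values, temperature=1.0):
    """Boltzmann distribution over actions; subtracts the max for stability."""
    top = max(q_values)
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical Q-values, one per action 0..8.
q_values = [0.1, 0.5, -0.2, 0.0, 1.3, 0.7, 0.2, -1.0, 0.4]
eg = epsilon_greedy_probs(q_values, epsilon=0.1)
sm = softmax_probs(q_values)
print(eg)
print(sm)
```

Which conversion is appropriate depends on how the downstream application interprets the vector: the epsilon-greedy form describes how the collect policy would actually sample, while the softmax form is only one possible way to turn Q-values into relative preferences.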

python reinforcement-learning tensorflow2.0 tensorflow-agents

BIN*_*HAO

2020-08-27

5
votes

1
answer

374
views