小编204*_*204的帖子

Cartpole-v0 的 PyTorch PPO 实现陷入局部最优

我已经为 Cartpole-VO 环境实现了 PPO。然而，它在游戏的某些迭代中并不收敛。有时它会陷入局部最优。我已经使用 TD-0 优势实现了该算法，即

A(s_t) = R(t+1) + \gamma V(S_{t+1}) - V(S_t)

这是我的代码：

def running_average(x, n):
    N = n
    kernel = np.ones(N)
    conv_len = x.shape[0]-N
    y = np.zeros(conv_len)
    for i in range(conv_len):
        y[i] = kernel @ x[i:i+N] # matrix multiplication operator: np.mul
        y[i] /= N
    return y



class ActorNetwork(nn.Module):
    def __init__(self, state_dim, n_actions, learning_rate=0.0003, epsilon_clipping=0.3, update_epochs=10):
        super().__init__()
        self.n_actions = n_actions
        self.model = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, n_actions),
            nn.Softmax(dim=-1)
        ).float()
        self.optimizer = optim.Adam(self.model.parameters(), lr=learning_rate)
        self.epsilon_clipping …

Run Code Online (Sandbox Code Playgroud)

python machine-learning reinforcement-learning pytorch policy-gradient-descent

204*_*204

2021 12-04

6
推荐指数

1
解决办法

854
查看次数

极柱图，最内圈为空使用 R

我想要一个圆形条形图，如下所示：

但现在，我只有：

为此，我在 R 中使用了以下代码：

require(ggplot2)
ggplot(PolarPlot,aes(x,y,
                     fill=x))+
      geom_bar(width=1,stat="identity")+
      coord_polar() + xlab("")+ylab("")+
  theme(legend.position = "none" , axis.text.y = element_blank() ,
        axis.ticks = element_blank()

Run Code Online (Sandbox Code Playgroud)

有人可以告诉我需要进行哪些修改才能获得所需的图表吗？

数据如下：

PolarPlot <- structure(list(x = structure(1:7, .Label = c("Class1", "Class2", 
    "Class3", "Class4", "Class5", "Class6", "Class7"), class = "factor"), 
    y = c(2L, 8L, 17L, 56L, 28L, 7L, 2L)), .Names = c("x", "y"),
    class = "data.frame", row.names = c(NA, -7L))

Run Code Online (Sandbox Code Playgroud)

plot r bar-chart ggplot2

204*_*204

2016 07-29

5
推荐指数

1
解决办法

2914
查看次数

Python 代码：有关循环/条件的执行跟踪的信息

我想根据完成时执行的循环和条件来获取 python 函数的执行跟踪。但是，我想在不使用附加参数检测原始 python 函数的情况下执行此操作。例如：

def foo(a: int, b: int):
    while a:
        a = do_something()
        if b:
            a = do_something()


if __name__ == "__main__":
    foo(a, b)

Run Code Online (Sandbox Code Playgroud)

在执行之后，foo()我想要一个类似的执行跟踪： [while: true, if:false, while: true, if: true, while: false, ...]它记录了代码中条件评估的序列。有没有办法为任意的python函数自动获取这些信息？

我了解“覆盖率”python 模块返回“分支覆盖率”信息。但我不确定如何在这种情况下使用它？

python unit-testing execution python-3.x

204*_*204

2021 04-25

3
推荐指数

1
解决办法

200
查看次数