Posts by EnT*_*DeS

Deep reinforcement learning - CartPole problem

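I am trying to implement the simplest deep Q-learning algorithm. I believe I have implemented it correctly, and I know that deep Q-learning struggles with divergence, but the reward drops very quickly and the loss diverges as well. I would appreciate it if someone could point me to suitable hyperparameters or tell me whether I have implemented the algorithm incorrectly. I have tried many hyperparameter combinations and also varied the complexity of the QNet.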

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import collections
import numpy as np
import matplotlib.pyplot as plt
import gym
from torch.nn.modules.linear import Linear
from torch.nn.modules.loss import MSELoss


class ReplayBuffer:
    def __init__(self, max_replay_size, batch_size):
        self.max_replay_size = max_replay_size
        self.batch_size = batch_size
        self.buffer = collections.deque()

    def push(self, *transition):
        # Drop the oldest transition once the buffer is full.
        if len(self.buffer) == self.max_replay_size:
            self.buffer.popleft()
        self.buffer.append(transition)

    def sample_batch(self):
        # Draw distinct indices; this needs len(self.buffer) >= self.batch_size.
        indices = np.random.choice(len(self.buffer), self.batch_size, replace=False)
        batch = [self.buffer[index] for index in …

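The posted code is cut off inside `sample_batch`, so the rest of the author's method is unknown. For reference, here is a minimal, self-contained sketch of a fixed-size replay buffer. It is not the author's implementation: the class name `ReplayBufferSketch`, the helper `can_sample`, and the choice to return one tuple per transition field are assumptions made for illustration. It relies only on the standard library (`collections.deque` with `maxlen`, and `random.sample`).

```python
import collections
import random


class ReplayBufferSketch:
    """Illustrative fixed-capacity FIFO buffer (hypothetical, not the original code)."""

    def __init__(self, max_replay_size, batch_size):
        self.batch_size = batch_size
        # A deque with maxlen discards the oldest entry automatically when full.
        self.buffer = collections.deque(maxlen=max_replay_size)

    def push(self, *transition):
        # Store one transition, e.g. (state, action, reward, next_state, done).
        self.buffer.append(transition)

    def can_sample(self):
        # Hypothetical helper: True once a full batch can be drawn.
        return len(self.buffer) >= self.batch_size

    def sample_batch(self):
        # random.sample draws without replacement and raises ValueError
        # if the buffer holds fewer than batch_size transitions.
        batch = random.sample(list(self.buffer), self.batch_size)
        # Regroup the transitions into one tuple per field.
        return tuple(zip(*batch))
```

A training loop would typically call `can_sample()` before each update so that no batch is drawn from a nearly empty buffer.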

python reinforcement-learning q-learning deep-learning pytorch

EnT*_*DeS

2021-05-27

5
votes

1
answer

137
views