小编EnT*_*DeS的帖子

深度强化学习 - CartPole 问题

我试图实现最简单的深度 Q 学习算法。我想,我已经正确地实施了它,并且知道深度 Q 学习在发散方面挣扎,但奖励下降得非常快,损失也在发散。如果有人能帮我指出正确的超参数,或者我是否错误地实施了算法,我将不胜感激。我尝试了很多超参数组合,也改变了 QNet 的复杂性。

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import collections
import numpy as np
import matplotlib.pyplot as plt
import gym
from torch.nn.modules.linear import Linear
from torch.nn.modules.loss import MSELoss


class ReplayBuffer:
  def __init__(self, max_replay_size, batch_size):
    self.max_replay_size = max_replay_size
    self.batch_size      = batch_size
    self.buffer          = collections.deque()


def push(self, *transition):
    if len(self.buffer) == self.max_replay_size:
        self.buffer.popleft()
    self.buffer.append(transition)


def sample_batch(self):
    indices = np.random.choice(len(self.buffer), self.batch_size, replace = False)
    batch   = [self.buffer[index] for index in …
Run Code Online (Sandbox Code Playgroud)

python reinforcement-learning q-learning deep-learning pytorch

5
推荐指数
1
解决办法
137
查看次数