也许我的问题看起来很愚蠢。
我正在研究 Q 学习算法。为了更好地理解它,我尝试将这个 FrozenLake示例的 Tenzorflow 代码改写成Keras代码。
我的代码:
import gym
import numpy as np
import random
from keras.layers import Dense
from keras.models import Sequential
from keras import backend as K
import matplotlib.pyplot as plt
%matplotlib inline
env = gym.make('FrozenLake-v0')
model = Sequential()
model.add(Dense(16, activation='relu', kernel_initializer='uniform', input_shape=(16,)))
model.add(Dense(4, activation='softmax', kernel_initializer='uniform'))
def custom_loss(yTrue, yPred):
return K.sum(K.square(yTrue - yPred))
model.compile(loss=custom_loss, optimizer='sgd')
# Set learning parameters
y = .99
e = 0.1
#create lists to contain total rewards …Run Code Online (Sandbox Code Playgroud)