fas*_*cen 5 python machine-learning pybrain
I am trying to train an ActionValueNetwork on a simple XOR function, but the results look random.
""" Reinforcement Learning to learn xor function
"""
# generic import
import numpy as np
import random
# pybrain import
from pybrain.rl.explorers import EpsilonGreedyExplorer
from pybrain.rl.agents import LearningAgent
from pybrain.rl.learners.valuebased import ActionValueNetwork, NFQ
# The parameters of your algorithm
av_network = ActionValueNetwork(2, 2) # 2 dimensions in input, 2 actions possible (1 or 0)
learner = NFQ()
learner._setExplorer(EpsilonGreedyExplorer(0.0)) # No exploration
agent = LearningAgent(av_network, learner)
# The training
for _ in xrange(25): # we iterate 25 times
    for x in xrange(4): # batch of 4 questions.
        listxor = random.choice([[0, 0],[0, 1], [1, 0], [1, 1]])
        resultxor = listxor[0]^listxor[1] # xor operation
        agent.integrateObservation(listxor)
        action = agent.getAction()
        reward = 1 - 2*abs(resultxor - float(action[0])) # 1 if correct, -1 otherwise
        print "xor(",listxor,") = ", resultxor, " || action = " , action[0], "reward = ", reward
        agent.giveReward(reward)
    agent.learn() # learn once from the collected batch
# Test
print "test : "
print "[0, 0] ", learner.module.getMaxAction([0, 0])
print "[0, 1] ", learner.module.getMaxAction([0, 1])
print "[1, 0] ", learner.module.getMaxAction([1, 0])
print "[1, 1] ", learner.module.getMaxAction([1, 1])
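I know this is not the way PyBrain is meant to be used (task, env, etc.), but I have to do it this way. I got good results with ActionValueTable and Q, but I want to use a weight for each dimension.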
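For comparison, here is a minimal plain-Python sketch of what the tabular approach amounts to on this task. It is not PyBrain's ActionValueTable or Q code. The names `xor_reward`, `greedy_action`, `train` and the learning rate `ALPHA` are assumptions made for illustration. It uses the same reward rule as above and greedy action selection with no exploration, and treats every question as a one-step episode.

```python
# Illustrative stand-in for a tabular learner; NOT PyBrain's implementation.
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]
ALPHA = 0.5  # learning rate (assumed value)


def xor_reward(inputs, action):
    """Same reward rule as in the question: 1 if correct, -1 otherwise."""
    target = inputs[0] ^ inputs[1]
    return 1 - 2 * abs(target - action)


def greedy_action(q_values):
    """Pick the action with the larger value; ties go to action 0."""
    return 0 if q_values[0] >= q_values[1] else 1


def train(rounds=5):
    # One row of two action values per input pair, initialised to zero.
    q_table = {state: [0.0, 0.0] for state in STATES}
    for _ in range(rounds):
        for state in STATES:
            action = greedy_action(q_table[state])
            reward = xor_reward(state, action)
            # One-step episode: there is no next state, so the update
            # target is just the immediate reward.
            q_table[state][action] += ALPHA * (reward - q_table[state][action])
    return q_table


q_table = train()
learned = {state: greedy_action(values) for state, values in q_table.items()}
for state in STATES:
    print(state, "->", learned[state])
```

Because each input pair has its own table row, a wrong guess only lowers the value of that one entry, so the greedy choice flips to the correct action on the next visit. A network shares its weights across all four inputs, so it does not get this isolation for free.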
Can someone explain where I am going wrong? The network does not seem to learn anything.
Thanks!