Occ*_*ade 5 python artificial-intelligence minimax alpha-beta-pruning
我正在使用Python为2048游戏编写AI.它比我预期的要慢很多.我将深度限制设置为5,仍然需要几秒钟才能得到答案.起初我认为我所有函数的实现都是垃圾,但我找出了真正的原因.搜索树上的叶子数量甚至超过了甚至可能的数量.
这是一个典型的结果(我计算了叶子,分支和扩展数量):
111640 leaves, 543296 branches, 120936 expansions
Branching factor: 4.49242574585
Expected max leaves = 4.49242574585^5 = 1829.80385192 leaves
Run Code Online (Sandbox Code Playgroud)
和另一个,好的措施:
99072 leaves, 488876 branches, 107292 expansions
Branching factor: 4.55650001864
Expected max leaves = 4.55650001864^5 = 1964.06963743 leaves
Run Code Online (Sandbox Code Playgroud)
正如您所看到的,搜索树上的叶子数量多于使用天真极小极大时的叶子数量.这里发生了什么?我的算法发布如下:
# Generate constants
import sys
posInfinity = sys.float_info.max
negInfinity = -sys.float_info.max
# Returns the direction of the best move given current state and depth limit
def bestMove(grid, depthLimit):
global limit
limit = depthLimit
moveValues = {}
# Match each move to its minimax value
for move in Utils2048.validMoves(grid):
gridCopy = [row[:] for row in grid]
Utils2048.slide(gridCopy, move)
moveValues[move] = minValue(grid, negInfinity, posInfinity, 1)
# Return move that have maximum value
return max(moveValues, key = moveValues.get)
# Returns the maximum utility when the player moves
def maxValue(grid, a, b, depth):
successors = Utils2048.maxSuccessors(grid)
if len(successors) == 0 or limit < depth:
return Evaluator.evaluate(grid)
value = negInfinity
for successor in successors:
value = max(value, minValue(successor, a, b, depth + 1))
if value >= b:
return value
a = max(a, value)
return value
# Returns the minimum utility when the computer moves
def minValue(grid, a, b, depth):
successors = Utils2048.minSuccessors(grid)
if len(successors) == 0 or limit < depth:
return Evaluator.evaluate(grid)
value = posInfinity
for successor in successors:
value = min(value, maxValue(successor, a, b, depth + 1))
if value <= a:
return value
b = min(b, value)
return value
Run Code Online (Sandbox Code Playgroud)
有人请帮帮我.我多次查看这段代码,但我无法确定错误.
显然,您正在将value
(b
beta)和a
(alpha)进行比较。您的代码中的比较如下:
def maxValue(grid, a, b, depth):
.....
.....
if value >= b:
return value
a = max(a, value)
return value
Run Code Online (Sandbox Code Playgroud)
和
def minValue(grid, a, b, depth):
.....
.....
if value <= a:
return value
b = min(b, value)
return value
Run Code Online (Sandbox Code Playgroud)
然而,α-β剪枝的条件是,只要α增长超过β,即α>β,我们就不需要遍历搜索树。
因此,应该是:
def maxValue(grid, a, b, depth):
....
.....
a = max(a, value)
if a > b:
return value
return value
Run Code Online (Sandbox Code Playgroud)
和
def minValue(grid, a, b, depth):
.....
.....
b = min(b, value)
if b < a:
return value
return value
Run Code Online (Sandbox Code Playgroud)
这是您错过的边缘情况,因为a
(alpha) 和b
(beta) 并不总是等于value
。