尝试使用延续传递样式以避免使用minimax算法的堆栈溢出

Question

尝试使用延续传递样式以避免使用minimax算法的堆栈溢出

lis*_*ose 4 stack-overflow f# functional-programming minimax continuation-passing

我的目标摘要:弄清楚如何使用延续传递样式来避免在使用算法时堆栈溢出我认为不能使尾递归.或者,找到一种使函数尾递归的方法.

细节: 我是F#的新手(以及一般的函数式编程),我试图用alpha-beta修剪实现minimax算法.这是一种用于确定双人游戏的最佳移动的算法.算法的伪代码可以在这里找到:https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning

这是我发现有助于理解算法流程的资源:http://inst.eecs.berkeley.edu/~cs61b/fa14/ta-materials/apps/ab_tree_practice/

我的实现的一个不同之处在于我正在进行的游戏的玩家并不总是交替.出于这个原因,我从函数中删除了一个参数.我的实现如下:

let rec minimax node depth alpha beta =
    if depth = 0 || nodeIsTerminal node then
        heuristicValue node
    else
        match node.PlayerToMakeNextChoice with
        | PlayerOneMakesNextChoice ->
            takeMax (getChildren node) depth alpha beta    
        | PlayerTwoMakesNextChoice ->
            takeMin (getChildren node) depth alpha beta
and takeMax children depth alpha beta =      
    match children with
    | [] -> alpha
    | firstChild :: remainingChildren ->
        let newAlpha = [alpha; minimax firstChild (depth - 1) alpha beta] |> List.max

        if beta < newAlpha then newAlpha
        else takeMax remainingChildren depth newAlpha beta
and takeMin children depth alpha beta =      
    match children with
    | [] -> beta
    | firstChild :: remainingChildren ->
        let newBeta = [beta; minimax firstChild (depth - 1) alpha beta] |> List.min

        if newBeta < alpha then newBeta
        else takeMax remainingChildren depth alpha newBeta

Run Code Online (Sandbox Code Playgroud)

我遇到的问题是,虽然takeMax并且takeMin是尾递归的,但这些方法minimax在分配时调用newAlpha,newBeta因此当我minimax以较大的深度调用时,它仍然可能产生堆栈溢出.我做了一些研究,发现使用延续传递样式是一种使用堆而不是堆栈的潜在方法,当函数不能被尾递归时(我相信经过几个小时的尝试后,这不可能).虽然我可以理解一些非常基本的例子,但我很难理解如何将这个概念应用于这种情况; 如果有人能帮助我完成它,我将非常感激.

编辑1:我对解决方案的最佳理解

let minimax node depth alpha beta =
    let rec recurse node depth alpha beta k =
        if depth = 0 || nodeIsTerminal node then
            k (heuristicValue node)
        else
            match node.PlayerToMakeNextChoice with
            | PlayerOneMakesNextChoice ->
                takeMax (getChildren node) depth alpha beta k
            | PlayerTwoMakesNextChoice ->
                takeMin (getChildren node) depth alpha beta k
    and takeMax children depth alpha beta k =      
        match children with
        | [] -> k alpha
        | firstChild :: remainingChildren ->
            let continuation = fun minimaxResult ->
                let newAlpha = [alpha; minimaxResult] |> List.max

                if beta < newAlpha then k newAlpha
                else takeMax remainingChildren depth newAlpha beta k

            recurse firstChild (depth - 1) alpha beta continuation
    and takeMin children depth alpha beta k =      
        match children with
        | [] -> k beta
        | firstChild :: remainingChildren ->
            let continuation = fun minimaxResult ->
                let newBeta = [beta; minimaxResult] |> List.min

                if newBeta < alpha then k newBeta
                else takeMax remainingChildren depth alpha newBeta k

            recurse firstChild (depth - 1) alpha beta continuation
    recurse node depth alpha beta id

Run Code Online (Sandbox Code Playgroud)

Answer 1

Fyo*_*kin 8

正如您在"基本示例"中无疑看到的那样,一般的想法是采用一个额外的参数("延续",通常表示k),这是一个函数,每次返回一个值时,将该值传递给继续而不是.因此,例如,要修改minimax这种方式,我们会得到:

let rec minimax node depth alpha beta k =
    if depth = 0 || nodeIsTerminal node then
        k (heuristicValue node)
    else
        match node.PlayerToMakeNextChoice with
        | PlayerOneMakesNextChoice ->
            k (takeMax (getChildren node) depth alpha beta)
        | PlayerTwoMakesNextChoice ->
            k (takeMin (getChildren node) depth alpha beta)

Run Code Online (Sandbox Code Playgroud)

然后,呼叫站点需要"内翻",可以这么说,所以不是这样的:

let a = minimax ...
let b = f a
let c = g b
c

Run Code Online (Sandbox Code Playgroud)

我们会写这样的东西:

minimax ... (fun a ->
   let b = f a
   let c = g b
   c
)

Run Code Online (Sandbox Code Playgroud)

看到？a以前是返回值minimax,但现在a是传递给的延续的参数minimax.运行时机制是,一旦minimax完成运行,它将调用此延续,其结果值将显示为参数a.

因此,要将其应用于您的真实代码,我们会得到:

| firstChild :: remainingChildren ->
    minimax firstChild (depth - 1) alpha beta (fun minimaxResult ->
        let newAlpha = [alpha; minimaxResult] |> List.max

        if beta < newAlpha then newAlpha
        else takeMax remainingChildren depth newAlpha beta
    )

Run Code Online (Sandbox Code Playgroud)

好吧,这是一个好主意,但这只是一半的工作:我们已经重写了minimax在CPS,但takeMin和takeMax仍然是递归的.不好.

所以我们先做takeMax.同样的想法:添加一个额外的参数k,每当我们"返回"一个值时,将其传递给k:

and takeMax children depth alpha beta k =      
    match children with
    | [] -> k alpha
    | firstChild :: remainingChildren ->
        minimax firstChild (depth - 1) alpha beta (fun minimaxResult ->
            let newAlpha = [alpha; minimaxResult] |> List.max

            if beta < newAlpha then k newAlpha
            else takeMax remainingChildren depth newAlpha beta k
        )

Run Code Online (Sandbox Code Playgroud)

现在,当然,我必须相应地修改呼叫站点:

let minimax ... k =
    ...
    match node.PlayerToMakeNextChoice with
    | PlayerOneMakesNextChoice ->
        takeMax (getChildren node) depth alpha beta k

Run Code Online (Sandbox Code Playgroud)

等等,刚刚发生了什么？我只是说,每次我返回一个值,我都应该将它传递给它k,但在这里我不是这样做的.相反,我正在传递给k自己takeMax.咦？

那么,"而不是返回传递给k" 的规则只是该方法的第一部分.第二部分是 - "每次递归调用,传递k链".这样,原始顶级k将沿整个递归调用链向下移动,并最终由任何函数决定停止递归调用.

请记住,尽管CPS有助于堆栈溢出,但它并不能解除内存限制.所有这些中间值不再出现在堆栈中,但它们必须到达某个地方.在这种情况下,每次我们构造lambda时fun minimaxResult -> ...,这都是堆分配.所以你的所有中间值都在堆上.

但是有一个很好的对称性:如果算法是真正的尾递归,你可以在调用链中传递原始的顶级延续,而不需要分配任何中间lambda,因此你不需要任何堆内存.

归档时间：	7 年，11 月前
查看次数：	149 次
最近记录：	7 年，10 月前