懒散的,广度优先的一元玫瑰树是否可能展开?

dfe*_*uer 11 algorithm tree haskell unfold monadfix

Data.Tree包括unfoldTreeM_BFunfoldForestM_BF使用monadic动作的结果构造树广度优先的功能.可以使用forest unfolder轻松编写树展开文件,因此我将重点关注后者:

unfoldForestM_BF :: Monad m =>
             (b -> m (a, [b])) -> [b] -> m [Tree a]
Run Code Online (Sandbox Code Playgroud)

从种子列表开始,它为每个种子应用一个函数,生成将产生树根的动作和用于下一级展开的种子.所使用的算法是稍微严格,因此,使用unfoldForestM_BF与该Identity单子是不完全一样使用纯unfoldForest.我一直试图弄清楚是否有办法让它变得懒惰而不牺牲它的O(n)时间限制.如果(正如Edward Kmett向我建议的那样)这是不可能的,我想知道是否有可能采用更具约束力的类型,特别是需要MonadFix而不是Monad.这个概念将(以某种方式)设置指向未来计算结果的指针,同时将这些计算添加到待办事项列表中,因此如果它们在早期计算的效果中是惰性的,则它们将立即可用.

Cir*_*dec 15

我之前声称下面提出的第三个解决方案与深度优先具有相同的严格性unfoldForest,这是不正确的.

Your intuition that trees can be lazily unfolded breadth first is at least partially correct, even if we don't require a MonadFix instance. Solutions exist for the special cases when the branching factor is known to be finite and when the branching factor is known to be "large". We will start with a solution that runs in O(n) time for trees with finite branching factors including degenerate trees with only one child per node. The solution for finite branching factors will fail to terminate on trees with infinite branching factors, which we will rectify with a solution that that runs in O(n) time for trees with "large" branching factors greater than one including trees with infinite branching factor. The solution for "large" branching factors will run in O(n^2) time on degenerate trees with only one child or no children per node. When we combine the methods from both steps in an attempt to make a hybrid solution that runs in O(n) time for any branching factor we will get a solution that is lazier than the first solution for finite branching factors but cannot accommodate trees that make a rapid transition from an infinite branching factor to having no branches.

Finite Branching Factor

The general idea is that we will first build all the labels for an entire level and the seeds for the forests for the next level. We will then descend into the next level, building all of it. We will collect together the results from the deeper level to build the forests for the outer level. We will put the labels together with the forests to build the trees.

unfoldForestM_BF is fairly simple. If there are no seeds for the level it returns. After building all the labels, it takes the seeds for each forest and collects them together into one list of all the seeds to build the next level and unfolds the entire deeper level. Finally it constructs the forest for each tree from the structure of the seeds.

import Data.Tree hiding (unfoldTreeM_BF, unfoldForestM_BF)

unfoldForestM_BF :: Monad m => (b->m (a, [b])) -> [b] -> m [Tree a]
unfoldForestM_BF f []    = return []
unfoldForestM_BF f seeds = do
    level <- sequence . fmap f $ seeds
    let (labels, bs) = unzip level
    deeper <- unfoldForestM_BF f (concat bs)
    let forests = trace bs deeper
    return $ zipWith Node labels forests
Run Code Online (Sandbox Code Playgroud)

trace reconstructs the structure of nested lists from a flattened list.It is assumed that there is an item in [b] for each of the items anywhere in [[a]]. The use of concat ... trace to flatten all the information about ancestor levels prevents this implementation from working on trees with infinite children for a node.

trace :: [[a]] -> [b] -> [[b]]
trace []       ys = []
trace (xs:xxs) ys =
    let (ys', rem) = takeRemainder xs ys
    in   ys':trace xxs rem
    where
        takeRemainder []        ys  = ([], ys)
        takeRemainder (x:xs) (y:ys) = 
            let (  ys', rem) = takeRemainder xs ys
            in  (y:ys', rem)
Run Code Online (Sandbox Code Playgroud)

Unfolding a tree is trivial to write in terms of unfolding a forest.

unfoldTreeM_BF :: MonadFix m => (b->m (a, [b])) -> b -> m (Tree a)
unfoldTreeM_BF f = (>>= return . head) . unfoldForestMFix_BF f . (:[])
Run Code Online (Sandbox Code Playgroud)

Large Branching Factor

The solution for large branching factor proceeds in much the same way as the solution for finite branching factor, except it keeps the entire structure of the tree around instead of concatenating the branches in a level to a single list and traceing that list. In addition to the imports used in the previous section, we will be using Compose to compose the functors for multiple levels of a tree together and Traversable to sequence across multi-level structures.

import Data.Tree hiding (unfoldForestM_BF, unfoldTreeM_BF)

import Data.Foldable
import Data.Traversable
import Data.Functor.Compose
import Prelude hiding (sequence, foldr)
Run Code Online (Sandbox Code Playgroud)

Instead of flattening all of the ancestor structures together with concat we will wrap with Compose the ancestors and the seeds for the next level and recurse on the entire structure.

unfoldForestM_BF :: (Traversable t, Traceable t, Monad m) =>
                    (b->m (a, [b])) -> t b -> m (t (Tree a))
unfoldForestM_BF f seeds
    | isEmpty seeds = return (fmap (const undefined) seeds)
    | otherwise     = do
        level <- sequence . fmap f $ seeds
        deeper <- unfoldForestM_BF f (Compose (fmap snd level))
        return $ zipWithIrrefutable Node (fmap fst level) (getCompose deeper)
Run Code Online (Sandbox Code Playgroud)

zipWithIrrefutable是一个更懒的版本,zipWith它依赖于第一个列表中每个项目的第二个列表中有一个项目的假设.Traceable结构是Functors可以提供的结构zipWithIrrefutable.对于法律Traceable是每个a,xs以及ys如果fmap (const a) xs == fmap (const a) yszipWithIrrefutable (\x _ -> x) xs ys == xszipWithIrrefutable (\_ y -> y) xs ys == ys.它的严格性,给出了每一个fxs通过zipWithIrrefutable f xs ? == fmap (\x -> f x ?) xs.

class Functor f => Traceable f where
    zipWithIrrefutable :: (a -> b -> c) -> f a -> f b -> f c 
Run Code Online (Sandbox Code Playgroud)

如果我们已经知道它们具有相同的结构,我们可以懒惰地组合两个列表.

instance Traceable [] where
    zipWithIrrefutable f []       ys    = []
    zipWithIrrefutable f (x:xs) ~(y:ys) = f x y : zipWithIrrefutable f xs ys 
Run Code Online (Sandbox Code Playgroud)

如果我们知道我们可以组合每个仿函数,我们可以组合两个仿函数的组合.

instance (Traceable f, Traceable g) => Traceable (Compose f g) where
    zipWithIrrefutable f (Compose xs) (Compose ys) =
        Compose (zipWithIrrefutable (zipWithIrrefutable f) xs ys)
Run Code Online (Sandbox Code Playgroud)

isEmpty检查节点的空结构是否像[]有限分支因子的解决方案中的模式匹配那样进行扩展.

isEmpty :: Foldable f => f a -> Bool
isEmpty = foldr (\_ _ -> False) True
Run Code Online (Sandbox Code Playgroud)

细心的读者可能会注意到,zipWithIrrefutableTraceable非常相似,liftA2这是定义的一半Applicative.

混合解决方案

混合解决方案结合了有限解决方案和"大"解决方案的方法.与有限解一样,我们将在每一步压缩和解压缩树表示.与"大"分支因子的解决方案一样,我们将使用允许跨越完整分支的数据结构.有限分支因子解决方案使用的是一种在任何地方都被展平的数据类型[b]."大"分支因子解决方案使用的数据类型无处平坦:越来越多的嵌套列表[b]从那时[[b]]开始[[[b]]],依此类推.在这些结构之间将是嵌套列表,这些列表要么停止嵌套,要么只b保持嵌套并保持嵌套[b].Freemonad 一般描述了这种递归模式.

data Free f a = Pure a | Free (f (Free f a))
Run Code Online (Sandbox Code Playgroud)

We will be working specifically with Free [] which looks like.

data Free [] a = Pure a | Free [Free [] a]
Run Code Online (Sandbox Code Playgroud)

For the hybrid solution we will repeat all of its imports and components so that the code below here should be complete working code.

import Data.Tree hiding (unfoldTreeM_BF, unfoldForestM_BF)

import Data.Traversable
import Prelude hiding (sequence, foldr)
Run Code Online (Sandbox Code Playgroud)

Since we will be working with Free [], we will provide it with a zipWithIrrefutable.

class Functor f => Traceable f where
    zipWithIrrefutable :: (a -> b -> c) -> f a -> f b -> f c  

instance Traceable [] where
    zipWithIrrefutable f []       ys    = []
    zipWithIrrefutable f (x:xs) ~(y:ys) = f x y : zipWithIrrefutable f xs ys 

instance (Traceable f) => Traceable (Free f) where
    zipWithIrrefutable f (Pure x)  ~(Pure y ) = Pure (f x y)
    zipWithIrrefutable f (Free xs) ~(Free ys) =
        Free (zipWithIrrefutable (zipWithIrrefutable f) xs ys)
Run Code Online (Sandbox Code Playgroud)

The breadth first traversal will look very similar to the original version for the finitely branching tree. We build the current labels and seeds for the current level, compress the structure of the remainder of the tree, do all the work for the remaining depths, and decompress the structure of the results to get the forests to go with the labels.

unfoldFreeM_BF :: (Monad m) => (b->m (a, [b])) -> Free [] b -> m (Free [] (Tree a))
unfoldFreeM_BF f (Free []) = return (Free [])
unfoldFreeM_BF f seeds = do
    level <- sequence . fmap f $ seeds
    let (compressed, decompress) = compress (fmap snd level)
    deeper <- unfoldFreeM_BF f compressed
    let forests = decompress deeper
    return $ zipWithIrrefutable Node (fmap fst level) forests
Run Code Online (Sandbox Code Playgroud)

compress takes a Free [] holding the seeds for forests [b] and flattens the [b] into the Free to get a Free [] b. It also returns a decompress function that can be used to undo the flattening to get the original structure back. We compress away branches with no remaining seeds and branches that only branch one way.

compress :: Free [] [b] -> (Free [] b, Free [] a -> Free [] [a])
compress (Pure [x]) = (Pure x, \(Pure x) -> Pure [x])
compress (Pure xs ) = (Free (map Pure xs), \(Free ps) -> Pure (map getPure ps))
compress (Free xs)  = wrapList . compressList . map compress $ xs
    where    
        compressList []                 = ([], const [])
        compressList ((Free [],dx):xs) = let (xs', dxs) = compressList xs
                                         in  (xs', \xs -> dx (Free []):dxs xs)
        compressList (      (x,dx):xs) = let (xs', dxs) = compressList xs
                                         in  (x:xs', \(x:xs) -> dx x:dxs xs)
        wrapList ([x], dxs) = (x,             \x   -> Free (dxs [x]))
        wrapList (xs , dxs) = (Free xs, \(Free xs) -> Free (dxs xs ))
Run Code Online (Sandbox Code Playgroud)

Each compression step also returns a function that will undo it when applied to a Free [] tree with the same structure. All of these functions are partially defined; what they do to Free [] trees with a different structure is undefined. For simplicity we also define partial functions for the inverses of Pure and Free.

getPure (Pure x)  = x
getFree (Free xs) = xs
Run Code Online (Sandbox Code Playgroud)

Both unfoldForestM_BF and unfoldTreeM_BF are defined by packaging their argument up into a Free [] b and unpackaging the results assuming they are in the same structure.

unfoldTreeM_BF :: MonadFix m => (b->m (a, [b])) -> b -> m (Tree a)
unfoldTreeM_BF f = (>>= return . getPure) . unfoldFreeM_BF f . Pure


unfoldForestM_BF :: MonadFix m => (b->m (a, [b])) -> [b] -> m [Tree a]
unfoldForestM_BF f = (>>= return . map getPure . getFree) . unfoldFreeM_BF f . Free . map Pure
Run Code Online (Sandbox Code Playgroud)

A more elegant version of this algorithm can probably be made by recognizing that >>= for a Monad is grafting on trees and both Free and FreeT provide monad instances. Both compress and compressList probably have more elegant presentations.

The algorithm presented above is not lazy enough to allow querying trees that branch an infinite number of ways and then terminate. A simple counter example is the following generating function unfolded from 0.

counterExample :: Int -> (Int, [Int])
counterExample 0 = (0, [1, 2])
counterExample 1 = (1, repeat 3)
counterExample 2 = (2, [3])
counterExample 3 = (3, [])
Run Code Online (Sandbox Code Playgroud)

This tree would look like

0
|
+- 1
|  |
|  +- 3
|  |
|  `- 3
|  |
|  ...
|
`- 2
   |
   +- 3
Run Code Online (Sandbox Code Playgroud)

Attempting to descend the second branch (to 2) and inspect the remaining finite sub-tree will fail to terminate.

Examples

The following examples demonstrate that all implementations of unfoldForestM_BF run actions in breadth first order and that runIdentity . unfoldTreeM_BF (Identity . f) has the same strictness as unfoldTree for trees with finite branching factor. For trees with inifinite branching factor, only the solution for "large" branching factors has the same strictness as unfoldTree. To demonstrate laziness we'll define three infinite trees - a unary tree with one branch, a binary tree with two branches, and an infinitary tree with an infinite number of branches for each node.

mkUnary :: Int -> (Int, [Int])
mkUnary x = (x, [x+1])

mkBinary :: Int -> (Int, [Int])
mkBinary x = (x, [x+1,x+2])

mkInfinitary :: Int -> (Int, [Int])
mkInfinitary x = (x, [x+1..])
Run Code Online (Sandbox Code Playgroud)

Together with unfoldTree, we will define unfoldTreeDF in terms of unfoldTreeM to check that unfoldTreeM really is lazy like you claimed and unfoldTreeBF in terms of unfoldTreeMFix_BF to check that the new implementation is just as lazy.

import Data.Functor.Identity

unfoldTreeDF f = runIdentity . unfoldTreeM    (Identity . f)
unfoldTreeBF f = runIdentity . unfoldTreeM_BF (Identity . f)
Run Code Online (Sandbox Code Playgroud)

To get finite pieces of these infinite trees, even the infinitely branching one, we'll define a way to take from a tree as long as its labels match a predicate. This could be written more succinctly in terms of the ability to apply a function to every subForest.

takeWhileTree :: (a -> Bool) -> Tree a -> Tree a
takeWhileTree p (Node label branches) = Node label (takeWhileForest p branches)

takeWhileForest :: (a -> Bool) -> [Tree a] -> [Tree a]
takeWhileForest p = map (takeWhileTree p) . takeWhile (p . rootLabel)
Run Code Online (Sandbox Code Playgroud)

This lets us define nine example trees.

unary   = takeWhileTree (<= 3) (unfoldTree   mkUnary 0)
unaryDF = takeWhileTree (<= 3) (unfoldTreeDF mkUnary 0)
unaryBF = takeWhileTree (<= 3) (unfoldTreeBF mkUnary 0)

binary   = takeWhileTree (<= 3) (unfoldTree   mkBinary 0)
binaryDF = takeWhileTree (<= 3) (unfoldTreeDF mkBinary 0)
binaryBF = takeWhileTree (<= 3) (unfoldTreeBF mkBinary 0)

infinitary   = takeWhileTree (<= 3) (unfoldTree   mkInfinitary 0)
infinitaryDF = takeWhileTree (<= 3) (unfoldTreeDF mkInfinitary 0)
infinitaryBF = takeWhileTree (<= 3) (unfoldTreeBF mkInfinitary 0)
Run Code Online (Sandbox Code Playgroud)

All five methods have the same output for the unary and binary trees. The output comes from putStrLn . drawTree . fmap show

0
|
`- 1
   |
   `- 2
      |
      `- 3

0
|
+- 1
|  |
|  +- 2
|  |  |
|  |  `- 3
|  |
|  `- 3
|
`- 2
   |
   `- 3
Run Code Online (Sandbox Code Playgroud)

However, the breadth first traversal from the finite branching factor solution is not sufficiently lazy for a tree with an infinite branching factor. The other four methods output the entire tree

0
|
+- 1
|  |
|  +- 2
|  |  |
|  |  `- 3
|  |
|  `- 3
|
+- 2
|  |
|  `- 3
|
`- 3
Run Code Online (Sandbox Code Playgroud)

The tree generated with unfoldTreeBF for the finite branching factor solution can never be completely drawn past its first branches.

0
|
+- 1
|  |
|  +- 2
|  |  |
|  |  `- 3
|  |
|  `- 3
Run Code Online (Sandbox Code Playgroud)

The construction is definitely breadth first.

mkDepths :: Int -> IO (Int, [Int])
mkDepths d = do
    print d
    return (d, [d+1, d+1])

mkFiltered :: (Monad m) => (b -> Bool) -> (b -> m (a, [b])) -> (b -> m (a, [b]))
mkFiltered p f x = do
    (a, bs) <- f x
    return (a, filter p bs)

binaryDepths = unfoldTreeM_BF (mkFiltered (<= 2) mkDepths) 0
Run Code Online (Sandbox Code Playgroud)

Running binaryDepths outputs the outer levels before the inner ones

0
1
1
2
2
2
2
Run Code Online (Sandbox Code Playgroud)

From Lazy to Downright Slothful

The hybrid solution from the earlier section is not lazy enough to have the same strictness semantics as Data.Tree's unfoldTree. It is the first in a series of algorithms, each slightly lazier than their predecessor, but none lazy enough to have the same strictness semantics as unfoldTree.

The hybrid solution does not provide a guarantee that exploring a portion of a tree doesn't demand exploring other portions of the same tree. Nor will the code presented below. In one particular yet common case identified by dfeuer exploring only a log(N) sized slice of a finite tree forces the entirety of the tree. This happens when exploring the last descendant of each branch of a tree with constant depth. When compressing the tree we throw out every trivial branch with no descendants, which is necessary to avoid O(n^2) running time. We can only lazily skip over this portion of compression if we can quickly show that a branch has at least one descendant and we can therefore reject the pattern Free []. At the greatest depth of the tree with constant depth, none of the branches have any remaining descendants, so we can never skip a step of the compression. This results in exploring the entire tree to be able to visit the very last node. When the entire tree to that depth is non-finite due to infinite branching factor, exploring a portion of the tree fails to terminate when it would terminate when generated by unfoldTree.

The compression step in the hybrid solution section compresses away branches with no descendants in the first generation that they can be discovered in, which is optimal for compression but not optimal for laziness. We can make the algorithm lazier by delaying when this compression occurs. If we delay it by a single generation (or even any constant number of generations) we will maintain the O(n) upper bound on time. If we delay it by a number of generations that somehow depends on N we would necessarily sacrifice the O(N) time bound. In this section we will delay the compression by a single generation.

To control how compression happens, we will separate stuffing the innermost [] into the Free [] structure from compressing away degenerate branches with 0 or 1 descendants.

Because part of this trick doesn't work without a lot of laziness in the compression, we will adopt a paranoid level of excessively slothful laziness everywhere. If anything about a result other than the tuple constructor (,) could be determined without forcing part of its input with a pattern match we will avoid forcing it until it is necessary. For tuples, anything pattern matching on them will do so lazily. Consequently, some of the code below will look like core or worse.

bindFreeInvertible replaces Pure [b,...] with Free [Pure b,...]

bindFreeInvertible :: Free [] ([] b) -> (Free [] b, Free [] a -> Free [] ([] a))
bindFreeInvertible = wrapFree . go
    where
        -- wrapFree adds the {- Free -} that would have been added in both branches
        wrapFree ~(xs, dxs) = (Free xs, dxs)
        go (Pure xs) = ({- Free -} (map Pure xs), Pure . map getPure . getFree)
        go (Free xs) = wrapList . rebuildList . map bindFreeInvertible $ xs
        rebuildList = foldr k ([], const [])
        k ~(x,dx) ~(xs, dxs) = (x:xs, \(~(x:xs)) -> dx x:dxs xs)
        wrapList ~(xs, dxs) = ({- Free -} xs, \(~(Free xs)) -> Free (dxs xs)))
Run Code Online (Sandbox Code Playgroud)

compressFreeList removes occurrences of Free [] and replaces Free [xs] with xs.

compressFreeList :: Free [] b -> (Free [] b, Free [] a -> Free [] a)
compressFreeList (Pure x) = (Pure x, id)
compressFreeList (Free xs) = wrapList . compressList . map compressFreeList $ xs
    where
        compressList = foldr k ([], const [])
        k ~(x,dx) ~(xs', dxs) = (x', dxs')
            where
                x' = case x of
                        Free []   -> xs'
                        otherwise -> x:xs'
                dxs' cxs = dx x'':dxs xs''
                    where
                        x'' = case x of
                            Free []   -> Free []
                            otherwise -> head cxs
                        xs'' = case x of
                            Free []   -> cxs
                            otherwise -> tail cxs
        wrapList ~(xs, dxs) = (xs', dxs')
            where
                xs' = case xs of
                        [x]       -> x
                        otherwise -> Free xs
                dxs' cxs = Free (dxs xs'')
                    where
                        xs'' = case xs of
                            [x]       -> [cxs]
                            otherwise -> getFree cxs
Run Code Online (Sandbox Code Playgroud)

The overall compression will not bind the Pure []s into Frees until after the degenerate Frees have been compressed away, delaying compression of degenerate Frees introduced in one generation to the next generation's compression.

compress :: Free [] [b] -> (Free [] b, Free [] a -> Free [] [a])
compress xs = let ~(xs' , dxs' ) = compressFreeList xs
                  ~(xs'', dxs'') = bindFreeInvertible xs'
                  in (xs'', dxs' . dxs'')
Run Code Online (Sandbox Code Playgroud)

Out of continued paranoia, the helpers getFree and getPure are also made irrefutably lazy.

getFree ~(Free xs) = xs
getPure ~(Pure x)  = x
Run Code Online (Sandbox Code Playgroud)

This very quickly solves the problematic example dfeuer discovered

print . until (null . subForest) (last . subForest) $
    flip unfoldTreeBF 0 (\x -> (x, if x > 5 then [] else replicate 10 (x+1)))
Run Code Online (Sandbox Code Playgroud)

But since we only delayed the compression by 1 generation, we can recreate exactly the same problem if the very last node of the very last branch is 1 level deeper than all of the other branches.

print . until (null . subForest) (last . subForest) $
    flip unfoldTreeBF (0,0) (\(x,y) -> ((x,y), 
        if x==y
        then if x>5 then [] else replicate 9 (x+1, y) ++ [(x+1, y+1)]
        else if x>4 then [] else replicate 10 (x+1, y)))
Run Code Online (Sandbox Code Playgroud)

  • 我只是说,"做得非常好!" (2认同)