目录树的广度优先遍历不是懒惰的

Question

目录树的广度优先遍历不是懒惰的

我尝试遍历目录树.天真的深度优先遍历似乎不会以懒惰的方式生成数据并且耗尽内存.我接下来尝试了广度优先的方法,它显示了同样的问题 - 它使用了所有可用的内存然后崩溃.

我的代码是:

getFilePathBreadtFirst :: FilePath -> IO [FilePath]
getFilePathBreadtFirst fp = do
  fileinfo <- getInfo fp 
  res :: [FilePath]  <- if isReadableDirectory fileinfo
          then do
                children  <- getChildren fp 
                lower    <-  mapM getFilePathBreadtFirst children  
                return (children ++  concat lower)
           else return [fp]        -- should only return the files? 
  return res 

getChildren :: FilePath -> IO [FilePath]
getChildren path = do 
          names <- getUsefulContents path
          let namesfull = map (path </>) names
          return namesfull

testBF fn = do  -- crashes for /home/frank, does not go to swap 
  fps <- getFilePathBreadtFirst fn
  putStrLn $ unlines fps

Run Code Online (Sandbox Code Playgroud)

我认为所有的代码都是线性的或尾递归的,我希望文件名的列表立即开始,但事实上并非如此.我的代码和思考中的错误在哪里？我在哪里丢失了懒惰的评价？

Answer 1

Gab*_*lez 7

我将使用三个单独的技巧来解决您的问题.

技巧1:使用pipes库在遍历树的同时流式传输文件名.
技巧2:使用StateT (Seq FilePath)变压器实现广度优先遍历.
技巧3:使用MaybeT变压器避免在写入循环和退出时手动递归.

以下代码将这三个技巧组合在一个monad变换器堆栈中.

import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Maybe
import Control.Monad.State.Lazy
import Control.Pipe
import Data.Sequence
import System.FilePath.Posix
import System.Directory

loop :: (Monad m) => MaybeT m a -> m ()
loop = liftM (maybe () id) . runMaybeT . forever

quit :: (Monad m) => MaybeT m a
quit = mzero

getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
  = fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path

permissible :: FilePath -> IO Bool
permissible file
  = fmap (\p -> readable p && searchable p) $ getPermissions file

traverseTree :: FilePath -> Producer FilePath IO ()
traverseTree path = (`evalStateT` empty) $ loop $ do
    -- All code past this point uses the following monad transformer stack:
    -- MaybeT (StateT (Seq FilePath) (Producer FilePath IO)) ()
    let liftState = lift
        liftPipe  = lift . lift
        liftIO    = lift . lift . lift
    liftState $ modify (|> path)
    forever $ do
        x <- liftState $ gets viewl
        case x of
            EmptyL    -> quit
            file :< s -> do
                liftState $ put s
                liftPipe $ yield file
                p <- liftIO $ doesDirectoryExist file
                when p $ do
                    names <- liftIO $ getUsefulContents file
                    -- allowedNames <- filterM permissible names
                    let namesfull = map (path </>) names
                    liftState $ forM_ namesfull $ \name -> modify (|> name)

Run Code Online (Sandbox Code Playgroud)

这将创建一个广度优先文件名的生成器,可以与树遍历同时使用.您使用以下方法使用值:

printer :: (Show a) => Consumer a IO r
printer = forever $ do
    a <- await
    lift $ print a

>>> runPipe $ printer <+< traverseTree path
<Prints file names as it traverses the tree>

Run Code Online (Sandbox Code Playgroud)

您甚至可以选择不要求所有值:

-- Demand only 'n' elements
take' :: (Monad m) => Int -> Pipe a a m ()
take' n = replicateM_ n $ do
    a <- await
    yield a

>> runPipe $ printer <+< take' 3 <+< traverseTree path
<Prints only three files>

Run Code Online (Sandbox Code Playgroud)

更重要的是,最后一个示例只会根据需要遍历树以生成三个文件然后它将停止.当你想要的只有3个结果时,这可以防止浪费地遍历整个树!

要了解更多关于pipes图书馆招,咨询管道教程的Control.Pipes.Tutorial.

要了解有关循环技巧的更多信息,请阅读此博客文章.

对于广度优先遍历的队列技巧,我找不到一个很好的链接,但我知道它在某处.如果其他人知道一个很好的链接,只需编辑我的答案添加它.

归档时间：	13 年，1 月前
查看次数：	624 次
最近记录：	11 年，10 月前