我尝试遍历目录树.天真的深度优先遍历似乎不会以懒惰的方式生成数据并且耗尽内存.我接下来尝试了广度优先的方法,它显示了同样的问题 - 它使用了所有可用的内存然后崩溃.
我的代码是:
getFilePathBreadtFirst :: FilePath -> IO [FilePath]
getFilePathBreadtFirst fp = do
fileinfo <- getInfo fp
res :: [FilePath] <- if isReadableDirectory fileinfo
then do
children <- getChildren fp
lower <- mapM getFilePathBreadtFirst children
return (children ++ concat lower)
else return [fp] -- should only return the files?
return res
getChildren :: FilePath -> IO [FilePath]
getChildren path = do
names <- getUsefulContents path
let namesfull = map (path </>) names
return namesfull
testBF fn = do -- crashes for /home/frank, does not go to swap
fps <- getFilePathBreadtFirst fn
putStrLn $ unlines fps
Run Code Online (Sandbox Code Playgroud)
我认为所有的代码都是线性的或尾递归的,我希望文件名的列表立即开始,但事实上并非如此.我的代码和思考中的错误在哪里?我在哪里丢失了懒惰的评价?
我将使用三个单独的技巧来解决您的问题.
pipes库在遍历树的同时流式传输文件名.StateT (Seq FilePath)变压器实现广度优先遍历.MaybeT变压器避免在写入循环和退出时手动递归.以下代码将这三个技巧组合在一个monad变换器堆栈中.
import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Maybe
import Control.Monad.State.Lazy
import Control.Pipe
import Data.Sequence
import System.FilePath.Posix
import System.Directory
loop :: (Monad m) => MaybeT m a -> m ()
loop = liftM (maybe () id) . runMaybeT . forever
quit :: (Monad m) => MaybeT m a
quit = mzero
getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
= fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path
permissible :: FilePath -> IO Bool
permissible file
= fmap (\p -> readable p && searchable p) $ getPermissions file
traverseTree :: FilePath -> Producer FilePath IO ()
traverseTree path = (`evalStateT` empty) $ loop $ do
-- All code past this point uses the following monad transformer stack:
-- MaybeT (StateT (Seq FilePath) (Producer FilePath IO)) ()
let liftState = lift
liftPipe = lift . lift
liftIO = lift . lift . lift
liftState $ modify (|> path)
forever $ do
x <- liftState $ gets viewl
case x of
EmptyL -> quit
file :< s -> do
liftState $ put s
liftPipe $ yield file
p <- liftIO $ doesDirectoryExist file
when p $ do
names <- liftIO $ getUsefulContents file
-- allowedNames <- filterM permissible names
let namesfull = map (path </>) names
liftState $ forM_ namesfull $ \name -> modify (|> name)
Run Code Online (Sandbox Code Playgroud)
这将创建一个广度优先文件名的生成器,可以与树遍历同时使用.您使用以下方法使用值:
printer :: (Show a) => Consumer a IO r
printer = forever $ do
a <- await
lift $ print a
>>> runPipe $ printer <+< traverseTree path
<Prints file names as it traverses the tree>
Run Code Online (Sandbox Code Playgroud)
您甚至可以选择不要求所有值:
-- Demand only 'n' elements
take' :: (Monad m) => Int -> Pipe a a m ()
take' n = replicateM_ n $ do
a <- await
yield a
>> runPipe $ printer <+< take' 3 <+< traverseTree path
<Prints only three files>
Run Code Online (Sandbox Code Playgroud)
更重要的是,最后一个示例只会根据需要遍历树以生成三个文件然后它将停止.当你想要的只有3个结果时,这可以防止浪费地遍历整个树!
要了解更多关于pipes图书馆招,咨询管道教程的Control.Pipes.Tutorial.
要了解有关循环技巧的更多信息,请阅读此博客文章.
对于广度优先遍历的队列技巧,我找不到一个很好的链接,但我知道它在某处.如果其他人知道一个很好的链接,只需编辑我的答案添加它.