Asked by Sib*_*ibi (score 5). Tags: parsing, haskell, attoparsec, conduit, haskell-pipes
I have written the following parsing code using attoparsec:
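import Data.Attoparsec.Text

data Test = Test {
  a :: Int,
  b :: Int
} deriving (Show)

-- attoparsec has no built-in tab parser, so one is defined here
tab :: Parser Char
tab = char '\t'

testParser :: Parser Test
testParser = do
  a <- decimal
  tab
  b <- decimal
  return $ Test a b

tParser :: Parser [Test]
tParser = many' $ testParser <* endOfLine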
This works fine for small files. I run it like this:
main :: IO ()
main = do
  text <- TL.readFile "./testFile"
  let (Right a) = parseOnly (manyTill anyChar endOfLine *> tParser) text
  print a
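For reference, here is a minimal self-contained sketch of the same parser run with parseOnly on a small in-memory input. It assumes the attoparsec and text packages; the tab helper is an assumption (Data.Attoparsec.Text does not appear to export one), parseAll is a hypothetical name, and the applicative form of testParser is equivalent to the do block above. The comment on parseAll notes one plausible reason memory grows with the file size.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, char, decimal, endOfLine, many', parseOnly)
import Data.Text (Text)

data Test = Test { a :: Int, b :: Int } deriving (Show, Eq)

-- Hypothetical helper: match a single tab character.
tab :: Parser Char
tab = char '\t'

-- Two integers separated by a tab (the newline is handled by tParser).
testParser :: Parser Test
testParser = Test <$> decimal <* tab <*> decimal

tParser :: Parser [Test]
tParser = many' (testParser <* endOfLine)

-- parseOnly needs the entire input up front and only returns once many'
-- has parsed every line, so the whole [Test] list is built in memory.
parseAll :: Text -> Either String [Test]
parseAll = parseOnly tParser
```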
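But once the file grows beyond 70 MB, it consumes a huge amount of memory. As a solution, I thought I would use attoparsec-conduit. After going through its API, I'm not sure how to make the two work together: my parser has type Parser Test, but sinkParser actually accepts a parser of type Parser a b. How can I run this parser in constant memory? (A pipes-based solution is also acceptable, but I'm not used to the Pipes API.)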
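The first type parameter of Parser is the input type (that is, Text or ByteString). The Parser Test from Data.Attoparsec.Text is just Parser Text Test, so you can pass your testParser function to sinkParser and it will typecheck. Keep in mind, though, that sinkParser runs the parser once and returns its single result: if you give it tParser, many' still accumulates the entire list before returning, so memory again grows with the file. To stream, use conduitParser, which runs the parser repeatedly and yields each result as soon as it is parsed. Here is a short example: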
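{-# LANGUAGE OverloadedStrings #-}
import Conduit (decodeUtf8C, liftIO, mapM_C, runConduitRes, sourceFile, (.|))
import Data.Attoparsec.Text (Parser, decimal, endOfLine, space)
import Data.Conduit.Attoparsec (conduitParser)

data Test = Test {
  a :: Int,
  b :: Int
} deriving (Show)

-- Parses one line: two integers separated by whitespace (space also
-- matches a tab), followed by the line terminator.
testParser :: Parser Test
testParser = do
  a <- decimal
  space
  b <- decimal
  endOfLine
  return $ Test a b

-- sourceFile streams ByteString chunks, so they are decoded to Text before
-- reaching the Text parser. conduitParser runs testParser repeatedly and
-- yields one (PositionRange, Test) pair per record, so memory use stays
-- roughly constant regardless of file size.
main :: IO ()
main = runConduitRes
     $ sourceFile "foo.txt"
    .| decodeUtf8C
    .| conduitParser testParser
    .| mapM_C (liftIO . print)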
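The streaming behaviour can be checked without a file. The sketch below assumes conduit 1.3 or later plus attoparsec-conduit; the names skipHeader, chunks and parseChunks are hypothetical. It feeds hand-made Text chunks through the pipeline, deliberately splitting the second record across a chunk boundary. It also shows sinkParser accepting an ordinary Data.Attoparsec.Text parser: here it consumes a header line, mirroring the question's manyTill anyChar endOfLine, and whatever input it does not consume is handed on to conduitParser as leftover.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (mapC, runConduit, sinkList, yieldMany, (.|))
import Data.Attoparsec.Text (Parser, decimal, endOfLine, isEndOfLine, skipWhile, space)
import Data.Conduit.Attoparsec (conduitParser, sinkParser)
import Data.Text (Text)

data Test = Test { a :: Int, b :: Int } deriving (Show, Eq)

-- One record per line: two integers separated by whitespace.
testParser :: Parser Test
testParser = Test <$> decimal <* space <*> decimal <* endOfLine

-- Hypothetical helper: skip everything up to and including the first newline.
skipHeader :: Parser ()
skipHeader = skipWhile (not . isEndOfLine) *> endOfLine

-- Hypothetical input, split in the middle of the second record.
chunks :: [Text]
chunks = ["header\n1 2\n3", " 4\n"]

parseChunks :: IO [Test]
parseChunks = runConduit
    $ yieldMany chunks
   .| (sinkParser skipHeader >> conduitParser testParser)
   .| mapC snd          -- drop the PositionRange attached to each result
   .| sinkList
```

Because sinkParser and conduitParser are both ConduitT actions, they can be sequenced with >>, much like the *> in the question's parseOnly version.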
Here is a pipes solution (assuming you are using a Text-based parser):
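import Data.Attoparsec.Text (endOfLine)
import Pipes
import Pipes.Attoparsec (parsed)
import Pipes.Text.IO (fromHandle)
import qualified System.IO as IO

-- testParser is the question's tab-separated parser, which does not consume the newline.
-- parsed ends with an Either (parse error or end of input); it is discarded here.
main :: IO ()
main = IO.withFile "./testFile" IO.ReadMode $ \handle -> do
  _ <- runEffect $ for (parsed (testParser <* endOfLine) (fromHandle handle)) (lift . print)
  return ()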
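A pure variant of the pipes version can be handy for checking the parser without touching a file. The sketch below assumes the pipes and pipes-attoparsec packages; input and parseAll are hypothetical names. It replaces fromHandle with an in-memory Producer built by each, splits one record across two chunks, and collects the results with Pipes.Prelude.toList.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, char, decimal, endOfLine)
import Data.Functor (void)
import Data.Functor.Identity (Identity)
import Data.Text (Text)
import Pipes (Producer, each)
import Pipes.Attoparsec (parsed)
import qualified Pipes.Prelude as P

data Test = Test { a :: Int, b :: Int } deriving (Show, Eq)

-- The question's format: two integers separated by a tab.
testParser :: Parser Test
testParser = Test <$> decimal <* char '\t' <*> decimal

-- Hypothetical input: the second record is split across two chunks.
input :: Producer Text Identity ()
input = each ["1\t2\n3", "\t4\n"]

-- parsed ends with Left (error, rest) on a parse failure or Right () once
-- the input is exhausted; void discards that so P.toList can collect Tests.
parseAll :: [Test]
parseAll = P.toList (void (parsed (testParser <* endOfLine) input))
```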