Dan*_*lRS 5 parsing haskell parser-combinators attoparsec
I am trying to parse a string that can contain escaped characters. Here is an example:
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text
import qualified Data.Text as T

exampleParser :: Parser T.Text
exampleParser = T.pack <$> many (char '\\' *> escaped <|> anyChar)
  where escaped = satisfy (\c -> c `elem` ['\\', '"', '[', ']'])
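The parser above builds a String and then packs it into Text. Is there a way to parse a string with escapes like these using the efficient string-handling functions that attoparsec provides, such as string, scan, runScanner, takeWhile, ...?
Parsing something like "one \"two\" \[three\]" should produce one "two" [three].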
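The expected behaviour can be checked by running the parser above through parseOnly. This is only a minimal sketch: the name `example` is hypothetical and not part of the original question, and the input is written as a Haskell string literal, so every backslash and quote is escaped once more.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text (Parser, anyChar, char, parseOnly, satisfy)
import qualified Data.Text as T

-- Same parser as in the question: collects a String, then packs it into Text.
exampleParser :: Parser T.Text
exampleParser = T.pack <$> many (char '\\' *> escaped <|> anyChar)
  where escaped = satisfy (\c -> c `elem` ['\\', '"', '[', ']'])

-- Hypothetical name, used only for this illustration.
-- The parsed input is: one \"two\" \[three\]
example :: Either String T.Text
example = parseOnly exampleParser "one \\\"two\\\" \\[three\\]"
-- example == Right "one \"two\" [three]"
```

This is the baseline that a takeWhile-based parser should reproduce.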
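Update:
Thanks to @epsilonhalbe, I was able to come up with a general solution that fits my needs perfectly. Note that the following function does not look for matching escaped delimiters such as [..], "..", (..), etc. Also, if it finds an invalid escape character, it treats the \ as a literal character.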
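import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text (Parser, char, satisfy)
import qualified Data.Attoparsec.Text as Atto
import Data.Text (Text)
import qualified Data.Text as T

takeEscapedWhile :: (Char -> Bool) -> (Char -> Bool) -> Parser Text
takeEscapedWhile isEscapable while = do
  x  <- normal
  xs <- many escaped
  return $ T.concat (x : xs)
  where
    -- A run of unescaped characters accepted by the predicate.
    normal  = Atto.takeWhile (\c -> c /= '\\' && while c)
    -- A valid escape (or a literal backslash), followed by another normal run.
    escaped = do
      x  <- (char '\\' *> satisfy isEscapable) <|> char '\\'
      xs <- normal
      return $ T.cons x xs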
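As a usage sketch, takeEscapedWhile can serve as the body of a double-quoted string parser. The names `quotedString`, `valid` and `invalidEscape` are hypothetical and chosen only for this illustration; the definition of takeEscapedWhile is repeated so that the block compiles on its own.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text (Parser, char, parseOnly, satisfy)
import qualified Data.Attoparsec.Text as Atto
import Data.Text (Text)
import qualified Data.Text as T

-- Repeated from the update above so the example is self-contained.
takeEscapedWhile :: (Char -> Bool) -> (Char -> Bool) -> Parser Text
takeEscapedWhile isEscapable while = do
  x  <- normal
  xs <- many escaped
  return $ T.concat (x : xs)
  where
    normal  = Atto.takeWhile (\c -> c /= '\\' && while c)
    escaped = do
      x  <- (char '\\' *> satisfy isEscapable) <|> char '\\'
      xs <- normal
      return $ T.cons x xs

-- Hypothetical example: a double-quoted string whose body may escape \ " [ ].
-- The body stops at the first unescaped double quote.
quotedString :: Parser Text
quotedString = char '"' *> takeEscapedWhile isEscapable (/= '"') <* char '"'
  where isEscapable c = c `elem` ['\\', '"', '[', ']']

valid, invalidEscape :: Either String Text
-- Parsed input: "one \"two\" \[three\]"
valid         = parseOnly quotedString "\"one \\\"two\\\" \\[three\\]\""
-- Parsed input: "a\qb"  (q is not escapable, so the backslash is kept)
invalidEscape = parseOnly quotedString "\"a\\qb\""
-- valid         == Right "one \"two\" [three]"
-- invalidEscape == Right "a\\qb"
```

Note that escaped characters bypass the second predicate, which is what lets an escaped double quote appear inside the quoted body.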
It is possible to write escape handling with attoparsec and text. Overall it is quite straightforward, given that you have already worked with parsers:
{-# LANGUAGE OverloadedStrings #-}
module MyModule where

import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text as AT
import qualified Data.Text as T
import Data.Text (Text)

normal, escaped, quoted, brackted :: Parser Text
normal = AT.takeWhile (/= '\\')

-- Note: text after the last escaped group is left unconsumed.
escaped = do
  r  <- normal
  rs <- many escaped'
  return $ T.concat $ r : rs
  where
    escaped' = do
      r1 <- normal
      r2 <- quoted <|> brackted
      return $ r1 <> r2

quoted = do
  _   <- string "\\\""
  res <- normal
  _   <- string "\\\""
  return $ "\"" <> res <> "\""

brackted = do
  _   <- string "\\["
  res <- normal
  _   <- string "\\]"
  return $ "[" <> res <> "]"
You can then use it to parse the following test cases:
Prelude> :m + MyModule
Prelude MyModule> import Data.Attoparsec.Text as AT
Prelude MyModule AT> import Data.Text.IO as TIO
Prelude MyModule AT TIO> :set -XOverloadedStrings
Prelude MyModule AT TIO> either Prelude.putStrLn TIO.putStrLn $ parseOnly escaped "test"
test
Prelude MyModule AT TIO> either Prelude.putStrLn TIO.putStrLn $ parseOnly escaped "\\\"test\\\""
"test"
Prelude MyModule AT TIO> either Prelude.putStrLn TIO.putStrLn $ parseOnly escaped "\\[test\\]"
[test]
Prelude MyModule AT TIO> either Prelude.putStrLn TIO.putStrLn $ parseOnly escaped "test \\\"test\\\" \\[test\\]"
test "test" [test]
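Note that you have to escape the escape characters inside a Haskell string literal, which is why you see \\\" instead of \".
Also, if you only evaluate the parse, GHCi prints the resulting Text value in escaped form, e.g.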
Right "test \"test\" [test]"
for the last example.
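The escaped form comes from show, which GHCi applies to the value it echoes, whereas putStrLn writes the characters unchanged. A small sketch of the difference, using only the text package (the names `rendered` and `raw` are hypothetical):

```haskell
import qualified Data.Text as T

-- The parse result from the last example, built by hand for illustration.
result :: Either String T.Text
result = Right (T.pack "test \"test\" [test]")

-- What GHCi echoes: show renders the Text as a quoted, escaped literal.
rendered :: String
rendered = show result
-- rendered contains: Right "test \"test\" [test]"

-- What putStrLn would write for the Right case: the characters themselves.
raw :: String
raw = either id T.unpack result
-- raw contains: test "test" [test]
```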
If you parse a file, you can write the escaped text in it directly, without the extra level of string-literal escaping.
test.txt
I \[like\] \"Haskell\"
Then you can run:
Prelude MyModule AT TIO> file <- TIO.readFile "test.txt"
Prelude MyModule AT TIO> either Prelude.putStrLn TIO.putStrLn $ parseOnly escaped file
I [like] "Haskell"