Using sepBy with strings in Attoparsec

Question

I'm trying to split a string on any of ",", ", and" or "and", and then return whatever was in between. An example of what I have so far is as follows:

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text

sepTestParser = nameSep ((takeWhile1 $ inClass "-'a-zA-Z") <* space)
nameSep p = p `sepBy` (string " and " <|> string ", and" <|> ", ")

main = do
  print $ parseOnly sepTestParser "This test and that test, this test particularly."

I'd like the output to be ["This test", "that test", "this test particularly."]. I have a vague feeling that what I'm doing is wrong, but I can't quite work out why.

Answer 1

Zet*_*eta 4

Note: this answer is written in literate Haskell. Save it as Example.lhs and load it in GHCi or similar.

The thing is, sepBy is implemented as:

sepBy p s = liftA2 (:) p ((s *> sepBy1 p s) <|> pure []) <|> pure []

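This means that the second parser s is only called after the first parser p has succeeded. It also means that if you were to add whitespace to the character class, you would end up with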

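["This test and that test","this test particularly"]

since "and" can now be parsed by p. This isn't easy to fix: as soon as you hit a space you would need to look ahead and check whether, after an arbitrary number of spaces, an "and" follows, and stop parsing if it does. Only then will a parser written with sepBy work.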
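
To make the two behaviours concrete, here is a minimal sketch, written as a standalone file rather than as part of the literate module. The names separator, original, withSpaces and input are made up for this illustration; withSpaces is only one plausible reading of "adding whitespace to the character class".

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.Text (Text)

-- The separator from the question, unchanged.
separator :: Parser Text
separator = string " and " <|> string ", and" <|> string ", "

-- The question's parser: one word followed by exactly one whitespace character.
original :: Parser [Text]
original = (takeWhile1 (inClass "-'a-zA-Z") <* space) `sepBy` separator

-- Hypothetical variant: a space is added to the character class instead.
withSpaces :: Parser [Text]
withSpaces = takeWhile1 (inClass "-'a-zA-Z ") `sepBy` separator

input :: Text
input = "This test and that test, this test particularly."

main :: IO ()
main = do
  -- p stops after "This " and no separator matches "test ...".
  print (parseOnly original input)
  -- Right ["This"]
  -- p now swallows " and " itself, so only ", " acts as a separator.
  print (parseOnly withSpaces input)
  -- Right ["This test and that test","this test particularly"]
```

Note that parseOnly does not require the whole input to be consumed, which is why both runs report success even though text is left over.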

So let's write a parser that takes words instead (the rest of this answer is literate Haskell):

> {-# LANGUAGE OverloadedStrings #-}
> import Control.Applicative
> import Data.Attoparsec.Text
> import qualified Data.Text as T
> import Control.Monad (mzero)
>
> -- A single word: letters, hyphens and apostrophes.
> word = takeWhile1 . inClass $ "-'a-zA-Z"
>
> -- A run of words joined by single spaces; the word "and" is rejected.
> wordsP = fmap (T.intercalate " ") $ k `sepBy` many space
>   where k = do
>           a <- word
>           if (a == "and") then mzero
>                           else return a

wordsP now takes multiple words until it hits either something that is not a word or a word that equals "and". Returning mzero signals a parse failure, at which point another parser can take over:
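
As a quick check of where wordsP stops, the following standalone sketch repeats the definitions with type signatures and returns the unconsumed input alongside the result. wordsAndRest is a hypothetical helper introduced only for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative (many)
import Control.Monad (mzero)
import Data.Attoparsec.Text
import Data.Text (Text)
import qualified Data.Text as T

word :: Parser Text
word = takeWhile1 (inClass "-'a-zA-Z")

-- Same logic as wordsP in the answer, with type signatures added.
wordsP :: Parser Text
wordsP = T.intercalate " " <$> (k `sepBy` many space)
  where
    k = do
      a <- word
      if a == "and" then mzero else return a

-- Hypothetical helper: run wordsP and pair its result with the leftover input.
wordsAndRest :: Text -> Either String (Text, Text)
wordsAndRest = parseOnly ((,) <$> wordsP <*> takeText)

main :: IO ()
main = print (wordsAndRest "This test and that test")
-- Right ("This test"," and that test")
```

Because attoparsec backtracks on failure, the space in front of "and" is handed back as well, so the leftover input starts with " and".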

> -- Optional whitespace, the word "and", then at least one whitespace character.
> andP = many space *> "and" *> many1 space *> pure()
>
> limiter = choice [
>     "," *> andP,
>     "," *> many1 space *> pure (),
>     andP
>   ]

limiter is basically the same parser you've already written; it corresponds to the regex /,\s*and\s+|,\s+|\s*and\s+/.
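
A small standalone sketch can confirm which separators limiter accepts. isLimiter is a hypothetical helper that additionally requires the whole input to be consumed; it is not part of the answer's module.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative (many)
import Data.Attoparsec.Text
import Data.Either (isRight)
import Data.Text (Text)

andP :: Parser ()
andP = many space *> "and" *> many1 space *> pure ()

limiter :: Parser ()
limiter = choice
  [ "," *> andP
  , "," *> many1 space *> pure ()
  , andP
  ]

-- Hypothetical helper: does limiter match the entire input?
isLimiter :: Text -> Bool
isLimiter = isRight . parseOnly (limiter <* endOfInput)

main :: IO ()
main = print (map isLimiter [", and ", ", ", " and ", ",and ", ","])
-- [True,True,True,True,False]
```

A bare "," is rejected because every alternative needs either trailing whitespace or the word "and".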

Now we can actually use sepBy, since our first parser no longer overlaps with the second:

> test = "This test and that test, this test particular, and even that test"
>
> main = print $ parseOnly (wordsP `sepBy` limiter) test

The program prints Right ["This test","that test","this test particular","even that test"], just as we wanted. Note that this particular parser doesn't preserve whitespace.
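
The whitespace remark can be seen in a short standalone sketch: runs of spaces inside a phrase are collapsed to one space, because wordsP rejoins the words with T.intercalate " ". The name phrases is made up for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative (many)
import Control.Monad (mzero)
import Data.Attoparsec.Text
import Data.Text (Text)
import qualified Data.Text as T

word :: Parser Text
word = takeWhile1 (inClass "-'a-zA-Z")

wordsP :: Parser Text
wordsP = T.intercalate " " <$> (k `sepBy` many space)
  where
    k = do
      a <- word
      if a == "and" then mzero else return a

andP :: Parser ()
andP = many space *> "and" *> many1 space *> pure ()

limiter :: Parser ()
limiter = choice ["," *> andP, "," *> many1 space *> pure (), andP]

-- Hypothetical name for the combined parser used in main above.
phrases :: Parser [Text]
phrases = wordsP `sepBy` limiter

main :: IO ()
main = print (parseOnly phrases "This   test and  that test")
-- Right ["This test","that test"]
```

If the original spacing matters, one option would be to capture the consumed text instead of rejoining the words, but that is outside what this answer sets out to do.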

So whenever you want to build a parser with sepBy, make sure that the two parsers don't overlap.
