Parsec: error messages at a specific position

Question

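If a semantic rule is violated, how can I make Parsec report the error at a specific position? I know we normally don't want to do this kind of thing, but consider this example grammar.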

<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two

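The <bar> rule describes a finite set (my example is arbitrary), and a pure approach would be a careful application of the choice combinator, but that may be impractical in space and time. In recursive-descent or toolkit-generated parsers, the standard trick is to parse an integer (a more relaxed grammar) and then check the harder constraint semantically. With Parsec, I could use the natural parser, check the result, and call fail, unexpected, or similar when it does not match. But if we do that, the default error position is wrong. Somehow I need to raise the error at the earlier state.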
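To make the position problem concrete, here is a minimal sketch of the naive "parse loosely, then check" approach. The names naivePowerOfTwo and naiveErrorColumn are hypothetical and only serve the illustration; many1 digit stands in for a natural parser.

```haskell
module Main (main) where

import Text.Parsec (digit, errorPos, many1, parse, sourceColumn)
import Text.Parsec.String (Parser)

-- Same predicate as in the question: 2^1 .. 2^20.
isPowerOfTwo :: Integer -> Bool
isPowerOfTwo = (`elem` [2 ^ i | i <- [1 .. 20 :: Int]])

-- Hypothetical naive parser: parse loosely, then check semantically.
naivePowerOfTwo :: Parser Integer
naivePowerOfTwo = do
  n <- read <$> many1 digit
  if isPowerOfTwo n
    then return n
    else fail "expected a power of two"

-- Column of the reported error, or Nothing if parsing succeeded.
naiveErrorColumn :: String -> Maybe Int
naiveErrorColumn input =
  either (Just . sourceColumn . errorPos) (const Nothing)
         (parse naivePowerOfTwo "" input)

main :: IO ()
main = print (naiveErrorColumn "12")
-- Just 3: the error is reported after the digits, not at column 1
```

The fail call runs after the digits have been consumed, so the error carries the position at the end of the number rather than its start.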

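I tried a brute-force solution and wrote a combinator that uses getPosition and setPosition, as illustrated by this very similar question. Of course, I was also unsuccessful (the error position is, of course, wrong). I've run into this pattern many times. I'm looking for a combinator of this kind: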

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
  ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
  if ok then p         -- consume the input if the value passed the predicate
   else fail lbl       -- otherwise raise the error at the *start* of this token

pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

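The above does not work. (I also tried variants of this.) Somehow the parser backtracks and says it is expecting a digit. I think it is returning the error that made it the furthest. Even {get,set}ParserState fails to erase that memory.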

Am I handling this syntactic pattern wrong? How do Parsec users approach these kinds of problems?

Thanks!

Answer 1

Mar*_*ark 5

I think both of your ideas are OK. The other two answers deal with Parsec, but I'd like to point out that in both cases Megaparsec just does the right thing:

{-# LANGUAGE TypeApplications #-}

module Main (main) where

import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
  r <- lookAhead p
  if f r
    then p
    else fail msg

withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
  mpos <- getNextTokenPosition -- †
  r    <- p
  if f r
    then return r
    else do
      forM_ mpos setPosition
      fail msg

main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  parseTest' (withPredicate1 @Integer (> 42) msg L.decimal) "11"
  parseTest' (withPredicate2 @Integer (> 42) msg L.decimal) "22"

If I run it:

The next big Haskell project is about to start!
?> :main
1:1:
  |
1 | 11
  | ^
I only like numbers greater than 42!
1:1:
  |
1 | 22
  | ^
I only like numbers greater than 42!
?>

Try it for yourself! It works as expected.

† getNextTokenPosition is more correct than getPosition for streams whose tokens carry their own start and end positions. This may or may not matter in your case.
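As far as I know, Megaparsec 7 and later no longer provide getNextTokenPosition, setPosition, or parseTest'; positions are tracked as integer offsets instead. Under that assumption, a rough equivalent of withPredicate2 could look like the sketch below. The name withPredicateOffset is hypothetical; getOffset, setOffset, and parseTest are the library functions I am relying on.

```haskell
{-# LANGUAGE TypeApplications #-}

module Main (main) where

import Data.Void (Void)
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Hypothetical offset-based variant of withPredicate2.
withPredicateOffset :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicateOffset f msg p = do
  o <- getOffset          -- remember where the token starts
  r <- p
  if f r
    then return r
    else do
      setOffset o         -- rewind so the error points at the token start
      fail msg

main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  parseTest (withPredicateOffset @Integer (> 42) msg L.decimal) "22"
```

The structure mirrors the original: record the starting point, run the parser, and move back before failing, so the message is attached to the beginning of the number.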

Answer 2

Tim*_*Tim 3

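I think the problem stems from how Parsec picks the "best error" in the non-deterministic setting. See Text.Parsec.Error.mergeError. Specifically, when choosing which error to report, it selects the longest match. I think we would need some way to make Parsec order errors differently, which may be too obscure for us to solve this problem.

In my case, here is how I worked around it:

I solved the problem by stacking an exception monad inside the ParsecT type.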
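The "longest match" behaviour can be observed with a small experiment. This is only a sketch; the parsers abc and x and the helper farthestErrorColumn are made-up names for illustration.

```haskell
module Main (main) where

import Text.Parsec (char, errorPos, parse, sourceColumn, try, (<|>))
import Text.Parsec.String (Parser)

-- Fails at column 3 on "abz", then backtracks because of try.
abc :: Parser Char
abc = try (char 'a' *> char 'b' *> char 'c')

-- Fails immediately at column 1 on "abz".
x :: Parser Char
x = char 'x'

-- Column of the reported error, or Nothing if parsing succeeded.
farthestErrorColumn :: String -> Maybe Int
farthestErrorColumn input =
  either (Just . sourceColumn . errorPos) (const Nothing)
         (parse (abc <|> x) "" input)

main :: IO ()
main = print (farthestErrorColumn "abz")
-- Just 3: the error from the branch that got further wins
```

Both alternatives fail, but the merged error keeps the position of the branch that progressed further, which is why an error deliberately raised at an earlier position tends to lose.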

type P m = P.ParsecT String ParSt (ExceptT Diagnostic m)

Then I introduced a pair of combinators (note: Loc is my internal location type):

-- stops hard on an error (no backtracking)
-- which is why I say "semantic" instead of "syntax" error
throwSemanticError :: (MonadTrans t, Monad m) => Loc -> String -> t (ExceptT Diagnostic m) a
throwSemanticError loc msg = throwSemanticErrorDiag $! Diagnostic loc msg


withLoc :: Monad m => (Loc -> P m a) -> P m a
withLoc pa = getLoc >>= pa

Now, while parsing, I can write:

parsePrimeNumber = withLoc $ \loc -> do
  i <- parseInt
  unless (isPrime i) $ throwSemanticError loc "number is not prime!"
  return i

The top-level interface for running one of these monads is really nasty.

runP :: Monad m
    => ParseOpts
    -> P m a
    -> String
    -> m (ParseResult a)
runP pos pma inp = 
  case runExceptT (P.runParserT pma (initPSt pos) "" inp) of
    mea -> do
             ea <- mea
             case ea of
               -- semantic error (throwSemanticError)
               Left err -> return $! PError err
               -- regular parse error
               Right (Left err) -> return $ PError (errToDiag err)
               -- success
               Right (Right a) -> return (PSuccess a [])

I am not very happy with this solution and wish there were a better one.

I wish Parsec had a:

semanticCheck :: (a -> Parsec Bool) -> Parsec a -> Parsec a
semanticCheck pred p = do
    a <- p
    z <- pred a
    unless z $
       -- ... somehow raise the error from the beginning of this token/parse
       -- rather than the end ... and when propagating the error up,
       -- use the end parse position, so this parse error beats out other
       -- failed parsers that make it past the beginning of this token
       -- (but not to the end)
    return a
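One possible direction, offered only as a sketch and not as an established Parsec idiom, is to drop down to the low-level mkPT and runParsecT functions and replace the reply when the predicate rejects the value. The name semanticCheckAt and the helper checkedErrorColumn are hypothetical. This covers only the first half of the wish (reporting at the start of the token); it does not make the error win against alternatives that got further.

```haskell
module Main (main) where

import Text.Parsec (digit, errorPos, many1, parse, sourceColumn)
import Text.Parsec.Error (Message (..), newErrorMessage)
import Text.Parsec.Prim (Consumed (..), Reply (..), State (..), mkPT, runParsecT)
import Text.Parsec.String (Parser)

-- Hypothetical combinator: run p, and if the result fails the predicate,
-- discard p's reply and fail without consuming, at the starting position.
semanticCheckAt :: (a -> Bool) -> String -> Parser a -> Parser a
semanticCheckAt ok msg p = mkPT $ \st -> do
  res <- runParsecT p st
  reply <- case res of
    Consumed mr -> mr
    Empty mr    -> mr
  return $ case reply of
    Ok a _ _ | not (ok a) ->
      -- Fresh error positioned where the token began.
      Empty (return (Error (newErrorMessage (Message msg) (statePos st))))
    _ -> case res of
      Consumed _ -> Consumed (return reply)
      Empty _    -> Empty (return reply)

isPowerOfTwo :: Integer -> Bool
isPowerOfTwo = (`elem` [2 ^ i | i <- [1 .. 20 :: Int]])

pPowerOfTwo :: Parser Integer
pPowerOfTwo = semanticCheckAt isPowerOfTwo "expected a power of two" (read <$> many1 digit)

-- Column of the reported error, or Nothing if parsing succeeded.
checkedErrorColumn :: String -> Maybe Int
checkedErrorColumn input =
  either (Just . sourceColumn . errorPos) (const Nothing)
         (parse pPowerOfTwo "" input)

main :: IO ()
main = print (checkedErrorColumn "12")
-- Just 1: the error now points at the start of the number
```

Because a rejected value is reported as a failure without consumption, the combinator behaves like try in that case, so a following <|> alternative would still be attempted.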

Archived:	11 years, 5 months ago
Views:	392
Last activity:	6 years, 4 months ago