Haskell/Trifecta: Parsing completely optional semicolons without polluting the AST

kva*_*nck 4 parsing haskell trifecta

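I have rewritten this question since it was originally posted, using a more concise code example:

Consider a language with completely optional semicolons that are almost entirely sugar, i.e.:

  • ";; foo; bar;;;;" is valid
  • "foo bar foobar" is valid
  • "if (+1); foo" is semantically different from "if (+1) foo", so ";" cannot be treated as whitespace

Here is an example parser: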

{-# LANGUAGE OverloadedStrings #-}

import Text.Trifecta
import Text.Trifecta.Delta
import Text.PrettyPrint.ANSI.Leijen (putDoc, (<>), linebreak)
import Control.Monad.Trans.State.Strict
import Control.Applicative

type TestParser a = StateT Int Parser a

data AST a = Foo a | Bar a deriving (Show)

pFoo :: TestParser (AST (Delta, Int))
pFoo = curry Foo <$ string "foo" <*> position <* modify (+1) <*> get

pBar :: TestParser (AST (Delta, Int))
pBar = curry Bar <$ string "bar" <*> position <*> get

pStmt :: TestParser (AST (Delta, Int))
pStmt = semi *> pStmt <|> pFoo <|> pBar <?> "statement"

pTest :: TestParser [AST (Delta, Int)]
pTest = some pStmt

main :: IO ()
main
 = do   let res = parseByteString (evalStateT pTest 0)
                    (Directed "(test)" 0 0 0 0) ";;foo;bar;\nfoo;; foobarbar;;"
        case res of
            Success ast
             -> print ast
            Failure errdoc
             -> putDoc (errdoc <> linebreak)

The problem I have with a parser like this is that I need to be able to skip semicolons without committing to parsing a pStmt. Currently the following error occurs:

(test):2:18: error: unexpected
    EOF, expected: statement
foo;; foobarbar;;<EOF>
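
This is because it expects a statement (in semi *> pStmt), but since stacked semicolons can sugar both the beginning and the end of expressions, I cannot be sure that I really want to expect/parse a statement before I have already committed to expecting one.

One hack I came up with was to add Nop as a constructor in my AST, but I really don't want to do that - it feels like a hack, and given the number of semicolons in some documents it would greatly increase memory usage.

I am looking for solutions/suggestions.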
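
The commitment problem described above can be reproduced in a few lines. The following is a minimal sketch that assumes Parsec (Text.Parsec) rather than Trifecta and drops the state and position tracking; the names stmt and stmts are illustrative and not part of the original code. Once the semicolon branch has consumed a ";", it must find a statement, so leading semicolons parse but a trailing one fails.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same shape as pStmt in the question: after a ';' a statement is mandatory.
stmt :: Parser String
stmt = (char ';' *> stmt) <|> string "foo" <|> string "bar"

-- One or more statements, then end of input.
stmts :: Parser [String]
stmts = many1 stmt <* eof

-- Leading semicolons are fine: the recursion always reaches a statement.
leadingResult :: Either ParseError [String]
leadingResult = parse stmts "" ";;foo;bar"

-- A trailing semicolon is consumed, then stmt fails at end of input.
trailingResult :: Either ParseError [String]
trailingResult = parse stmts "" "foo;bar;"
```

Here leadingResult is Right ["foo","bar"], while trailingResult is a Left, because the final ";" has already been consumed when the parser discovers that no statement follows.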


An attempt at an EBNF form of the desired grammar:

expr = "foo" | "bar"
expr with sugar = expr | ";"
program = { [white space], expr with sugar, [white space] }
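
The EBNF above can be transcribed almost rule for rule. This is a sketch under assumptions, not the asker's parser: it uses Parsec instead of Trifecta, a simplified Stmt type without positions or state, and illustrative names (expr, exprWithSugar, program). A semicolon is parsed to Nothing and dropped with catMaybes, so no Nop constructor is needed in the AST. It does not attempt to model the "if (+1); foo" distinction.

```haskell
import Data.Maybe (catMaybes)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified AST for illustration; note there is no Nop constructor.
data Stmt = Foo | Bar deriving (Show, Eq)

-- expr = "foo" | "bar"
expr :: Parser Stmt
expr = Foo <$ string "foo" <|> Bar <$ string "bar"

-- expr with sugar = expr | ";"   (a semicolon yields no AST node)
exprWithSugar :: Parser (Maybe Stmt)
exprWithSugar = Just <$> expr <|> Nothing <$ char ';'

-- program = { [white space], expr with sugar, [white space] }
program :: Parser [Stmt]
program = catMaybes <$> (spaces *> many (exprWithSugar <* spaces) <* eof)
```

With this sketch, parse program "" ";;foo;bar;\nfoo;; foobarbar;;" yields Right [Foo,Bar,Foo,Foo,Bar,Bar]. The Maybe values only exist in the intermediate list, which is discarded once catMaybes has run.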

wit*_*wit 5

OK, here it is:

pStmt = pFoo <|> pBar

pWhiteStmt = do
    many whitespace
    p <- pStmt
    many whitespace
    return p

pTest = do
    many semi
    pS <- sepEndBy pWhiteStmt (some semi)
    eof
    return pS

And test it:

> parse pTest "" ";;foo;bar;\nfoo;; foo;bar;bar;;"
Right ["foo","bar","foo","foo","bar","bar"]

> parse pTest "" ";;foo;bar;\nfoo;; foobarbar;;"
Left (line 2, column 10):
unexpected 'b'
expecting ";" or end of input

If we want "; foobarbar;" to be valid, then we need to change the pWhiteStmt parser as follows:
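
The answer does not show pFoo, pBar, semi or whitespace, and the test session above looks like Parsec's parse function. The following self-contained sketch fills those gaps under that assumption: the helper definitions are guesses made for illustration, and Parsec's many1 stands in for some.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed helpers: these definitions are not shown in the answer.
pFoo, pBar :: Parser String
pFoo = string "foo"
pBar = string "bar"

semi :: Parser Char
semi = char ';'

whitespace :: Parser Char
whitespace = space

pStmt :: Parser String
pStmt = pFoo <|> pBar

-- A single statement with optional surrounding whitespace.
pWhiteStmt :: Parser String
pWhiteStmt = many whitespace *> pStmt <* many whitespace

-- Leading semicolons, then statements separated (and optionally
-- terminated) by one or more semicolons.
pTest :: Parser [String]
pTest = many semi *> sepEndBy pWhiteStmt (many1 semi) <* eof
```

Because the semicolons are consumed only as separators, nothing has to be recorded for them in the result. One limitation of this shape: whitespace directly after the last semicolon is consumed by pWhiteStmt before it finds that no statement follows, so an input such as "foo; " is rejected.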

pWhiteStmt = do
    many whitespace
    p <- some pStmt
    many whitespace
    return p

And check it:

> parse pTest "" ";;foo;bar;\nfoo;; foobarbar;;"
Right [["foo"],["bar"],["foo"],["foo","bar","bar"]]

Finally, if we also want "; foo bar baz;" to be valid, then we need to change the pTest function as follows:

pTest = do
    many semi
    pS <- sepEndBy (some pWhiteStmt) (some semi)
    eof
    return pS

And test it:

> parse pTest "" ";;foo;bar;\nfoo;; foo bar bar;;"
Right [[["foo"]],[["bar"]],[["foo"]],[["foo"],["bar"],["bar"]]]

If the result has too many nested brackets, replace return p with return (concat p) in pWhiteStmt.
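
As a hedged illustration of the flattening step, the sketch below again assumes Parsec and string-valued statements, with many1 standing in for some. It applies concat one level up, in a hypothetical pGroup parser, rather than inside pWhiteStmt: with statements represented as String, calling concat on the result of some pStmt would join "foo" and "bar" into "foobar". Whether one list per semicolon-delimited group is the intended shape is an assumption.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed helper definitions, as the answer does not show them.
pStmt :: Parser String
pStmt = string "foo" <|> string "bar"

semi :: Parser Char
semi = char ';'

-- One or more adjacent statements with optional surrounding whitespace.
pWhiteStmt :: Parser [String]
pWhiteStmt = many space *> many1 pStmt <* many space

-- Hypothetical helper: all statements between two runs of semicolons,
-- flattened into a single list.
pGroup :: Parser [String]
pGroup = concat <$> many1 pWhiteStmt

pTest :: Parser [[String]]
pTest = many semi *> sepEndBy pGroup (many1 semi) <* eof
```

For the input ";;foo;bar;\nfoo;; foo bar bar;;" this produces Right [["foo"],["bar"],["foo"],["foo","bar","bar"]], one list per group. Applying concat once more to the final result would give a single flat list of statements.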