Xod*_*rap (8) · Tags: grammar, parsing, haskell, parsec, context-free-grammar
I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks like this:
EXP : EXP *
| LIT EXP
| LIT
I tried to implement it in Haskell:
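import Text.ParserCombinators.Parsec

-- Direct translation of the BNF. Note that star is left-recursive.
expr :: Parser String
expr = try star
   <|> try litE
   <|> lit

litE :: Parser String
litE = do c <- noneOf "*"
          rest <- expr
          return (c : rest)

lit :: Parser String
lit = do c <- noneOf "*"
         return [c]

-- star calls expr before consuming any input: expr -> star -> expr -> ...
star :: Parser String
star = do content <- expr
          char '*'
          return (content ++ "*")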
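There are some infinite loops here (e.g. expr -> star -> expr, without consuming any tokens), which make the parser loop forever. I'm not sure how to fix this, because the very nature of star is that it consumes its mandatory token only at the end.
Any ideas?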
pat*_*pat (12)
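You should use Text.ParserCombinators.Parsec.Expr.buildExpressionParser; it is a perfect fit for this. You just describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!
You may also want to add the ability to group terms with parentheses, so that you can apply * to more than one literal.
Here's my attempt (I threw in |, + and ? for good measure):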
import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr

data Term = Literal Char
          | Sequence [Term]
          | Repeat (Int, Maybe Int) Term   -- (minimum, optional maximum) repetitions
          | Choice [Term]
          deriving ( Show )

term :: Parser Term
term = buildExpressionParser ops atom where

  -- Operator table, highest precedence first.
  ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
          , Postfix (Repeat (1, Nothing) <$ char '+')
          , Postfix (Repeat (0, Just 1)  <$ char '?')
          ]
          -- Juxtaposition: an "invisible" infix operator that consumes no input.
        , [ Infix (return sequence) AssocRight
          ]
        , [ Infix (choice <$ char '|') AssocRight
          ]
        ]

  atom = msum [ Literal <$> lit
              , parens term
              ]

  lit = noneOf "*+?|()"

  -- Flatten nested sequences/choices into a single list.
  sequence a b = Sequence $ (seqTerms a) ++ (seqTerms b)
  choice a b = Choice $ (choiceTerms a) ++ (choiceTerms b)

  parens = between (char '(') (char ')')

seqTerms (Sequence ts) = ts
seqTerms t = [t]

choiceTerms (Choice ts) = ts
choiceTerms t = [t]

main = parseTest term "he(llo)*|wor+ld?"
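One caveat about the main above: parseTest only reports what the parser matched, so input it cannot handle (for example an unbalanced ")") is silently left unconsumed. The following is a minimal sketch of the same operator-table idea, assuming the newer Text.Parsec modules and a binary tree type; Re, Lit, Seq, Alt, Star and parseRe are illustrative names, not part of the answer. It requires the whole input to be consumed by adding eof:

```haskell
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

-- Hypothetical binary-tree variant of the answer's Term type.
data Re = Lit Char
        | Seq Re Re
        | Alt Re Re
        | Star Re
        deriving (Show, Eq)

re :: Parser Re
re = buildExpressionParser table atom
  where
    -- Highest precedence first: postfix *, then juxtaposition, then |.
    table = [ [ Postfix (Star <$ char '*') ]
            , [ Infix (return Seq) AssocRight ]  -- consumes no input
            , [ Infix (Alt <$ char '|') AssocRight ]
            ]
    atom = Lit <$> noneOf "*|()"
       <|> between (char '(') (char ')') re

-- Hypothetical helper: succeed only if the entire input is consumed.
parseRe :: String -> Maybe Re
parseRe s = either (const Nothing) Just (parse (re <* eof) "" s)
```

With this, parseRe "ab*" gives Just (Seq (Lit 'a') (Star (Lit 'b'))) and parseRe "a)" gives Nothing. Each precedence level applies at most one postfix operator, so an input such as "a**" is rejected by this table as well.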
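Your grammar is left-recursive, and try does not help with that: star calls expr before consuming any input, so Parsec keeps recursing without ever making progress. There are several ways to fix this. Probably the simplest is to make the * optional in another rule: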
import Text.ParserCombinators.Parsec

-- A literal, optionally followed by a '*'.
lit :: Parser (Char, Maybe Char)
lit = do
  c <- noneOf "*"
  s <- optionMaybe $ char '*'
  return (c, s)
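For comparison, the textbook left-recursion elimination (a rule A : A α | β becomes A : β α*) can also be applied to the original grammar directly, here with α = '*' and β = LIT EXP | LIT, while keeping the question's String result. This is a sketch under that assumption; base and parseExpr are names made up for illustration. Note that the original BNF only allows stars after all the literals (for example a*b is not derivable from it), and this rewrite preserves that:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- EXP : EXP '*' | LIT EXP | LIT   rewritten as
-- EXP : BASE '*'*   where   BASE : LIT EXP | LIT
expr :: Parser String
expr = do
  b     <- base
  stars <- many (char '*')   -- the left-recursive EXP '*' case, as a loop
  return (b ++ stars)

-- Hypothetical helper: LIT EXP | LIT, left-factored into
-- one literal followed by an optional EXP.
base :: Parser String
base = do
  c    <- noneOf "*"
  rest <- option "" expr     -- recursion happens only after consuming a char
  return (c : rest)

-- Hypothetical helper: succeed only if the entire input is consumed.
parseExpr :: String -> Maybe String
parseExpr s = either (const Nothing) Just (parse (expr <* eof) "" s)
```

Every recursive call to expr now happens after at least one character has been consumed, which is what guarantees termination.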
Of course, you will probably end up wrapping the result in a data type anyway, and there are many ways to do that. Here's one off the top of my head:
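import Control.Applicative ((<$>))
import Data.Maybe (isNothing)
import Text.ParserCombinators.Parsec

data Term = Literal Char
          | Sequence [Term]
          | Star Term
          deriving Show

expr :: Parser Term
expr = Sequence <$> many term

-- A single literal character (anything except the '*' operator).
lit :: Parser Char
lit = noneOf "*"

term :: Parser Term
term = do
  c <- lit
  s <- optionMaybe $ char '*' -- Easily extended for +, ?, etc.
  return $ if isNothing s
             then Literal c
             else Star $ Literal c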
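The comment in term says this approach is easily extended for + and ?. A minimal sketch of that extension follows; the Plus and Optional constructors and the parseTerm helper are hypothetical additions, not part of the answer. It reads at most one postfix operator per literal and checks the whole input with eof:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Star is from the answer; Plus and Optional are hypothetical additions.
data Term = Literal Char
          | Sequence [Term]
          | Star Term
          | Plus Term
          | Optional Term
          deriving (Show, Eq)

expr :: Parser Term
expr = Sequence <$> many term

term :: Parser Term
term = do
  c  <- noneOf "*+?"
  op <- optionMaybe (oneOf "*+?")  -- at most one postfix operator per literal
  return $ case op of
    Nothing  -> Literal c
    Just '*' -> Star (Literal c)
    Just '+' -> Plus (Literal c)
    Just _   -> Optional (Literal c)  -- the only remaining case is '?'

-- Hypothetical helper: succeed only if the entire input is consumed.
parseTerm :: String -> Maybe Term
parseTerm s = either (const Nothing) Just (parse (expr <* eof) "" s)
```

Because each term takes at most one operator, an input such as "a**" is rejected rather than parsed as a nested repetition.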
Perhaps a more experienced Haskeller will come up with a better solution.