Scala RegexParsers中的非贪婪匹配

Mag*_*nus 14 regex scala parser-generator non-greedy

假设我在Scala中编写了一个基本的SQL解析器.我有以下内容:

class Arith extends RegexParsers {
    def selectstatement: Parser[Any] = selectclause ~ fromclause
    def selectclause: Parser[Any] = "(?i)SELECT".r ~ tokens
    def fromclause: Parser[Any] = "(?i)FROM".r ~ tokens
    def tokens: Parser[Any] = rep(token) //how to make this non-greedy?
    def token: Parser[Any] = "(\\s*)\\w+(\\s*)".r
}
Run Code Online (Sandbox Code Playgroud)

在尝试匹配SELECT foo FROM barselect 语句时,如何防止selectclause因为rep(token)in 而吞噬整个短语~ tokens

换句话说,如何在Scala中指定非贪婪匹配?

为了澄清,我完全知道我可以在String模式本身中使用标准的非贪婪语法(*?)或(+?),但我想知道是否有一种方法可以在def标记内的更高级别指定它.例如,如果我已经定义了这样的标记:

def token: Parser[Any] = stringliteral | numericliteral | columnname
Run Code Online (Sandbox Code Playgroud)

那么如何为def标记内的rep(标记)指定非贪婪匹配?

Dan*_*ral 12

不容易,因为没有重试成功的比赛.例如,考虑一下:

object X extends RegexParsers {
  def p = ("a" | "aa" | "aaa" | "aaaa") ~ "ab"
}

scala> X.parseAll(X.p, "aaaab")
res1: X.ParseResult[X.~[String,String]] = 
[1.2] failure: `ab' expected but `a' found

aaaab
 ^
Run Code Online (Sandbox Code Playgroud)

第一个匹配成功,在括号内的解析器中,所以它继续到下一个.那个失败了,所以p失败了.如果p是替代匹配的一部分,那么将尝试替代方案,因此诀窍是生成可以处理这种事情的东西.

假设我们有这个:

def nonGreedy[T](rep: => Parser[T], terminal: => Parser[T]) = Parser { in =>
  def recurse(in: Input, elems: List[T]): ParseResult[List[T] ~ T] =
    terminal(in) match {
      case Success(x, rest) => Success(new ~(elems.reverse, x), rest)
      case _ => 
        rep(in) match {
          case Success(x, rest) => recurse(rest, x :: elems)
          case ns: NoSuccess    => ns
        }
    }

  recurse(in, Nil)
}  
Run Code Online (Sandbox Code Playgroud)

然后你可以像这样使用它:

def p = nonGreedy("a", "ab")
Run Code Online (Sandbox Code Playgroud)

顺便说一句,我总是发现,看看其他事情的定义是有助于尝试提出nonGreedy上述内容.特别是,看看如何rep1定义,以及如何更改以避免重新评估其重复参数 - 同样的事情可能会有用nonGreedy.

这是一个完整的解决方案,稍加改动以避免消耗"终端".

trait NonGreedy extends Parsers {
    def nonGreedy[T, U](rep: => Parser[T], terminal: => Parser[U]) = Parser { in =>
      def recurse(in: Input, elems: List[T]): ParseResult[List[T]] =
        terminal(in) match {
          case _: Success[_] => Success(elems.reverse, in)
          case _ => 
            rep(in) match {
              case Success(x, rest) => recurse(rest, x :: elems)
              case ns: NoSuccess    => ns
            }
        }

      recurse(in, Nil)
    }  
}

class Arith extends RegexParsers with NonGreedy {
    // Just to avoid recompiling the pattern each time
    val select: Parser[String] = "(?i)SELECT".r
    val from: Parser[String] = "(?i)FROM".r
    val token: Parser[String] = "(\\s*)\\w+(\\s*)".r
    val eof: Parser[String] = """\z""".r

    def selectstatement: Parser[Any] = selectclause(from) ~ fromclause(eof)
    def selectclause(terminal: Parser[Any]): Parser[Any] = 
      select ~ tokens(terminal)
    def fromclause(terminal: Parser[Any]): Parser[Any] = 
      from ~ tokens(terminal)
    def tokens(terminal: Parser[Any]): Parser[Any] = 
      nonGreedy(token, terminal)
}
Run Code Online (Sandbox Code Playgroud)