ANTLR4 negative lookahead in lexer

val*_*gog 3 antlr4

I am trying to define lexer rules for PostgreSQL SQL.

The problem is that the operator definition and line comments conflict with each other.

For example, @--- is an operator token @- followed by a -- comment, not a single operator token @---.

In grako it is possible to define a negative-lookahead-like fragment:

OP_MINUS: '-' ! ( '-' ) .
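The same idea can be illustrated outside both grako and ANTLR with Python's standard re module, which supports zero-width negative lookahead through (?!...). The sketch below is only an illustration of the concept. The OP_MINUS name mirrors the grako rule above, and the helper match_op_minus is hypothetical. The pattern matches a single '-' only when the next character is not another '-'. That next character is inspected but never consumed.

```python
import re

# Hypothetical OP_MINUS token: a '-' that is NOT followed by another '-'.
# (?!-) is a zero-width negative lookahead: it checks the next character
# without consuming it, so a successful match covers only the single '-'.
OP_MINUS = re.compile(r"-(?!-)")


def match_op_minus(text, pos=0):
    """Return '-' if text[pos] is a standalone minus, or None if it starts '--'."""
    m = OP_MINUS.match(text, pos)
    return m.group() if m else None


print(match_op_minus("-1"))       # a lone minus
print(match_op_minus("-- note"))  # start of a line comment, so no match
print(match_op_minus("@-", 1))    # minus at the end of input
```

A regex engine can do this because it backtracks freely. The difficulty in the question is that an ANTLR4 lexer rule has no built-in equivalent of (?!...).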

In ANTLR4 I could not find any way to roll back a fragment that has already been consumed.

Any ideas?

Here is the original definition of what a PostgreSQL operator can be:

The operator name is a sequence of up to NAMEDATALEN-1
(63 by default) characters from the following list:

 + - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on your choice of name:
-- and /* cannot appear anywhere in an operator name,
since they will be taken as the start of a comment.

A multicharacter operator name cannot end in + or -,
unless the name also contains at least one of these
characters:

~ ! @ # % ^ & | ` ?

For example, @- is an allowed operator name, but *- is not.
This restriction allows PostgreSQL to parse SQL-compliant
commands without requiring spaces between tokens.
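The quoted rules are precise enough to be written down as a small predicate. The Python sketch below is a hypothetical checker. The function name is_valid_operator_name and the max_len parameter are illustrative and are not part of PostgreSQL. It assumes the default limit of 63 characters (NAMEDATALEN-1). It can be useful as a test oracle when writing lexer rules.

```python
# Characters allowed in an operator name, per the quoted documentation.
OP_CHARS = set("+-*/<>=~!@#%^&|`?")
# A multi-character name may end in + or - only if it contains one of these.
SPECIAL = set("~!@#%^&|`?")


def is_valid_operator_name(name: str, max_len: int = 63) -> bool:
    """Check a candidate name against the documented operator-name rules.

    max_len = 63 assumes the default NAMEDATALEN of 64 (NAMEDATALEN-1).
    """
    if not name or len(name) > max_len:
        return False
    if any(c not in OP_CHARS for c in name):
        return False
    # "--" and "/*" would be taken as the start of a comment.
    if "--" in name or "/*" in name:
        return False
    # Trailing + or - is only allowed when a SPECIAL character is present.
    if len(name) > 1 and name[-1] in "+-" and not (set(name) & SPECIAL):
        return False
    return True


for candidate in ["@-", "*-", "<>", "@---", "/*"]:
    print(candidate, is_valid_operator_name(candidate))
```

The two cases from the documentation behave as stated. @- is accepted because it contains @. *- is rejected because it ends in - and contains no special character.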

Sam*_*ell 5

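You can use semantic predicates in lexer rules to perform lookahead (or lookbehind) without consuming characters. For example, the following rule covers several of the operator rules.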

OPERATOR
  : ( [+*<>=~!@#%^&|`?]
    | '-' {_input.LA(1) != '-'}?
    | '/' {_input.LA(1) != '*'}?
    )+
  ;
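The following Python sketch shows roughly what the OPERATOR rule above accepts. It is a hand emulation of the rule, not ANTLR's actual lexer simulation. It assumes, as the rule itself does, that _input.LA(1) inside a predicate refers to the character immediately after the '-' or '/' that was just matched, with EOF treated as "no character". The helpers la1 and scan_operator are hypothetical names. Each alternative consumes exactly one character, and each predicate looks at only the next character. Under those conditions, a greedy left-to-right scan gives the same result as the lexer's longest match.

```python
# The [+*<>=~!@#%^&|`?] set from the OPERATOR rule ('-' and '/' handled separately).
OPERATOR_CHARS = set("+*<>=~!@#%^&|`?")


def la1(text, i):
    """Character after text[i], like _input.LA(1) once text[i] is matched ('' at EOF)."""
    return text[i + 1] if i + 1 < len(text) else ""


def scan_operator(text, pos=0):
    """Emulate: ( [set] | '-' {LA(1) != '-'}? | '/' {LA(1) != '*'}? )+"""
    i = pos
    while i < len(text):
        c = text[i]
        if c in OPERATOR_CHARS:
            ok = True
        elif c == "-":
            ok = la1(text, i) != "-"  # a '-' that starts "--" is rejected
        elif c == "/":
            ok = la1(text, i) != "*"  # a '/' that starts "/*" is rejected
        else:
            ok = False
        if not ok:
            break
        i += 1
    return text[pos:i]  # '' means OPERATOR does not match at pos


for sample in ["@---", "@-1", "<=/*c*/"]:
    print(sample, "->", repr(scan_operator(sample)))
```

Under this rule, any '-' that is directly followed by another '-' is never taken. For @--- the operator is therefore just @, and the remaining --- is left for the comment rule.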

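However, the rule above does not address the restriction on + or - at the end of an operator. To handle that in the simplest way, I would probably split the two cases into separate rules.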

// this rule does not allow + or - at the end of the token
OPERATOR
  : ( [*<>=~!@#%^&|`?]
    | ( '+'
      | '-' {_input.LA(1) != '-'}?
      )+
      [*<>=~!@#%^&|`?]
    | '/' {_input.LA(1) != '*'}?
    )+
  ;

// this rule allows + or - at the end of the token and sets the type to OPERATOR
// it requires a character from the special subset to appear
OPERATOR2
  : ( [*<>=+]
    | '-' {_input.LA(1) != '-'}?
    | '/' {_input.LA(1) != '*'}?
    )*
    [~!@#%^&|`?]
    OPERATOR?
    ( '+'
    | '-' {_input.LA(1) != '-'}?
    )+
    -> type(OPERATOR)
  ;
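When testing rules like OPERATOR and OPERATOR2, it can help to compare their output with a plain reference scanner. The Python sketch below is one hypothetical way to write such a scanner, and scan_pg_operator is an illustrative name. Instead of using lookahead, it matches operator characters greedily and then trims the result according to the documented rules. First it cuts the operator at the first "--" or "/*". Then, if no special character is present, it strips trailing + and - characters. This is a different technique from the predicate-based grammar, offered only as a cross-check.

```python
# Operator characters and the "special" subset, from the quoted documentation.
OP_CHARS = set("+-*/<>=~!@#%^&|`?")
SPECIAL = set("~!@#%^&|`?")


def scan_pg_operator(text, pos=0):
    """Hypothetical reference scanner: greedy match, then trim per the rules."""
    end = pos
    while end < len(text) and text[end] in OP_CHARS:
        end += 1
    op = text[pos:end]
    # 1. An operator stops where a comment would begin ("--" or "/*").
    starts = [i for i in (op.find("--"), op.find("/*")) if i != -1]
    if starts:
        op = op[:min(starts)]
    # 2. Without a SPECIAL character, a multi-character operator
    #    may not end in + or -, so drop those from the end.
    if not (set(op) & SPECIAL):
        while len(op) > 1 and op[-1] in "+-":
            op = op[:-1]
    return op  # '' means no operator starts at pos


for sample in ["@---", "*-1", "@-1", "<=/*x*/"]:
    print(sample, "->", repr(scan_pg_operator(sample)))
```

Whatever the scanner trims off, such as the - in *-1, is left for the next token. In a test suite, feeding the same inputs to the generated ANTLR lexer and to this function is a cheap way to spot disagreements.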