Antlr4 discards remaining tokens instead of bailing out

nyb*_*bon 4 error-handling antlr4

I'm using Antlr4, and this is a simplified grammar I wrote:

grammar BooleanExpression;

/*******************************
 *      Parser Rules
 *******************************/
booleanTerm
    : booleanLiteral (KW_OR booleanLiteral)+
    | booleanLiteral
    ;

id
    : IDENTIFIER
    ;

booleanLiteral
    : KW_TRUE
    | KW_FALSE
    ;

/*******************************
 *         Lexer Rules
 *******************************/
KW_TRUE
    : 'true'
    ;

KW_FALSE
    : 'false'
    ;

KW_OR
    : 'or'
    ;   

IDENTIFIER
    : (SIMPLE_LATIN)+
    ;

fragment 
SIMPLE_LATIN
    : 'A' .. 'Z'
    | 'a' .. 'z'
    ;

WHITESPACE
    : [ \t\n\r]+ -> skip
    ;

I use a BailErrorStrategy and a BailLexer, shown below:

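import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.InputMismatchException;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Token;

public class BailErrorStrategy extends DefaultErrorStrategy {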
    /**
     * Instead of recovering from exception e, rethrow it wrapped in a generic
     * IllegalArgumentException so it is not caught by the rule function catches.
     * Exception e is the "cause" of the IllegalArgumentException.
     */

    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        throw new IllegalArgumentException(e);
    }

    /**
     * Make sure we don't attempt to recover inline; if the parser successfully
     * recovers, it won't throw an exception.
     */
    @Override
    public Token recoverInline(Parser recognizer) throws RecognitionException {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }

    /** Make sure we don't attempt to recover from problems in subrules. */
    @Override
    public void sync(Parser recognizer) {
    }

    @Override
    protected Token getMissingSymbol(Parser recognizer) {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }
}



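import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.LexerNoViableAltException;
import org.antlr.v4.runtime.RecognitionException;

public class BailLexer extends BooleanExpressionLexer {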
    public BailLexer(CharStream input) {
        super(input);
        //removeErrorListeners();
        //addErrorListener(new ConsoleErrorListener());
    }

    @Override
    public void recover(LexerNoViableAltException e) {
        throw new IllegalArgumentException(e); // Bail out
    }

    @Override
    public void recover(RecognitionException re) {
        throw new IllegalArgumentException(re); // Bail out
    }
}

Everything works fine except for one case. I tried the following expression:

true OR false

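I expected this expression to be rejected with an IllegalArgumentException, because the 'or' token must be lower case, not upper case. It turns out that Antlr4 does not reject it. The expression is tokenized as "KW_TRUE IDENTIFIER KW_FALSE" (which is expected, since the upper-case 'OR' is treated as an IDENTIFIER), but the parser throws no error while processing this token stream. It parses the stream into a tree containing only "true" and discards the remaining "IDENTIFIER KW_FALSE" tokens. I tried different prediction modes, but they all behave the same way. I had no idea why it works like this, so I did some debugging, which eventually led me to this piece of code in Antlr: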

ATNConfigSet reach = computeReachSet(previous, t, false);

if ( reach==null ) {
    // if any configs in previous dipped into outer context, that
    // means that input up to t actually finished entry rule
    // at least for SLL decision. Full LL doesn't dip into outer
    // so don't need special case.
    // We will get an error no matter what so delay until after
    // decision; better error message. Also, no reachable target
    // ATN states in SLL implies LL will also get nowhere.
    // If conflict in states that dip out, choose min since we
    // will get error no matter what.
    int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);
    if ( alt!=ATN.INVALID_ALT_NUMBER ) {
        // return w/o altering DFA
        return alt;
    }
    throw noViableAlt(input, outerContext, previous, startIndex);
}  

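The line "int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);" returns the second alternative of booleanTerm (because "true" matches the second alternative, "booleanLiteral"). Since that value is not equal to ATN.INVALID_ALT_NUMBER, noViableAlt is not thrown immediately. According to the Java comment there, "We will get an error no matter what so delay until after decision", but in the end no error seems to be thrown at all.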

I really don't know how to make Antlr report an error in this case. Can someone shed some light on this? Any help is appreciated, thanks.

Sam*_*ell 5

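If your top-level rule does not end with an explicit EOF, ANTLR is not required to parse to the end of the input sequence. Instead of throwing an exception, it simply parses the valid portion of the sequence you gave it.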

The following start rule forces it to parse the entire input sequence as a single booleanTerm.

start : booleanTerm EOF;
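
To make the difference concrete, here is a minimal sketch of the same behaviour using a hand-written toy recognizer rather than ANTLR itself. The class `PrefixVersusEof` and its methods are hypothetical names invented for this illustration, and the tokenizer only approximates the lexer rules from the question. `booleanTerm` stops at the first token it cannot use, while `start` additionally demands EOF.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written toy recognizer for illustration only; this is not ANTLR-generated code. */
final class PrefixVersusEof {

    /** Classifies whitespace-separated words roughly as the lexer rules in the question would. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input splits into a single empty string
            }
            switch (word) {
                case "true":  tokens.add("KW_TRUE");  break;
                case "false": tokens.add("KW_FALSE"); break;
                case "or":    tokens.add("KW_OR");    break;
                default:      tokens.add("IDENTIFIER"); // assumption: letters only, e.g. "OR"
            }
        }
        tokens.add("EOF"); // sentinel, always the last token
        return tokens;
    }

    static boolean isLiteral(String token) {
        return token.equals("KW_TRUE") || token.equals("KW_FALSE");
    }

    /**
     * booleanTerm without EOF, written as booleanLiteral (KW_OR booleanLiteral)*,
     * which accepts the same input as the two alternatives in the question.
     * Returns the index of the first token it did NOT consume.
     */
    static int booleanTerm(List<String> tokens) {
        int pos = 0;
        if (!isLiteral(tokens.get(pos))) {
            throw new IllegalArgumentException("expected a boolean literal at token " + pos);
        }
        pos++;
        while (tokens.get(pos).equals("KW_OR")) {
            pos++;
            if (!isLiteral(tokens.get(pos))) {
                throw new IllegalArgumentException("expected a boolean literal at token " + pos);
            }
            pos++;
        }
        return pos; // anything after this index is silently left unread
    }

    /** start : booleanTerm EOF ; rejects input that has tokens left over. */
    static void start(List<String> tokens) {
        int pos = booleanTerm(tokens);
        if (!tokens.get(pos).equals("EOF")) {
            throw new IllegalArgumentException(
                "unexpected " + tokens.get(pos) + " at token " + pos);
        }
    }
}
```

For "true OR false", `booleanTerm` returns after the first literal and leaves IDENTIFIER and KW_FALSE unread, which mirrors what the question describes. `start` throws on the same input because the next token is not EOF.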

In addition, BailErrorStrategy is provided by the ANTLR 4 runtime, and it throws a ParseCancellationException, which is more informative than the exception shown in your example.

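  • @280Z28 Hey, I ran into the same problem. In my case, though, I sometimes need to parse a sub-rule (rather than the start rule) by feeding in only the content for that sub-rule. The parser discards the remaining tokens there as well. How can I solve this? Adding "EOF" to every sub-rule is not feasible. (2 upvotes)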
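
One possible direction for the sub-rule case, sketched here with a toy model rather than ANTLR: keep the rules unchanged and check once, in the caller, that nothing is left over after the rule returns. The names `FullMatch`, `parseFully` and `booleanLiteral` are hypothetical and exist only for this illustration. With the real runtime, the analogous check would presumably compare `parser.getCurrentToken().getType()` against `Token.EOF` after the rule method returns. That mapping is an assumption on my part and is not stated anywhere in this thread.

```java
import java.util.List;
import java.util.function.ToIntFunction;

/** Toy model: a "rule" takes the token list and returns the index of the first unconsumed token. */
final class FullMatch {

    /** Runs any rule, then rejects leftover input, so no rule needs its own EOF. */
    static void parseFully(ToIntFunction<List<String>> rule, List<String> tokens) {
        int next = rule.applyAsInt(tokens);
        if (next != tokens.size()) {
            throw new IllegalArgumentException(
                "unconsumed input starting at token " + next + ": " + tokens.get(next));
        }
    }

    /** Hypothetical sub-rule mirroring booleanLiteral : KW_TRUE | KW_FALSE ; */
    static int booleanLiteral(List<String> tokens) {
        boolean matches = !tokens.isEmpty()
            && (tokens.get(0).equals("KW_TRUE") || tokens.get(0).equals("KW_FALSE"));
        if (!matches) {
            throw new IllegalArgumentException("expected a boolean literal");
        }
        return 1; // consumed exactly one token
    }
}
```

Calling `FullMatch.parseFully(FullMatch::booleanLiteral, List.of("KW_TRUE"))` returns normally, while passing `List.of("KW_TRUE", "IDENTIFIER", "KW_FALSE")` throws, because two tokens remain after the sub-rule has matched.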