Octal digit in ANSI C grammar (lex)

TYF*_*YFA 4 c regex grammar lex

I was looking at the ANSI C grammar (lex).

This is the rule for octal constants:

0{D}+{IS}?      { count(); return(CONSTANT); }

My question is why do they accept something like 0898?

That's not a valid octal constant, since 8 and 9 are not octal digits.

I thought they would have taken that into account, but they wrote the rule this way anyway.

Could you explain why that is? Thank you.

n. *_* m. 5

You want reasonable, user-friendly error messages.

If your lexer accepts 0999, you can detect an illegal octal digit and output a reasonable message:

 int x = 0999;
          ^
 error: illegal octal digit, go back to school

If it doesn't, the lexer will split this into two separate tokens, 0 and 999, and pass them to the parser. The resulting error messages could be quite confusing.

 int x = 0999;
          ^
 error: expected ‘,’ or ‘;’ before numeric constant

The invalid program is rejected either way, as it should be. However, the ostensibly incorrect lex grammar does a better job with error reporting.

This demonstrates that practical grammars built for tools such as lex or yacc do not have to correspond exactly to ideal grammars found in language definitions.
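
To make the first scenario concrete, here is a minimal sketch in C of the kind of check that could run after the permissive rule has matched. The helper name `first_bad_octal_digit` is hypothetical and not part of the grammar in the question. It assumes it receives the matched lexeme (`yytext` in a lex action) and that the lexeme starts with 0 followed by decimal digits and an optional integer suffix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original grammar).
 * Scans the leading run of decimal digits in a lexeme matched by
 * 0{D}+{IS}? and returns the index of the first digit that is not a
 * valid octal digit (8 or 9), or -1 if all digits are valid.
 * Scanning stops at the first non-digit, e.g. a suffix such as 'L'. */
static int first_bad_octal_digit(const char *lexeme)
{
    assert(lexeme != NULL);
    for (int i = 0; lexeme[i] >= '0' && lexeme[i] <= '9'; i++) {
        if (lexeme[i] > '7')
            return i;   /* position of the offending digit */
    }
    return -1;          /* every digit is in the range 0-7 */
}
```

A lex action could call this on `yytext` and, when the result is not -1, report an illegal octal digit at that offset before returning CONSTANT. For the lexeme 0898 from the question, the function returns 1, the index of the first 8.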