Parsing a log file with ANTLR

RC.*_*RC. 4 antlr

I'm just starting out with ANTLR and am trying to parse some patterns from a log file.

For example, the log file:

7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected = ["red","yellow"]){}

7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected = ["Rocket"]){}

7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected = ["blue","yellow"]){}

7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected = ["Speech"]){}

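Now I need to parse this file and find only 'Evaluation.Input.Function1' with its values 'red' and 'yellow', and 'Evaluation.Output.Function2' with its value 'Rocket', ignoring everything else; the same goes for the other two input and output functions (3 and 4) below. There are many such input and output functions, and I need to find these input/output function sets. Here is the grammar I tried, which doesn't work properly. Any help would be appreciated. This is my first attempt at writing a grammar and using ANTLR, and it is becoming quite daunting.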

grammar test;

tag : inputtag+ outputtag+ ;
//Input tag consists of at least one input function with one or more values
inputtag:  INPUTFUNCTIONS INPUTVALUES+;

//Output tag consists of at least one output function with one or more output values
outputtag : OUTPUTFUNCTIONS OUTPUTVALUES+;

INPUTFUNCTIONS 
 : INFUNCTION1 | INFUNCTION2;

OUTPUTFUNCTIONS
 :OUTFUNCTION1 | OUTFUNCTION2;

// Possible input functions in the log file
fragment INFUNCTION1
 :'Evaluation.Input.Function1';

fragment INFUNCTION2
 :'Evaluation.Input.Function3';

//Possible values in the input functions
INPUTVALUES
 : 'red' | 'yellow' | 'blue';

// Possible output functions in the log file 
fragment OUTFUNCTION1
 :'Evaluation.Output.Function2';

fragment OUTFUNCTION2
 :'Evaluation.Output.Function4';

//Possible output values in the output functions
fragment OUTPUTVALUES
 : 'Rocket' | 'Speech';

Bar*_*ers 7

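If you're only interested in parts of the file you're parsing, you don't need a parser, nor a grammar for the entire file format. A lexer grammar with ANTLR's options{filter=true;} is enough. That way, you only get the tokens you defined in your grammar, and the rest of the file is ignored. (The filter option is an ANTLR 3 feature; ANTLR 4 no longer supports it.)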
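To make the filter=true behaviour more concrete, here is a rough plain-Java emulation of the idea. It is not ANTLR code and needs no ANTLR runtime; the class name FilterLexerSketch and the regular expressions standing in for the Input, Output, Params and String rules are assumptions made for illustration. At each position it tries the rules in order, emits a token for the first one that matches, and otherwise skips a single character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough emulation of ANTLR 3's filter=true idea in plain Java (NOT ANTLR code):
// at each position, try the token rules in order; the first rule that matches
// produces a token and the scan jumps past it; if no rule matches, exactly one
// character is skipped and the scan tries again.
class FilterLexerSketch {
    // Assumed regex stand-ins for the String and Params fragments.
    private static final String STRING = "\"[^\"]*\"";
    private static final String PARAMS =
            "\\(selected=\\[" + STRING + "(?:," + STRING + ")*\\]\\)";

    // Assumed stand-ins for the Input and Output rules, tried in this order.
    private static final Pattern[] RULES = {
        Pattern.compile("Evaluation\\.Input\\.Function[0-9]+" + PARAMS),
        Pattern.compile("Evaluation\\.Output\\.Function[0-9]+" + PARAMS),
    };

    static final String SAMPLE =
        "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : "
            + "uk.project.Evaluation.Input.Function1(selected=[\"red\",\"yellow\"]){}\n"
            + "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : "
            + "uk.org.project.Evaluation.Output.Function2(selected=[\"Rocket\"]){}\n"
            + "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : "
            + "uk.project.Evaluation.Input.Function3(selected=[\"blue\",\"yellow\"]){}\n"
            + "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : "
            + "uk.org.project.Evaluation.Output.Function4(selected=[\"Speech\"]){}";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            boolean matched = false;
            for (Pattern rule : RULES) {
                // region(...) + lookingAt() anchors the match at pos, like a lexer rule
                Matcher m = rule.matcher(text).region(pos, text.length());
                if (m.lookingAt()) {
                    tokens.add(m.group());
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                pos++; // filter mode: discard one character and retry
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize(SAMPLE)) {
            System.out.println("> " + token);
        }
    }
}
```

Run with java FilterLexerSketch.java (single-file launch, Java 11 or later); it prints only the four Evaluation.* call texts, while the timestamps, thread names and log levels are silently skipped.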

Here's a quick demo:

lexer grammar TestLexer;

options{filter=true;}

@lexer::members {
  public static void main(String[] args) throws Exception {
    String text = 
        "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=[\"red\",\"yellow\"]){}\n"+
        "\n"+
        "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected=[\"Rocket\"]){}\n"+
        "\n"+
        "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected=[\"blue\",\"yellow\"]){}\n"+
        "\n"+
        "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected=[\"Speech\"]){}";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    for(Object obj : tokens.getTokens()) {
        Token token = (Token)obj;
        System.out.println("> token.getText() = "+token.getText());
    }
  }
}

Input
  :  'Evaluation.Input.Function' '0'..'9'+ Params   
  ;

Output
  :  'Evaluation.Output.Function' '0'..'9'+ Params
  ;

fragment
Params
  :  '(selected=[' String ( ',' String )* '])'
  ;

fragment
String
  :  '"' ( ~'"' )* '"'
  ;
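Each Input or Output token carries the whole call as its text, e.g. Evaluation.Input.Function1(selected=["red","yellow"]). If you need the function name and the selected values separately, one option is to post-process token.getText() in plain Java. The helper below is a hypothetical sketch (TokenTextParser is not part of ANTLR) that assumes the text always has the shape Name(selected=["v1","v2",...]).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not ANTLR API): splits the text of an Input/Output token
// into the function name and the quoted values inside selected=[...].
class TokenTextParser {
    // Matches one double-quoted value and captures the text between the quotes.
    private static final Pattern VALUE = Pattern.compile("\"([^\"]*)\"");

    // Everything before the first '(' is the function name.
    static String functionName(String tokenText) {
        int paren = tokenText.indexOf('(');
        return paren < 0 ? tokenText : tokenText.substring(0, paren);
    }

    // Collects every quoted value, in order of appearance.
    static List<String> values(String tokenText) {
        List<String> result = new ArrayList<>();
        Matcher m = VALUE.matcher(tokenText);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Evaluation.Input.Function1(selected=[\"red\",\"yellow\"])";
        System.out.println(functionName(text) + " -> " + values(text));
    }
}
```

Inside the demo's token loop you would call functionName(token.getText()) and values(token.getText()); keeping this outside the grammar leaves the lexer rules simple.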

Save the grammar as TestLexer.g, then run:

java -cp antlr-3.2.jar org.antlr.Tool TestLexer.g
javac -cp antlr-3.2.jar TestLexer.java
java -cp .:antlr-3.2.jar TestLexer // or on Windows: java -cp .;antlr-3.2.jar TestLexer

and you will see the following printed to the console:

> token.getText() = Evaluation.Input.Function1(selected=["red","yellow"])
> token.getText() = Evaluation.Output.Function2(selected=["Rocket"])
> token.getText() = Evaluation.Input.Function3(selected=["blue","yellow"])
> token.getText() = Evaluation.Output.Function4(selected=["Speech"])
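The question also asks for sets of input/output functions. Assuming each input line is followed by the output line that belongs to it, as in the sample log, a minimal sketch is to walk the token texts in order and pair each Evaluation.Input token with the next Evaluation.Output token. InputOutputPairs is a hypothetical name; with the real lexer you could distinguish the two kinds by the token types ANTLR generates for Input and Output instead of by text prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing step (assumption: an output always follows its input):
// pair each Evaluation.Input.* token text with the next Evaluation.Output.* token text.
class InputOutputPairs {
    static List<String[]> pair(List<String> tokenTexts) {
        List<String[]> pairs = new ArrayList<>();
        String pendingInput = null;
        for (String text : tokenTexts) {
            if (text.startsWith("Evaluation.Input.")) {
                pendingInput = text; // remember the most recent unmatched input
            } else if (text.startsWith("Evaluation.Output.") && pendingInput != null) {
                pairs.add(new String[] {pendingInput, text});
                pendingInput = null; // each input is paired at most once
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "Evaluation.Input.Function1(selected=[\"red\",\"yellow\"])",
            "Evaluation.Output.Function2(selected=[\"Rocket\"])",
            "Evaluation.Input.Function3(selected=[\"blue\",\"yellow\"])",
            "Evaluation.Output.Function4(selected=[\"Speech\"])");
        for (String[] p : pair(tokens)) {
            System.out.println(p[0] + "  =>  " + p[1]);
        }
    }
}
```

If an input is never followed by an output, the next input simply replaces it and the orphan is dropped; if your logs interleave tasks, you would need to key the pairing on something else, such as the task number.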

  • @arkilus, if "it" didn't work, it wouldn't be a very good example, I'd assume... It *does* work. The fact that it doesn't work for you is a different story :) (2 upvotes)