Unable to write a grammar in perl6 for parsing lines with special characters

Rav*_*ell 6 regex grammar parsing perl6 raku

I have the following code: https://gist.github.com/ravbell/d94b37f1a346a1f73b5a827d9eaf7c92

use v6;
#use Grammar::Tracer;


grammar invoice {

    token ws { \h*};
    token super-word {\S+};
    token super-phrase { <super-word> [\h  <super-word>]*}
    token line {^^ \h* [ <super-word> \h+]* <super-word>* \n};

    token invoice-prelude-start {^^'Invoice Summary'\n}
    token invoice-prelude-end {<line> <?before 'Start Invoice Details'\n>};

    rule invoice-prelude {
        <invoice-prelude-start>
        <line>*?
        <invoice-prelude-end>
        <line>
    }
}

multi sub MAIN(){ 

    my $t = q :to/EOQ/; 
    Invoice Summary
    asd fasdf
    asdfasdf
    asd 123-fasdf $1234.00
    qwe {rq} [we-r_q] we
    Start Invoice Details 
    EOQ


    say $t;
    say invoice.parse($t,:rule<invoice-prelude>);
}

multi sub MAIN('test'){
    use Test;
    ok invoice.parse('Invoice Summary' ~ "\n", rule => <invoice-prelude-start>);

    ok invoice.parse('asdfa {sf} asd-[fasdf] #werwerw'~"\n", rule => <line>);
    ok invoice.parse('asdfawerwerw'~"\n", rule => <line>);

    ok invoice.subparse('fasdff;kjaf asdf asderwret'~"\n"~'Start Invoice Details'~"\n",rule => <invoice-prelude-end>);
    ok invoice.parse('fasdff;kjaf asdf asderwret'~"\n"~'Start Invoice Details'~"\n",rule => <invoice-prelude-end>);
    done-testing;
}

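I have been unable to figure out why parsing with the rule <invoice-prelude> fails and returns Nil. Note that even .subparse fails.

The tests for the individual tokens all pass, as you can see by running MAIN with the 'test' argument (except, of course, the .parse on <invoice-prelude-end>, which fails because .parse has to consume the whole string and the token stops before the lookahead text).

What should be modified in the rule <invoice-prelude> so that the whole string $t in MAIN() is parsed correctly?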

Håk*_*and 8

Note that there is a hidden space at the end of the last line of the $t string:

my $t = q :to/EOQ/; 
    Invoice Summary
    asd fasdf
    asdfasdf
    asd 123-fasdf $1234.00
    qwe {rq} [we-r_q] we
    Start Invoice Details␣   <-- Space at the end of the line
    EOQ

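This makes the <invoice-prelude-end> token fail, because it contains the lookahead <?before 'Start Invoice Details'\n>. The explicit \n at the end of the lookahead requires the newline to follow the text immediately, so it leaves no room for a space at the end of the line. As a result, the <invoice-prelude> rule cannot match either.

A quick fix is to remove the space at the end of the line Start Invoice Details.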
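
The effect can be reproduced outside the grammar. The following is only a minimal sketch: the variable names are made up for illustration, and the tolerant variant with \h* is an assumption of mine rather than part of the answer above.

```raku
# The lookahead as written in the question: the literal text must be
# followed immediately by a newline.
my $strict   = rx/ <?before 'Start Invoice Details' \n> /;

# Hypothetical tolerant variant (not from the answer): allow optional
# horizontal whitespace before the newline.
my $tolerant = rx/ <?before 'Start Invoice Details' \h* \n> /;

my $clean    = "Start Invoice Details\n";    # no trailing space
my $trailing = "Start Invoice Details \n";   # trailing space, as in $t

say so $clean    ~~ $strict;     # True
say so $trailing ~~ $strict;     # False
say so $trailing ~~ $tolerant;   # True
```

If the input cannot be cleaned up, a pattern along the lines of the tolerant variant may be preferable to editing the data.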


wam*_*mba 5

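First, without backtracking, the frugal quantifier *? probably matches the empty string every time. You can use regex instead of rule.

Second, there is a space at the end of the line that starts with Start Invoice Details.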

rule invoice-prelude-end {<line> <?before 'Start Invoice Details' \n>};

regex invoice-prelude {
    <invoice-prelude-start>
    <line>*?
    <invoice-prelude-end>
    <line>
}
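
The reason the rule version of invoice-prelude-end tolerates the trailing space is that, in a rule, whitespace after an atom becomes an implicit <.ws> call, and this grammar defines ws as \h*. A small sketch, assuming a throwaway grammar named Demo with hypothetical rule names strict and loose:

```raku
grammar Demo {
    token ws     { \h* }                         # same ws as in the question
    token strict { 'Start Invoice Details' \n }  # whitespace in the pattern is ignored
    rule  loose  { 'Start Invoice Details' \n }  # whitespace after an atom calls <.ws>
}

my $input = "Start Invoice Details \n";          # note the trailing space

say so Demo.parse($input, :rule<strict>);   # False
say so Demo.parse($input, :rule<loose>);    # True
```

Note that this relies on the custom ws token. With the default ws, which also matches newlines, the implicit <.ws> would consume the newline and the following \n would have nothing left to match.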

If you want to avoid backtracking, you can use a negative lookahead.

token invoice-prelude-end { <line> };

rule invoice-prelude {
    <invoice-prelude-start>
    [<line> <!before 'Start Invoice Details' \n>]*
    <invoice-prelude-end>
    <line>
}

Here is the whole example, with a few changes for inspiration:

use v6;
#use Grammar::Tracer;


grammar invoice {
    token ws { <!ww>\h* }
    token super-word { \S+ }
    token line { <super-word>* % <.ws> }

    token invoice-prelude-start   { 'Invoice Summary' }
    rule  invoice-prelude-midline { <line> <!before \n <invoice-details-start> \n> }
    token invoice-prelude-end     { <line> }
    token invoice-details-start   { 'Start Invoice Details' }

    rule invoice-prelude {
        <invoice-prelude-start> \n
        <invoice-prelude-midline> * %% \n
        <invoice-prelude-end> \n
        <invoice-details-start> \n
    }
}

multi sub MAIN(){

    my $t = q :to/EOQ/;
    Invoice Summary
    asd fasdf
    asdfasdf
    asd 123-fasdf $1234.00
    qwe {rq} [we-r_q] we
    Start Invoice Details 
    EOQ


    say $t;
    say invoice.parse($t,:rule<invoice-prelude>);
}