Rav*_*ell 6 regex grammar parsing perl6 raku
I have the following code: https://gist.github.com/ravbell/d94b37f1a346a1f73b5a827d9eaf7c92
use v6;
#use Grammar::Tracer;
grammar invoice {
    token ws { \h*};
    token super-word {\S+};
    token super-phrase { <super-word> [\h <super-word>]*}
    token line {^^ \h* [ <super-word> \h+]* <super-word>* \n};
    token invoice-prelude-start {^^'Invoice Summary'\n}
    token invoice-prelude-end {<line> <?before 'Start Invoice Details'\n>};
    rule invoice-prelude {
        <invoice-prelude-start>
        <line>*?
        <invoice-prelude-end>
        <line>
    }
}
multi sub MAIN(){
    my $t = q :to/EOQ/;
    Invoice Summary
    asd fasdf
    asdfasdf
    asd 123-fasdf $1234.00
    qwe {rq} [we-r_q] we
    Start Invoice Details
    EOQ
    say $t;
    say invoice.parse($t,:rule<invoice-prelude>);
}
multi sub MAIN('test'){
    use Test;
    ok invoice.parse('Invoice Summary' ~ "\n", rule => <invoice-prelude-start>);
    ok invoice.parse('asdfa {sf} asd-[fasdf] #werwerw'~"\n", rule => <line>);
    ok invoice.parse('asdfawerwerw'~"\n", rule => <line>);
    ok invoice.subparse('fasdff;kjaf asdf asderwret'~"\n"~'Start Invoice Details'~"\n",rule => <invoice-prelude-end>);
    ok invoice.parse('fasdff;kjaf asdf asderwret'~"\n"~'Start Invoice Details'~"\n",rule => <invoice-prelude-end>);
    done-testing;
}
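I have not been able to figure out why parsing with rule <invoice-prelude> fails and returns Nil. Note that even .subparse fails.
The tests for the individual tokens all pass, as you can see by running MAIN with the 'test' argument (except, of course, the .parse on <invoice-prelude-end>, which fails because that token does not consume the complete string).
What should I change in rule <invoice-prelude> so that the whole string $t in MAIN() is parsed correctly?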
Note that there is a hidden space at the end of the last line of the $t string:
my $t = q :to/EOQ/;
Invoice Summary
asd fasdf
asdfasdf
asd 123-fasdf $1234.00
qwe {rq} [we-r_q] we
Start Invoice Details   <-- space at the end of this line
EOQ
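This makes the <invoice-prelude-end> token fail, because it contains the lookahead <?before 'Start Invoice Details'\n>. That lookahead does not allow for a possible space at the end of the line, because it requires the explicit newline \n immediately after the text. Consequently, the <invoice-prelude> rule does not match either.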
A quick fix is to remove the trailing space from the Start Invoice Details line.
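A minimal sketch of the failure and of two ways around it. It assumes simplified hypothetical grammars (Strict, Tolerant, TOP and body are illustrative names, not from the question): the strict lookahead rejects the trailing space, a lookahead that also allows \h* before the newline accepts it, and stripping trailing whitespace from every line (the quick fix above) makes the strict version pass as well.

```raku
use v6;

# Hypothetical grammars for illustration only.
# Strict mirrors the question: the lookahead demands \n right after the text.
grammar Strict {
    token TOP  { <body> <?before 'Start Invoice Details' \n> }
    token body { \N* \n }
}

# Tolerant also allows trailing horizontal whitespace (\h*) before the newline.
grammar Tolerant {
    token TOP  { <body> <?before 'Start Invoice Details' \h* \n> }
    token body { \N* \n }
}

my $text = "some line\n" ~ "Start Invoice Details \n";   # note the trailing space

say so Strict.subparse($text);     # False: the space sits between the text and \n
say so Tolerant.subparse($text);   # True

# The quick fix: strip trailing whitespace from every line before parsing.
my $clean = $text.lines.map(*.trim-trailing).join("\n") ~ "\n";
say so Strict.subparse($clean);    # True
```

Tolerating \h* in the lookahead keeps the grammar robust against invisible trailing blanks instead of relying on the input being clean.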
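First, without backtracking, the frugal quantifier *? may end up matching the empty string every time. A rule implies :ratchet and does not backtrack, so you can use regex, which does backtrack, instead of rule.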
Second, the line that begins with Start Invoice Details has a space at the end.
rule invoice-prelude-end {<line> <?before 'Start Invoice Details' \n>};
regex invoice-prelude {
    <invoice-prelude-start>
    <line>*?
    <invoice-prelude-end>
    <line>
}
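Why the blank inside that rule matters: in a rule, whitespace in the pattern is significant and is compiled as a call to <.ws>, and since the question's grammar overrides ws as \h*, that call can absorb a trailing space before \n. A token ignores the same blank. A minimal sketch, assuming a hypothetical grammar named SigSpace (the answer's rule relies on the same mechanism inside its <?before ...> lookahead):

```raku
use v6;

# Hypothetical grammar for illustration only.
grammar SigSpace {
    # Same override as in the question: ws only eats horizontal whitespace.
    token ws { \h* }

    # In a rule, the blank between the literal and \n becomes <.ws>.
    rule  marker-rule  { 'Start Invoice Details' \n }

    # In a token, whitespace in the pattern is ignored, so \n must follow immediately.
    token marker-token { 'Start Invoice Details' \n }
}

my $line = "Start Invoice Details \n";   # trailing space before the newline

say so SigSpace.parse($line, :rule<marker-rule>);    # True
say so SigSpace.parse($line, :rule<marker-token>);   # False
```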
If you want to avoid backtracking, you can use a negative lookahead.
token invoice-prelude-end { <line> };
rule invoice-prelude {
    <invoice-prelude-start>
    [<line> <!before 'Start Invoice Details' \n>]*
    <invoice-prelude-end>
    <line>
}
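A minimal sketch of the same negative-lookahead idea on a hypothetical grammar named UntilMarker (all rule names are illustrative). One difference from the answer, chosen here for brevity: the check is placed before each line rather than after it, so a line is consumed only if it is not the marker line. Because the loop simply stops at the marker, no backtracking is needed and token, which implies :ratchet, is enough.

```raku
use v6;

# Hypothetical grammar for illustration only.
grammar UntilMarker {
    token TOP           { <prelude-start> <body-line>* <marker> }
    token prelude-start { 'Invoice Summary' \n }
    # Negative lookahead: refuse to consume the marker line as an ordinary line.
    token body-line     { <!before <marker>> \N* \n }
    # \h* tolerates trailing spaces after the marker text.
    token marker        { 'Start Invoice Details' \h* \n }
}

my $text = "Invoice Summary\n"
         ~ "asd fasdf\n"
         ~ "asd 123-fasdf \$1234.00\n"
         ~ "Start Invoice Details \n";

my $m = UntilMarker.parse($text);
say so $m;                 # True
say $m<body-line>.elems;   # 2
```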
A complete example, with some changes, for inspiration:
use v6;
#use Grammar::Tracer;
grammar invoice {
    token ws { <!ww>\h* }
    token super-word { \S+ }
    token line { <super-word>* % <.ws> }
    token invoice-prelude-start { 'Invoice Summary' }
    rule invoice-prelude-midline { <line> <!before \n <invoice-details-start> \n> }
    token invoice-prelude-end { <line> }
    token invoice-details-start { 'Start Invoice Details' }
    rule invoice-prelude {
        <invoice-prelude-start> \n
        <invoice-prelude-midline> * %% \n
        <invoice-prelude-end> \n
        <invoice-details-start> \n
    }
}
multi sub MAIN(){
    my $t = q :to/EOQ/;
    Invoice Summary
    asd fasdf
    asdfasdf
    asd 123-fasdf $1234.00
    qwe {rq} [we-r_q] we
    Start Invoice Details
    EOQ
    say $t;
    say invoice.parse($t,:rule<invoice-prelude>);
}
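The complete example relies on Raku's separator modifiers: <super-word>* % <.ws> matches words with a separator only between them, while <invoice-prelude-midline> * %% \n also allows a separator after the last item. A minimal sketch of the difference, assuming hypothetical grammars named Between and Trailing:

```raku
use v6;

# Hypothetical grammars for illustration only.
grammar Between {
    # % : separators only between items
    token TOP   { <field>+ % ',' }
    token field { \w+ }
}

grammar Trailing {
    # %% : like %, but a trailing separator is also allowed
    token TOP   { <field>+ %% ',' }
    token field { \w+ }
}

say so Between.parse('a,b,c');     # True
say so Between.parse('a,b,c,');    # False: the trailing comma is left unconsumed
say so Trailing.parse('a,b,c,');   # True
```

With %%, the final \n after the last middle line is consumed by the separator itself, which is why the example can list each line followed by its newline.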