Parsing a possibly nested braced item using a grammar

Håk*_*and 4 grammar perl6 raku

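I have started writing a BibTeX parser. The first thing I want to do is parse a braced item. A braced item could be, for example, an author field or a title. There may be nested braces within the field. The following code does not handle nested braces: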

use v6;

my $str = q:to/END/;
  author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.}, 
  END

$str .= chomp;

grammar ExtractBraced {
    rule TOP {
        'author=' <braced-item> .*
    }
    rule braced-item      { '{' <-[}]>* '}' }
}

ExtractBraced.parse( $str ).say;

Output:

｢author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},｣
 braced-item => ｢{Belayneh, M. and Geiger, S. and Matth{\"{a}｣

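Now, to make the parser accept nested braces, I would like to keep a counter of the number of opening braces parsed so far and decrement it whenever a closing brace is encountered. When the counter reaches zero, we assume that the complete item has been parsed.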
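Outside of a grammar, the counter idea can be sketched as a plain scanning loop. This is only an illustration: `extract-braced` is a hypothetical helper name, and the sketch assumes that braces are never escaped inside the field.

```raku
# Hypothetical helper: return the balanced braced item that starts at
# position $start, or Nil if there is no '{' there or the braces never balance.
sub extract-braced(Str $text, Int $start = 0) {
    return Nil unless $text.substr($start, 1) eq '{';
    my $depth = 0;
    for $start ..^ $text.chars -> $i {
        given $text.substr($i, 1) {
            when '{' { $depth++ }   # one more open brace
            when '}' { $depth-- }   # one open brace closed
        }
        # The counter is back at zero: the item is complete.
        return $text.substr($start, $i - $start + 1) if $depth == 0;
    }
    return Nil;                     # ran out of input before balancing
}

my $field = 'author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},';
say extract-braced($field, $field.index('{'));
```

For the sample field this should return the whole value from the first `{` up to its matching `}`, with the inner `{\"{a}}` included.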

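To follow this idea, I tried to split up the braced-item regex so that a grammar action runs on each character. (The action method for the braced-item-char regex below would then handle the brace counter.)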

grammar ExtractBraced {
    rule TOP {
        'author=' <braced-item> .*
    }
    rule braced-item      { '{' <braced-item-char>* '}' }
    rule braced-item-char { <-[}]> }
}

However, now the parse suddenly fails. It is probably a silly mistake, but I cannot see why it should fail now.

Bra*_*ert 6

Without knowing what you want the resulting data to look like, I would change it to something like this:

my $str = ｢author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},｣;

grammar ExtractBraced {
    token TOP {
        'author='
        $<author> = <.braced-item>
        .*
    }
    token braced-item {
       '{' ~ '}'

           [
           || <- [{}] >+
           || <.before '{'> <.braced-item>
           ]*
    }
}

ExtractBraced.parse( $str ).say;

Output:

｢author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},｣
 author => ｢{Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.}｣
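The `'{' ~ '}'` construct is, as far as I understand it, another way of writing "opening delimiter, content, closing delimiter": the atom after `~` is the closer, and the atom after that is what goes in between. A minimal sketch comparing the two spellings (the regex names `plain` and `tilde` are made up for this illustration):

```raku
# Two spellings of "a brace pair with no braces inside".
my regex plain { '{' <-[{}]>* '}' }   # closer written after the content
my regex tilde { '{' ~ '}' <-[{}]>* } # closer written right after the opener

# Both forms should report a match on the same balanced input.
say ?('{abc}' ~~ /^ <plain> $/);
say ?('{abc}' ~~ /^ <tilde> $/);
```

The `~` form keeps the two delimiters next to each other, which makes the recursive grammar easier to read.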

If you want a bit more structure, it might look more like this:

my $str = ｢author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},｣;

grammar ExtractBraced {
    token TOP {
        'author='
        $<author> = <.braced-item>
        .*
    }
    token braced-part {
        || <- [{}] >+
        || <.before '{'> <braced-item>
    }
    token braced-item {
        '{' ~ '}'
            <braced-part>*
    }
}

class Print {
    method TOP ($/){
        make $<author>.made
    }
    method braced-part ($/){
        make $<braced-item>.?made // ~$/
    }
    method braced-item ($/){
        make [~] @<braced-part>».made
    }
}


my $r = ExtractBraced.parse( $str, :actions(Print) );
say $r;
put();
say $r.made;

Output:

｢author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},｣
 author => ｢{Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.}｣
  braced-part => ｢Belayneh, M. and Geiger, S. and Matth｣
  braced-part => ｢{\"{a}}｣
   braced-item => ｢{\"{a}}｣
    braced-part => ｢\"｣
    braced-part => ｢{a}｣
     braced-item => ｢{a}｣
      braced-part => ｢a｣
  braced-part => ｢i, S.K.｣

Belayneh, M. and Geiger, S. and Matth\"ai, S.K.

Note that the + on <-[{}]>+ is an optimization, and so is <before '{'>; both can be omitted and it will still work.
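As for why the second grammar in the question fails: my understanding is that `rule` turns on `:sigspace`, so the whitespace after `<-[}]>` in `braced-item-char` becomes an implicit `<.ws>` call. The default `ws` is `<!ww> \s*`, which refuses to match between two word characters, so the rule already fails after the `B` of `Belayneh`. A minimal sketch of that diagnosis, using a renamed copy of the question's grammar (`ExtractBracedChars` is a name made up here) with `braced-item-char` declared as a `token`:

```raku
grammar ExtractBracedChars {
    rule TOP {
        'author=' <braced-item> .*
    }
    rule  braced-item      { '{' <braced-item-char>* '}' }
    # `token` instead of `rule`: no implicit <.ws> after each character.
    token braced-item-char { <-[}]> }
}

my $str = 'author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},';
my $m = ExtractBracedChars.parse($str);
say $m<braced-item>.Str;
```

With that change the parse should succeed again, but it still stops at the first closing brace, just like the original single-regex version, so the recursive `braced-item` above is still needed for nesting.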