尽管令牌不完整,但提升精神意味着成功解析

Cra*_*ght 4 c++ boost boost-spirit boost-spirit-lex

我有一个非常简单的路径构造,我试图用boost spirit.lex解析.

我们有以下语法:

token := [a-z]+
path := (token : path) | (token)
Run Code Online (Sandbox Code Playgroud)

所以我们这里只讨论冒号分隔的小写ASCII字符串.

我有三个例子"xyz","abc:xyz","abc:xyz:".

前两个应被视为有效.第三个,有一个尾随结肠,不应被视为有效.不幸的是,解析器我已经认识到这三个都是有效的.语法不应该允许空令牌,但显然精神就是这样做的.为了让第三个被拒绝,我错过了什么?

此外,如果您阅读下面的代码,则在注释中还有另一个版本的解析器要求所有路径以分号结尾.当我激活那些线时,我可以得到适当的行为(即拒绝"abc:xyz:;"),但这不是我想要的.

有人有主意吗?

谢谢.

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>

#include <iostream>
#include <string>

using namespace boost::spirit;
using boost::phoenix::val;

template<typename Lexer>
struct PathTokens : boost::spirit::lex::lexer<Lexer>
{
      PathTokens()
      {
         identifier = "[a-z]+";
         separator = ":";

         this->self.add
            (identifier)
            (separator)
            (';')
            ;
      }
      boost::spirit::lex::token_def<std::string> identifier, separator;
};


template <typename Iterator>
struct PathGrammar 
   : boost::spirit::qi::grammar<Iterator> 
{
      template <typename TokenDef>
      PathGrammar(TokenDef const& tok)
         : PathGrammar::base_type(path)
      {
         using boost::spirit::_val;
         path
            = 
            (token >> tok.separator >> path)[std::cerr << _1 << "\n"]
            |
            //(token >> ';')[std::cerr << _1 << "\n"]
            (token)[std::cerr << _1 << "\n"]
             ; 

          token 
             = (tok.identifier) [_val=_1]
          ;

      }
      boost::spirit::qi::rule<Iterator> path;
      boost::spirit::qi::rule<Iterator, std::string()> token;
};


int main()
{
   typedef std::string::iterator BaseIteratorType;
   typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::string> > TokenType;
   typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
   typedef PathTokens<LexerType>::iterator_type TokensIterator;
   typedef std::vector<std::string> Tests;

   Tests paths;
   paths.push_back("abc");
   paths.push_back("abc:xyz");
   paths.push_back("abc:xyz:");
   /*
     paths.clear();
     paths.push_back("abc;");
     paths.push_back("abc:xyz;");
     paths.push_back("abc:xyz:;");
   */
   for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
   {
      std::string str = *iter;
      std::cerr << "*****" << str << "*****\n";

      PathTokens<LexerType> tokens;
      PathGrammar<TokensIterator> grammar(tokens);

      BaseIteratorType first = str.begin();
      BaseIteratorType last = str.end();

      bool r = boost::spirit::lex::tokenize_and_parse(first, last, tokens, grammar);

      std::cerr << r << " " << (first==last) << "\n";
   }
}
Run Code Online (Sandbox Code Playgroud)

seh*_*ehe 5

除了llonesmiz已经说过的内容之外,这里有一个技巧qi::eoi,我有时使用它:

path = (
           (token >> tok.separator >> path) [std::cerr << _1 << "\n"]
         | token                           [std::cerr << _1 << "\n"]
    ) >> eoi;
Run Code Online (Sandbox Code Playgroud)

这使得语法在成功匹配结束时需要 eoi(输入结束).这导致了期望的结果:

http://liveworkspace.org/code/23a7adb11889bbb2825097d7c553f71d

*****abc*****
abc
1 1
*****abc:xyz*****
xyz
abc
1 1
*****abc:xyz:*****
xyz
abc
0 1
Run Code Online (Sandbox Code Playgroud)