Using Boost.Spirit.Lex with stream iterators

Ale*_* B. 5 c++ boost boost-spirit c++11

I want to use Boost.Spirit.Lex to lex a binary file; to that end I wrote the following program (this is an extract):

#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/support_multi_pass.hpp>
#include <boost/bind.hpp>
#include <boost/ref.hpp>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

namespace spirit = boost::spirit;
namespace lex = spirit::lex;

#define X 1
#define Y 2
#define Z 3

template<typename L>
class word_count_tokens : public lex::lexer<L>
{
    public:
        word_count_tokens () {
            this->self.add
                ("[^ \t\n]+", X)
                ("\n", Y)
                (".", Z);
        }
};

class counter
{
    public:
        typedef bool result_type;

        template<typename T>
        bool operator () (const T &t, size_t &c, size_t &w, size_t &l) const {
            switch (t.id ()) {
                case X:
                    ++w; c += t.value ().size ();
                    break;
                case Y:
                    ++l; ++c;
                    break;
                case Z:
                    ++c;
                    break;
            }

            return true;
        }
};

int main (int argc, char **argv)
{
    std::ifstream ifs (argv[1], std::ios::in | std::ios::binary);
    auto first = spirit::make_default_multi_pass (std::istream_iterator<char> (ifs));
    auto last = spirit::make_default_multi_pass (std::istream_iterator<char> ());
    size_t w, c, l;
    word_count_tokens<lex::lexertl::lexer<>> word_count_functor;

    w = c = l = 0;

    bool r = lex::tokenize (first, last, word_count_functor, boost::bind (counter (), _1, boost::ref (c), boost::ref (w), boost::ref (l)));

    ifs.close ();

    if (r) {
        std::cout << l << ", " << w << ", " << c << std::endl;
    }

    return 0;
}

The build fails with the following error:

lexer.hpp:390:46: error: non-const lvalue reference to type 'const char *' cannot bind to a value of unrelated type

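Now, the error comes from the definition of the concrete lexer, lex::lexertl::lexer<>: by default its token type uses const char * as the iterator. I get the same error if I use spirit::istream_iterator or spirit::make_default_multi_pass (.....).
However, if I specify the correct template parameters for lex::lexertl::lexer<>, I get a plethora of errors!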

Any solutions?

Update

I have now posted the whole source file; it is the word_counter example from the website.

seh*_*ehe 2

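Okay, since the question has changed, here is a new answer that addresses several issues, with a complete code sample.

  1. First, you need to use a custom token type, i.e.: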

    word_count_tokens<lex::lexertl::lexer<lex::lexertl::token<boost::spirit::istream_iterator>>> word_count_functor;
    // instead of:
    // word_count_tokens<lex::lexertl::lexer<>> word_count_functor;
    

    Obviously, it is customary to typedef lex::lexertl::token<boost::spirit::istream_iterator>.

  2. You need to base your token IDs on min_token_id instead of using 1, 2, 3. Also, make them an enum for easier maintenance:

    enum token_ids {
        X = lex::min_token_id + 1,
        Y,
        Z,
    };
    
  3. You can no longer just call .size() on the default token value(), because the iterator range is no longer a RandomAccessRange. Instead, use boost::distance(), which works for iterator_range:

            ++w; c += boost::distance(t.value()); // t.value ().size ();
    

Combining these fixes: Live On Coliru

#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/bind.hpp>
#include <boost/range/distance.hpp>
#include <fstream>
#include <iostream>

namespace spirit = boost::spirit;
namespace lex    = spirit::lex;

enum token_ids {
    X = lex::min_token_id + 1,
    Y,
    Z,
};

template<typename L>
class word_count_tokens : public lex::lexer<L>
{
    public:
        word_count_tokens () {
            this->self.add
                ("[^ \t\n]+", X)
                ("\n"       , Y)
                ("."        , Z);
        }
};

struct counter
{
    typedef bool result_type;

    template<typename T>
    bool operator () (const T &t, size_t &c, size_t &w, size_t &l) const {
        switch (t.id ()) {
            case X:
                ++w; c += boost::distance(t.value()); // t.value ().size ();
                break;
            case Y:
                ++l; ++c;
                break;
            case Z:
                ++c;
                break;
        }

        return true;
    }
};

int main (int argc, char **argv)
{
    std::ifstream ifs (argv[1], std::ios::in | std::ios::binary);
    ifs >> std::noskipws;
    boost::spirit::istream_iterator first(ifs), last;
    word_count_tokens<lex::lexertl::lexer<lex::lexertl::token<boost::spirit::istream_iterator>>> word_count_functor;

    size_t w = 0, c = 0, l = 0;
    bool r = lex::tokenize (first, last, word_count_functor, 
            boost::bind (counter (), _1, boost::ref (c), boost::ref (w), boost::ref (l)));

    ifs.close ();

    if (r) {
        std::cout << l << ", " << w << ", " << c << std::endl;
    }
}

When run on its own source file, it prints

65, 183, 1665