How to parse DSL text at compile time?

Question


Yes. That's right. I want to be able to paste in an expression such as:

"a && b || c"


directly as string source code:

const std::string expression_text("a && b || c");


use it to create a lazily evaluated structure:

Expr expr(magical_function(expression_text));


and then evaluate it, substituting known values:

evaluate(expr, a, b, c);


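I'd like to extend this small DSL later to do more complex things with some non-C++ syntax, so I can't simply hard-code my expressions the easy way. The use case is being able to copy and paste the same logic from another module, used in a different area of development and written in another language, without having to adapt it to C++ syntax every time.

If someone could get me started on how to do the simple concept above, with at least one expression and two boolean operators, it would be much appreciated.

Note: I posted this question because of feedback on another question of mine: How to parse DSL input to high performance expression templates. There I actually wanted an answer to a slightly different question, but the comments prompted this specific question, which I think is worth posting because the potential answers really deserve to be documented.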

Answer 1

llo*_*miz 48

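Disclaimer: I know nothing about metaparse and almost nothing about proto. The following code is my attempt (mostly through trial and error) to modify this example to do something similar to what you want.

The code can easily be split into several parts:

1. Grammar

1.1 Token definitions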

typedef token < lit_c < 'a' > > arg1_token;
typedef token < lit_c < 'b' > > arg2_token;
typedef token < lit_c < 'c' > > arg3_token;


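token<Parser>:
token is a parser combinator that uses Parser to parse the input and then consumes (and discards) all the whitespace that follows. The result of the parse is the result of Parser.
lit_c<char>:
lit_c matches a specific char, and the result of the parse is that same char. In the grammar, this result is overridden by using always.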
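To make these two combinators concrete without metaparse's template machinery, here is a rough C++17 analogue. It is a sketch under assumptions, not metaparse's API: a parser is modelled as a callable that takes a std::string_view and returns std::optional<Result<T>>, where the hypothetical Result holds the parsed value and the unconsumed input.

```cpp
#include <optional>
#include <string_view>

// Hypothetical result type: the parsed value plus the input left over.
template <class T>
struct Result {
    T value;
    std::string_view rest;
};

// Rough analogue of lit_c<C>: match exactly the character C; the value is C.
template <char C>
constexpr std::optional<Result<char>> lit_c(std::string_view in) {
    if (!in.empty() && in.front() == C)
        return Result<char>{C, in.substr(1)};
    return std::nullopt;
}

constexpr bool is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Rough analogue of token<P>: run P, then consume and discard any whitespace
// that follows. The value produced by P is passed through unchanged.
template <class P>
constexpr auto token(P p) {
    return [p](std::string_view in) {
        auto r = p(in);
        while (r && !r->rest.empty() && is_space(r->rest.front()))
            r->rest.remove_prefix(1);
        return r;
    };
}

constexpr auto arg1_token = token(&lit_c<'a'>);
```

Because everything is constexpr, arg1_token("a  && b") can be checked with static_assert: it yields 'a' and leaves "&& b".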

typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;


keyword<metaparse_string,result_type=undefined>:
keyword matches a specific metaparse_string (_S("true") returns metaparse::string<'t','r','u','e'>, which is what metaparse uses internally to do its magic), and the result of the parse is result_type.

typedef token < keyword < _S ( "&&" ) > > and_token;
typedef token < keyword < _S ( "||" ) > > or_token;
typedef token < lit_c < '!' > > not_token;


For and_token and or_token the result is undefined, and the grammar below ignores it.
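Here is a rough C++17 analogue of keyword. It is only a sketch and not metaparse's API: a parser is a callable that takes a std::string_view and returns std::optional<Result<T>>, where Result is a hypothetical value-plus-remaining-input pair. The text and the result are ordinary constexpr values here instead of template arguments. The && and || keywords get a dummy result because, as noted above, nothing reads it.

```cpp
#include <optional>
#include <string_view>

template <class T>
struct Result {
    T value;               // parsed value
    std::string_view rest; // unconsumed input
};

// Rough analogue of keyword<S, R>: match the literal text kw at the start of
// the input and produce result.
template <class R>
constexpr auto keyword(std::string_view kw, R result) {
    return [kw, result](std::string_view in) -> std::optional<Result<R>> {
        if (in.substr(0, kw.size()) == kw)
            return Result<R>{result, in.substr(kw.size())};
        return std::nullopt;
    };
}

constexpr auto true_kw  = keyword("true", true);
constexpr auto false_kw = keyword("false", false);

// The value of && and || is never used, so any placeholder value will do.
constexpr auto and_kw = keyword("&&", 0);
constexpr auto or_kw  = keyword("||", 0);
```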

1.2 Grammar "rules"

struct paren_exp;

First, paren_exp is forward-declared.

typedef one_of< 
        paren_exp, 
        transform<true_token, build_value>,
        transform<false_token, build_value>, 
        always<arg1_token, arg<0> >,
        always<arg2_token, arg<1> >, 
        always<arg3_token, arg<2> > 
    >
    value_exp;


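one_of<Parsers...>:
one_of is a parser combinator that tries to match the input against each of its arguments in turn. The result is that of the first parser that matches.
transform<Parser,SemanticAction>:
transform is a parser combinator that matches Parser. Its result type is the result type of Parser, transformed by SemanticAction.

always<Parser,NewResultType>:
Matches Parser and returns NewResultType.

The equivalent Spirit rule would be: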

value_exp = paren_exp [ _val=_1 ]
    | true_token      [ _val=build_value(_1) ]
    | false_token     [ _val=build_value(_1) ]
    | argN_token      [ _val=phx::construct<arg<N>>() ];
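one_of, transform and always can be sketched in C++17 as functions that build new parser lambdas. This is a hypothetical analogue, not metaparse's API. A parser is a callable from std::string_view to std::optional<Result<T>>. Unlike metaparse, whose results are types, every alternative passed to one_of must produce the same value type. arg_index maps a, b and c to 0, 1 and 2, mirroring how value_exp maps them to arg<0>, arg<1> and arg<2>.

```cpp
#include <optional>
#include <string_view>

template <class T>
struct Result {
    T value;
    std::string_view rest;
};

template <char C>
constexpr std::optional<Result<char>> lit_c(std::string_view in) {
    if (!in.empty() && in.front() == C)
        return Result<char>{C, in.substr(1)};
    return std::nullopt;
}

// one_of: try each parser in turn on the same input; the first match wins.
template <class P, class... Ps>
constexpr auto one_of(P p, Ps... ps) {
    if constexpr (sizeof...(Ps) == 0) {
        return p;
    } else {
        return [p, others = one_of(ps...)](std::string_view in) {
            auto r = p(in);
            return r ? r : others(in);
        };
    }
}

// transform: run P, then pass its value through the semantic action F.
template <class P, class F>
constexpr auto transform(P p, F f) {
    return [p, f](std::string_view in) {
        auto r = p(in);
        using V = decltype(f(r->value));
        if (!r) return std::optional<Result<V>>{};
        return std::optional<Result<V>>{Result<V>{f(r->value), r->rest}};
    };
}

// always: run P, discard its value and produce v instead.
template <class P, class V>
constexpr auto always(P p, V v) {
    return transform(p, [v](auto) { return v; });
}

constexpr auto arg_index = one_of(always(&lit_c<'a'>, 0),
                                  always(&lit_c<'b'>, 1),
                                  always(&lit_c<'c'>, 2));
```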


typedef one_of< 
        transform<last_of<not_token, value_exp>, build_not>, 
        value_exp
    >
    not_exp;


last_of<Parsers...>:
last_of matches each of Parsers in sequence, and its result type is the result type of the last parser.

The equivalent Spirit rule is:
```
not_exp = (omit[not_token] >> value_exp) [ _val=build_not(_1) ] 
    | value_exp                          [ _val=_1 ];
```
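Here is a C++17 sketch of last_of, with a parser modelled as a callable from std::string_view to std::optional<Result<T>>. It is a hypothetical analogue, binary for brevity, and not metaparse's API. not_then_a stands in for last_of<not_token, value_exp>, using a single letter instead of the full value_exp.

```cpp
#include <optional>
#include <string_view>

template <class T>
struct Result {
    T value;
    std::string_view rest;
};

template <char C>
constexpr std::optional<Result<char>> lit_c(std::string_view in) {
    if (!in.empty() && in.front() == C)
        return Result<char>{C, in.substr(1)};
    return std::nullopt;
}

// last_of: P1 must match first; P2 then runs on what P1 left over, and only
// P2's value is kept (P1's value is dropped, like omit[] in Spirit).
template <class P1, class P2>
constexpr auto last_of(P1 p1, P2 p2) {
    return [p1, p2](std::string_view in) {
        using R = decltype(p2(in));
        auto first = p1(in);
        if (!first) return R{};
        return p2(first->rest);
    };
}

constexpr auto not_then_a = last_of(&lit_c<'!'>, &lit_c<'a'>);
```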

typedef
foldl_start_with_parser<
        last_of<and_token, not_exp>,
        not_exp,
        build_and
    > and_exp; // and_exp = not_exp >> *(omit[and_token] >> not_exp);

typedef
foldl_start_with_parser<
    last_of<or_token, and_exp>,
    and_exp,
    build_or
> or_exp;     // or_exp = and_exp >> *(omit[or_token] >> and_exp);


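foldl_start_with_parser<RepeatingParser,InitialParser,SemanticAction>:
This parser combinator matches InitialParser and then matches RepeatingParser repeatedly until it fails. The result type is the result of mpl::fold<RepeatingParserSequence, InitialParserResult, SemanticAction>, where RepeatingParserSequence is the sequence of result types of each application of RepeatingParser. If RepeatingParser never succeeds, the result type is simply InitialParserResult.

I believe (xD) the equivalent Spirit rule is: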
```
or_exp = and_exp[_a=_1] 
    >> *( omit[or_token] >> and_exp [ _val = build_or(_1,_a), _a = _val ]);  
```
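Because the fold runs from the left, a chain like a || b || c is grouped as (a || b) || c. The C++17 sketch below makes the grouping visible with subtraction, where it changes the answer. It is hypothetical and not metaparse's API: parsers are modelled as callables from std::string_view to std::optional<Result<T>>. digit and minus_digit are illustrative parsers that are not part of the grammar above.

```cpp
#include <optional>
#include <string_view>

template <class T>
struct Result {
    T value;
    std::string_view rest;
};

// foldl_start_with_parser: match Init once, then Rep as often as it succeeds,
// folding each new value into the running state with op(state, value).
template <class Rep, class Init, class Op>
constexpr auto foldl_start_with_parser(Rep rep, Init init, Op op) {
    return [rep, init, op](std::string_view in) {
        auto state = init(in);
        if (!state) return state;
        while (auto next = rep(state->rest)) {
            state->value = op(state->value, next->value);
            state->rest = next->rest;
        }
        return state;
    };
}

// Illustrative parsers: a single decimal digit, and '-' followed by a digit.
constexpr std::optional<Result<int>> digit(std::string_view in) {
    if (!in.empty() && in.front() >= '0' && in.front() <= '9')
        return Result<int>{in.front() - '0', in.substr(1)};
    return std::nullopt;
}

constexpr std::optional<Result<int>> minus_digit(std::string_view in) {
    if (!in.empty() && in.front() == '-')
        return digit(in.substr(1));
    return std::nullopt;
}

constexpr auto difference = foldl_start_with_parser(
    &minus_digit, &digit, [](int state, int d) { return state - d; });
```

Here difference("8-3-2") evaluates (8 - 3) - 2 = 3. A right fold would have produced 8 - (3 - 2) = 7.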

struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; 
   // paren_exp = '(' >> or_exp >> ')';


middle_of<Parsers...>:
This matches Parsers in sequence, and the result type is the result of the parser in the middle.
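Here is a C++17 sketch of middle_of, with a parser modelled as a callable from std::string_view to std::optional<Result<T>>. It is a hypothetical analogue, not metaparse's API. All three parsers must match in order and only the middle value is kept, which is the shape of paren_exp. A single letter stands in for or_exp.

```cpp
#include <optional>
#include <string_view>

template <class T>
struct Result {
    T value;
    std::string_view rest;
};

template <char C>
constexpr std::optional<Result<char>> lit_c(std::string_view in) {
    if (!in.empty() && in.front() == C)
        return Result<char>{C, in.substr(1)};
    return std::nullopt;
}

// middle_of: Open, P and Close must all match in order; only P's value is kept.
template <class Open, class P, class Close>
constexpr auto middle_of(Open open, P p, Close close) {
    return [open, p, close](std::string_view in) {
        using R = decltype(p(in));
        auto o = open(in);
        if (!o) return R{};
        auto mid = p(o->rest);
        if (!mid) return R{};
        auto c = close(mid->rest);
        if (!c) return R{};
        mid->rest = c->rest;  // keep the middle value, but resume after Close
        return mid;
    };
}

constexpr auto paren_a = middle_of(&lit_c<'('>, &lit_c<'a'>, &lit_c<')'>);
```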

typedef last_of<repeated<space>, or_exp> expression; 
   //expression = omit[*space] >> or_exp;


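repeated<Parser>:
This parser combinator tries to match Parser repeatedly. The result is the sequence of result types of each application of the parser; if the parser fails on the first attempt, the result is an empty sequence. Here this rule simply strips any leading whitespace.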
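Here is a hypothetical C++17 sketch of repeated, with a parser modelled as a callable from std::string_view to std::optional<Result<T>>. It differs from metaparse in one stated respect: to stay allocation-free it returns only the number of matches rather than the sequence of results. Like the original, it always succeeds. That is why last_of<repeated<space>, or_exp> can use it to skip leading whitespace whether or not any is present.

```cpp
#include <optional>
#include <string_view>

template <class T>
struct Result {
    T value;
    std::string_view rest;
};

constexpr std::optional<Result<char>> space(std::string_view in) {
    if (!in.empty() && (in.front() == ' ' || in.front() == '\t' || in.front() == '\n'))
        return Result<char>{in.front(), in.substr(1)};
    return std::nullopt;
}

// repeated: apply P until it fails; zero matches is still a success.
// P must consume input whenever it succeeds, or this loop never ends.
template <class P>
constexpr auto repeated(P p) {
    return [p](std::string_view in) {
        int count = 0;
        while (auto r = p(in)) {
            in = r->rest;
            ++count;
        }
        return std::optional<Result<int>>{Result<int>{count, in}};
    };
}

constexpr auto skip_spaces = repeated(&space);
```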

typedef build_parser<entire_input<expression> > function_parser;


This line creates a metafunction that takes an input string and returns the result of parsing it.
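Here is a C++17 sketch of the last two pieces, with a parser modelled as a callable from std::string_view to std::optional<Result<T>>. It is hypothetical and not metaparse's API. entire_input rejects a match that leaves input behind, and the build_parser analogue returns the bare value. Metaparse turns a failed parse into a compile error; this sketch throws instead. A throw cannot be part of a constant expression, so a failed parse evaluated in a constexpr context also stops compilation.

```cpp
#include <optional>
#include <string_view>

template <class T>
struct Result {
    T value;
    std::string_view rest;
};

template <char C>
constexpr std::optional<Result<char>> lit_c(std::string_view in) {
    if (!in.empty() && in.front() == C)
        return Result<char>{C, in.substr(1)};
    return std::nullopt;
}

// entire_input: succeed only if P matched and consumed all of the input.
template <class P>
constexpr auto entire_input(P p) {
    return [p](std::string_view in) {
        auto r = p(in);
        if (r && !r->rest.empty()) return decltype(r){};
        return r;
    };
}

// build_parser analogue: return the parsed value, or throw on failure
// (which makes a compile-time evaluation fail to compile).
template <class P>
constexpr auto build_parser(P p) {
    return [p](std::string_view in) {
        auto r = p(in);
        if (!r) throw "parse error";
        return r->value;
    };
}

constexpr auto parse_a = build_parser(entire_input(&lit_c<'a'>));
```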

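2. Building the expression

Let's walk through an example of how the expression is built. It happens in two steps. First, the grammar constructs a tree out of build_or, build_and, build_value, build_not and arg<N>. Once you have that type, you can get the proto expression through its proto_type typedef.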

"a ||!b"

We start with or_exp:

or_exp: we try its InitialParser, and_exp.
- and_exp: we try its InitialParser, not_exp.
- - not_exp: not_token fails, so we try value_exp.
- - - value_exp: arg1_token succeeds. The return type is arg<0>; we go back to not_exp.
- - not_exp: this step does not modify the return type. We go back to and_exp.
- and_exp: we try its RepeatingParser, and it fails. and_exp succeeds, and its return type is that of its InitialParser: arg<0>. We go back to or_exp.
or_exp: we try its RepeatingParser: or_token matches, so we try and_exp.
- and_exp: we try its InitialParser, not_exp.
- - not_exp: not_token succeeds, so we try value_exp.
- - - value_exp: arg2_token succeeds. The return type is arg<1>; we go back to not_exp.
- - not_exp: transform modifies the return type using build_not, giving build_not::apply< arg<1> >. We go back to and_exp.
- and_exp: we try its RepeatingParser, and it fails. and_exp succeeds and returns build_not::apply< arg<1> >. We go back to or_exp.
or_exp: the RepeatingParser succeeded, so foldl_start_with_parser combines build_not::apply< arg<1> > and arg<0> with build_or, yielding build_or::apply< build_not::apply< arg<1> >, arg<0> >.

Once we have built this tree, we get its proto_type:

build_or::apply< build_not::apply< arg<1> >, arg<0> >::proto_type;
proto::logical_or< arg<0>::proto_type, build_not::apply< arg<1> >::proto_type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, build_not::apply< arg<1> >::proto_type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< arg<1>::proto_type >::type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< proto::terminal< placeholder<1> >::type >::type >::type;
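The two-step build leaves the whole expression encoded in a type, and proto then only has to walk that type when it evaluates. Below is a proto-free sketch of the same idea. It assumes hypothetical node templates Arg, Value, Not, And and Or that evaluate themselves against the supplied values, replacing proto's context-based evaluation with plain static functions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical node types; each exposes a static eval over the argument values.
template <std::size_t N>
struct Arg {
    template <std::size_t K>
    static constexpr bool eval(const std::array<bool, K>& args) {
        static_assert(N < K, "expression uses more arguments than were supplied");
        return args[N];
    }
};

template <bool B>
struct Value {
    template <std::size_t K>
    static constexpr bool eval(const std::array<bool, K>&) { return B; }
};

template <class E>
struct Not {
    template <std::size_t K>
    static constexpr bool eval(const std::array<bool, K>& args) { return !E::eval(args); }
};

template <class L, class R>
struct And {
    template <std::size_t K>
    static constexpr bool eval(const std::array<bool, K>& args) {
        return L::eval(args) && R::eval(args);
    }
};

template <class L, class R>
struct Or {
    template <std::size_t K>
    static constexpr bool eval(const std::array<bool, K>& args) {
        return L::eval(args) || R::eval(args);
    }
};

// evaluate<E>(a, b, c): substitute the known values for arg<0>, arg<1>, ...
template <class E, class... Bs>
constexpr bool evaluate(Bs... values) {
    return E::eval(std::array<bool, sizeof...(Bs)>{static_cast<bool>(values)...});
}

// "a ||!b", in the operand order of the proto_type derived above.
using a_or_not_b = Or<Arg<0>, Not<Arg<1>>>;
```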


Complete example code (running on Wandbox)

#include <iostream>
#include <vector>

#include <boost/metaparse/repeated.hpp>
#include <boost/metaparse/sequence.hpp>
#include <boost/metaparse/lit_c.hpp>
#include <boost/metaparse/last_of.hpp>
#include <boost/metaparse/middle_of.hpp>
#include <boost/metaparse/space.hpp>
#include <boost/metaparse/foldl_start_with_parser.hpp>
#include <boost/metaparse/one_of.hpp>
#include <boost/metaparse/token.hpp>
#include <boost/metaparse/entire_input.hpp>
#include <boost/metaparse/string.hpp>
#include <boost/metaparse/transform.hpp>
#include <boost/metaparse/always.hpp>
#include <boost/metaparse/build_parser.hpp>
#include <boost/metaparse/keyword.hpp>

#include <boost/mpl/apply_wrap.hpp>
#include <boost/mpl/front.hpp>
#include <boost/mpl/back.hpp>
#include <boost/mpl/bool.hpp>

#include <boost/proto/proto.hpp>
#include <boost/fusion/include/at.hpp>
#include <boost/fusion/include/make_vector.hpp>

using boost::metaparse::sequence;
using boost::metaparse::lit_c;
using boost::metaparse::last_of;
using boost::metaparse::middle_of;
using boost::metaparse::space;
using boost::metaparse::repeated;
using boost::metaparse::build_parser;
using boost::metaparse::foldl_start_with_parser;
using boost::metaparse::one_of;
using boost::metaparse::token;
using boost::metaparse::entire_input;
using boost::metaparse::transform;
using boost::metaparse::always;
using boost::metaparse::keyword;

using boost::mpl::apply_wrap1;
using boost::mpl::front;
using boost::mpl::back;
using boost::mpl::bool_;


struct build_or
{
    typedef build_or type;

    template <class C, class State>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_or<typename State::proto_type, typename C::proto_type >::type proto_type;
    };
};

struct build_and
{
    typedef build_and type;

    template <class C, class State>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_and<typename State::proto_type, typename C::proto_type >::type proto_type;
    };
};



template<bool I>
struct value //helper struct that will be used during the evaluation in the proto context
{};

struct build_value
{
    typedef build_value type;

    template <class V>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::terminal<value<V::type::value> >::type proto_type;
    };
};

struct build_not
{
    typedef build_not type;

    template <class V>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_not<typename V::proto_type >::type proto_type;
    };
};

template<int I>
struct placeholder //helper struct that will be used during the evaluation in the proto context
{};

template<int I>
struct arg
{
    typedef arg type;
    typedef typename boost::proto::terminal<placeholder<I> >::type proto_type;
};

#ifdef _S
#error _S already defined
#endif
#define _S BOOST_METAPARSE_STRING

typedef token < keyword < _S ( "&&" ) > > and_token;
typedef token < keyword < _S ( "||" ) > > or_token;
typedef token < lit_c < '!' > > not_token;

typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;

typedef token < lit_c < 'a' > > arg1_token;
typedef token < lit_c < 'b' > > arg2_token;
typedef token < lit_c < 'c' > > arg3_token;


struct paren_exp;

typedef
one_of< paren_exp, transform<true_token, build_value>, transform<false_token, build_value>, always<arg1_token, arg<0> >, always<arg2_token, arg<1> >, always<arg3_token, arg<2> > >
value_exp; //value_exp = paren_exp | true_token | false_token | arg1_token | arg2_token | arg3_token;

typedef
one_of< transform<last_of<not_token, value_exp>, build_not>, value_exp>
not_exp; //not_exp = (omit[not_token] >> value_exp) | value_exp;

typedef
foldl_start_with_parser <
last_of<and_token, not_exp>,
         not_exp,
         build_and
         >
         and_exp; // and_exp = not_exp >> *(and_token >> not_exp);

typedef
foldl_start_with_parser <
last_of<or_token, and_exp>,
         and_exp,
         build_or
         >
         or_exp; // or_exp = and_exp >> *(or_token >> and_exp);

struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; //paren_exp = lit('(') >> or_exp >> lit(')');

typedef last_of<repeated<space>, or_exp> expression; //expression = omit[*space] >> or_exp;

typedef build_parser<entire_input<expression> > function_parser;


template <typename Args>
struct calculator_context
        : boost::proto::callable_context< calculator_context<Args> const >
{
    calculator_context ( const Args& args ) : args_ ( args ) {}
    // Values to replace the placeholders
    const Args& args_;

    // Define the result type of the calculator.
    // (This makes the calculator_context "callable".)
    typedef bool result_type;

    // Handle the placeholders:
    template<int I>
    bool operator() ( boost::proto::tag::terminal, placeholder<I> ) const
    {
        return boost::fusion::at_c<I> ( args_ );
    }

    template<bool I>
    bool operator() ( boost::proto::tag::terminal, value<I> ) const
    {
        return I;
    }
};

template <typename Args>
calculator_context<Args> make_context ( const Args& args )
{
    return calculator_context<Args> ( args );
}

template <typename Expr, typename ... Args>
bool evaluate ( const Expr& expr, const Args& ... args )
{
    return boost::proto::eval ( expr, make_context ( boost::fusion::make_vector ( args... ) ) );
}

#ifdef LAMBDA
#error LAMBDA already defined
#endif
#define LAMBDA(exp) apply_wrap1<function_parser, _S(exp)>::type::proto_type{}

int main()
{
    using std::cout;
    using std::endl;

    cout << evaluate ( LAMBDA ( "true&&false" ) ) << endl;
    cout << evaluate ( LAMBDA ( "true&&a" ), false ) << endl;
    cout << evaluate ( LAMBDA ( "true&&a" ), true ) << endl;
    cout << evaluate ( LAMBDA ( "a&&b" ), true, false ) << endl;
    cout << evaluate ( LAMBDA ( "a&&(b||c)" ), true, false, true ) << endl;
    cout << evaluate ( LAMBDA ( "!a&&(false||(b&&!c||false))" ), false, true, false ) << endl;
}

/*int main(int argc , char** argv)
{
    using std::cout;
    using std::endl;

    bool a=false, b=false, c=false;

    if(argc==4)
    {
        a=(argv[1][0]=='1');
        b=(argv[2][0]=='1');
        c=(argv[3][0]=='1');
    }

    auto expr = LAMBDA("a && b || c");

    cout << evaluate(expr, true, true, false) << endl;
    cout << evaluate(expr, a, b, c) << endl;

    return 0;
}*/


Answer 2

rep*_*vsd 5

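For a long time, compile-time parsing meant template metaprogramming, which looks like line noise to most beginners and even to intermediate C++ programmers.

However, C++11 gave us constexpr, and C++14 removed many of the restrictions on constexpr. C++17 even made some parts of the standard library constexpr.

While trying to learn advanced modern C++, I decided to write a compile-time HTML parser; the idea was to build a fast HTML template engine.

The full code can be found here: https://github.com/rep-movsd/see-phit

I'll briefly explain what I learned while getting it to work.

Handling dynamic data structures

I needed to parse a const char* and turn it into a multi-way tree, but dynamic allocation is taboo in constexpr land.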

The solution? Use an array of nodes, with indices pointing to children and siblings: essentially how you would do it in FORTRAN!
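Here is a minimal C++17 sketch of that layout. It assumes a hypothetical Node with first-child and next-sibling indices and is not see-phit's actual code. The nodes live in a fixed-size std::array and -1 marks a missing link, so nothing is allocated and the tree can be built inside a constexpr function.

```cpp
#include <array>

// Hypothetical node: children are reached through indices into one array.
struct Node {
    char tag = '?';
    int first_child = -1;   // -1 means "no child"
    int next_sibling = -1;  // -1 means "no further sibling"
};

template <int Capacity>
struct Tree {
    std::array<Node, Capacity> nodes{};
    int size = 0;

    constexpr int add(char tag) {
        nodes[size] = Node{tag, -1, -1};
        return size++;
    }

    // Append child to the end of parent's child list.
    constexpr void link(int parent, int child) {
        if (nodes[parent].first_child == -1) {
            nodes[parent].first_child = child;
            return;
        }
        int last = nodes[parent].first_child;
        while (nodes[last].next_sibling != -1)
            last = nodes[last].next_sibling;
        nodes[last].next_sibling = child;
    }

    constexpr int child_count(int parent) const {
        int n = 0;
        for (int i = nodes[parent].first_child; i != -1; i = nodes[i].next_sibling)
            ++n;
        return n;
    }
};

// Builds an html node with head ('e') and body ('b') children at compile time.
constexpr Tree<8> make_page() {
    Tree<8> t{};
    int html = t.add('h');
    t.link(html, t.add('e'));
    t.link(html, t.add('b'));
    return t;
}

constexpr Tree<8> page = make_page();
```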

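The caveat is that your node list must have a fixed size up front. Making it very large seems to slow gcc's compilation down considerably. If you end up running past the end of the array, the compiler throws an error. I wrote a small wrapper, similar to std::array, that is fully constexpr.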
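Here is a minimal sketch of such a wrapper, assuming a hypothetical StaticVector (see-phit's own class may differ). Running past the end is turned into a throw. A throw cannot appear in a constant expression, so an overflow during compile-time evaluation becomes a compile error rather than silent corruption.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity, constexpr-friendly vector.
template <class T, std::size_t Capacity>
class StaticVector {
public:
    constexpr void push_back(const T& value) {
        if (size_ == Capacity)
            throw "StaticVector capacity exceeded";  // compile error in constant evaluation
        data_[size_++] = value;
    }
    constexpr const T& operator[](std::size_t i) const { return data_[i]; }
    constexpr std::size_t size() const { return size_; }

private:
    std::array<T, Capacity> data_{};
    std::size_t size_ = 0;
};

constexpr StaticVector<int, 4> squares() {
    StaticVector<int, 4> v{};
    for (int i = 1; i <= 4; ++i)
        v.push_back(i * i);  // a fifth push_back would stop compilation
    return v;
}

constexpr auto sq = squares();
```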

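Parsing

Pretty much the standard code you would write for runtime parsing will run at compile time! Loops, recursion, conditionals: everything works perfectly.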
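To tie this back to the question, here is a sketch of an ordinary recursive-descent parser that runs unchanged at compile time. It handles the a && b || c grammar plus !, parentheses, true and false. It is not taken from see-phit or from the other answer, and the names Parser and eval_expr are hypothetical. To stay short it evaluates while parsing instead of building a tree, so it needs the values of a, b and c at the point where it runs.

```cpp
#include <string_view>

// Hypothetical recursive-descent evaluator for:
//   or  := and ("||" and)*        and   := not ("&&" not)*
//   not := "!" not | value        value := "(" or ")" | true | false | a | b | c
class Parser {
public:
    constexpr Parser(std::string_view text, bool a, bool b, bool c)
        : in_(text), vars_{a, b, c} {}

    constexpr bool parse() {
        bool result = parse_or();
        skip_spaces();
        if (!in_.empty()) throw "unexpected trailing input";
        return result;
    }

private:
    std::string_view in_;
    bool vars_[3];

    constexpr void skip_spaces() {
        while (!in_.empty() && in_.front() == ' ') in_.remove_prefix(1);
    }
    // Consume tok (after optional spaces) if it comes next.
    constexpr bool accept(std::string_view tok) {
        skip_spaces();
        if (in_.substr(0, tok.size()) != tok) return false;
        in_.remove_prefix(tok.size());
        return true;
    }
    constexpr bool parse_or() {
        bool v = parse_and();
        while (accept("||")) {
            bool rhs = parse_and();  // always parse the right side; a short-circuit would skip it
            v = v || rhs;
        }
        return v;
    }
    constexpr bool parse_and() {
        bool v = parse_not();
        while (accept("&&")) {
            bool rhs = parse_not();
            v = v && rhs;
        }
        return v;
    }
    constexpr bool parse_not() {
        if (accept("!")) return !parse_not();
        return parse_value();
    }
    constexpr bool parse_value() {
        if (accept("(")) {
            bool v = parse_or();
            if (!accept(")")) throw "expected ')'";
            return v;
        }
        if (accept("true")) return true;
        if (accept("false")) return false;
        for (int i = 0; i < 3; ++i)
            if (accept(std::string_view("abc").substr(i, 1))) return vars_[i];
        throw "expected a value";
    }
};

constexpr bool eval_expr(std::string_view text, bool a = false, bool b = false, bool c = false) {
    Parser p(text, a, b, c);
    return p.parse();
}
```

Because the values are ordinary parameters, the same eval_expr can also be called at run time. Building a tree first, as the metaparse/proto answer does, is what would let the text be parsed once at compile time while the values arrive later.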

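One problem is how to represent strings. The approach above (arrays of characters) is a very memory-hungry and tedious way of doing things. Fortunately, in my case all I needed were substrings of the original const char* input. So I wrote a small constexpr class, similar to string_view, that just holds pointers to the beginning and end of the relevant parsed token. Creating new literal strings is then just a matter of turning these views into const char* literals.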
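Here is a minimal version of such a view, assuming a hypothetical Token struct (not see-phit's class). It holds two pointers into the original input, so slicing out a tag name copies nothing.

```cpp
#include <cstddef>

// Hypothetical constexpr view over part of the original input.
struct Token {
    const char* begin = nullptr;
    const char* end = nullptr;

    constexpr std::size_t size() const { return static_cast<std::size_t>(end - begin); }

    // Compare the viewed characters with a NUL-terminated literal.
    constexpr bool equals(const char* lit) const {
        for (const char* p = begin; p != end; ++p, ++lit)
            if (*lit != *p) return false;
        return *lit == '\0';
    }
};

// Return the run of letters starting at s as a view into s itself.
constexpr Token scan_name(const char* s) {
    const char* e = s;
    while ((*e >= 'a' && *e <= 'z') || (*e >= 'A' && *e <= 'Z')) ++e;
    return Token{s, e};
}

constexpr const char* html = "<body class=x>";
constexpr Token tag = scan_name(html + 1);
```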

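Error reporting

The basic way to handle errors in constexpr code is to call a function that is not constexpr: the compiler stops and prints the offending line, which can easily contain an error string.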
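Here is a sketch of that trick with hypothetical names (parse_error, parse_digit). The error function is deliberately not constexpr. Calling it is fine at run time, but any compile-time evaluation that reaches it becomes ill-formed, and the diagnostic points at the call together with its message.

```cpp
// Deliberately NOT constexpr: if constant evaluation reaches this call, the
// program is ill-formed and the diagnostic shows the call site and message.
[[noreturn]] inline void parse_error(const char* message) { throw message; }

// Hypothetical helper: parse one decimal digit, reporting anything else.
constexpr int parse_digit(char c) {
    if (c < '0' || c > '9')
        parse_error("expected a digit");
    return c - '0';
}

constexpr int seven = parse_digit('7');
// constexpr int oops = parse_digit('x');  // does not compile: reaches parse_error
```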

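However, I wanted more: I wanted the parser to show the line and column too. I struggled for a while and eventually concluded it was impossible. But I went back and tried everything I could think of, and finally found a way to make gcc print 2 numbers along with an error description message. It essentially involves creating a template with two integer parameters (line and column) whose values come from the constexpr parser.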
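Here is one way to get the numbers into the diagnostic, sketched with hypothetical names (Position, find_error, ErrorAt); see-phit's mechanism may differ. The parser returns the position of the first problem as a constant. That position then becomes the template arguments of a class whose static_assert fails only when there is an error. Compilers typically name the failing instantiation in the error message, for example ErrorAt<2, 3>.

```cpp
#include <string_view>

struct Position {
    int line = 0;  // 0 means "no error"
    int column = 0;
};

// Hypothetical check: report the first '?' in the text as an error.
constexpr Position find_error(std::string_view text) {
    int line = 1, column = 1;
    for (char ch : text) {
        if (ch == '?') return Position{line, column};
        if (ch == '\n') { ++line; column = 1; }
        else ++column;
    }
    return Position{};
}

// Instantiated with the error position; the assertion fires only for a real
// error, and the diagnostic then mentions ErrorAt<line, column>.
template <int Line, int Column>
struct ErrorAt {
    static_assert(Line == 0, "parse error at the line and column shown in ErrorAt<...>");
    static constexpr bool ok = true;
};

constexpr std::string_view doc = "<p>\n  hi\n</p>";
constexpr Position pos = find_error(doc);
static_assert(ErrorAt<pos.line, pos.column>::ok);
```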

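Performance

I could not clearly identify any pattern in which kinds of constexpr code tend to slow the compiler down, but the default performance is not bad at all. I was able to parse a 1000-node HTML file in about 1.5 seconds with gcc.

clang is a bit faster.

I plan to write a more detailed description of how the code works in the GitHub repo's wiki; stay tuned.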
