Spirit X3: is this error handling approach useful?

Zey*_*neb 2 c++ boost-spirit boost-spirit-x3

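After reading the Spirit X3 tutorial on error handling and doing some experimentation, I was drawn to a conclusion.

I believe there is room for improvement in error handling in X3. From my perspective, an important goal is to provide a meaningful error message. First and foremost, adding a semantic action that sets the _pass(ctx) member to false won't do, because X3 will then try to match something else. Only throwing an x3::expectation_failure quits the parse function prematurely, i.e. without trying to match anything else. So what remains are the parser directive expect[a], the parser operator>, and manually throwing an x3::expectation_failure from a semantic action. I believe this vocabulary for error handling is too limited. Please consider the following lines of X3 PEG grammar: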

    const auto a = a1 >> a2 >> a3;
    const auto b = b1 >> b2 >> b3;
    const auto c = c1 >> c2 >> c3;

    const auto main_rule__def =
    (
     a |
     b |
     c );

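Now for expression a I can't use expect[] or operator>, since the other alternatives might still be valid. I could be wrong, but I think X3 requires me to spell out alternative error expressions that can match, and if they match they can throw x3::expectation_failure, which is cumbersome.

The question is: is there a good way to check for error conditions in my PEG construct with the ordered alternatives a, b and c, using the current X3 facilities?

If the answer is no, I would like to present my idea for a reasonable solution. I believe I would need a new parser directive. What should this directive do? It should call the attached semantic action when the parse fails. The attribute is obviously unused, but I would need the _where member to be set to the iterator position of the first parsing mismatch. So if a2 fails, _where should be set to one past the end of a1. Let's call the parser directive neg_sa, meaning "negated semantic action".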

Pseudocode:

    // semantic actions
    auto a_sa = [&](auto& ctx)
    {
      // add _where to vector v
    };

    auto b_sa = [&](auto& ctx)
    {
      // add _where to vector v
    };

    auto c_sa = [&](auto& ctx)
    {
      // add _where to vector v

      // now we know we have a *real* error.
      // find the peak iterator value in the vector v
      // the position tells whether it belongs to a, b or c.
      // now we can formulate an error message like: "cannot make sense of b up to this position."
      // lastly throw x3::expectation_failure
    };

    // PEG
    const auto a = a1 >> a2 >> a3;
    const auto b = b1 >> b2 >> b3;
    const auto c = c1 >> c2 >> c3;

    const auto main_rule__def =
    (
     neg_sa[a][a_sa] |
     neg_sa[b][b_sa] |
     neg_sa[c][c_sa] );
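The idea in the pseudocode (record where each alternative first mismatches, then blame the alternative that got furthest) can be sketched without X3 at all. The following is a hypothetical, library-free illustration only: `Branch`, `Failure`, `first_mismatch` and `diagnose_alternatives` are invented names, each branch is simplified to a sequence of literals, and nothing here is an existing X3 directive.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical types for illustration; not part of Boost.Spirit X3.
struct Branch {
    std::string name;                     // e.g. "a", "b", "c"
    std::vector<std::string_view> parts;  // literals matched in sequence
};

struct Failure {
    std::string branch;  // alternative that got furthest before failing
    std::size_t where;   // offset at which its failing part should have started
};

// Returns the offset of the first mismatch, or nullopt if every part matched.
std::optional<std::size_t> first_mismatch(std::string_view input, Branch const& b) {
    std::size_t pos = 0;
    for (auto part : b.parts) {
        if (input.substr(pos, part.size()) != part)
            return pos;  // "one past the end" of the last part that matched
        pos += part.size();
    }
    return std::nullopt;
}

// Ordered choice: the first matching branch wins (result is nullopt).
// If no branch matches, report the one whose mismatch lies furthest in.
std::optional<Failure> diagnose_alternatives(std::string_view input,
                                             std::vector<Branch> const& branches) {
    assert(!branches.empty());
    std::optional<Failure> furthest;
    for (auto const& b : branches) {
        auto miss = first_mismatch(input, b);
        if (!miss)
            return std::nullopt;  // this alternative matched, no error
        if (!furthest || *miss > furthest->where)
            furthest = Failure{b.name, *miss};  // ties keep the earlier branch
    }
    return furthest;
}
```

With branches a = "ab" "cd" "ef", b = "ab" "xy" "zz" and c = "q", the input "abxy!!" is blamed on b at offset 4, because b consumed more input than a or c before failing.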

I hope I presented the idea clearly. Let me know in the comments if I need to explain something further.

seh*_*ehe 7

Okay, risking conflating too many things in an example, here goes:

    namespace square::peg {
        using namespace x3;

        const auto quoted_string = lexeme['"' > *(print - '"') > '"'];
        const auto bare_string   = lexeme[alpha > *alnum] > ';';
        const auto two_ints      = int_ > int_;

        const auto main          = quoted_string | bare_string | two_ints;

        const auto entry_point   = skip(space)[ expect[main] > eoi ];
    } // namespace square::peg

That should do. The key is that the only things that should be expectation points are things that make the respective branch fail BEYOND the point where it was unambiguously the right branch. (Otherwise, there would literally not be a hard expectation.)
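To make the "beyond the point of commitment" rule concrete, here is a minimal hand-written sketch that does not use X3. `ExpectationFailure` and `parse_quoted` are hypothetical stand-ins invented for this illustration (the former is not x3::expectation_failure). Before the opening quote is seen, a mismatch is a soft failure and another alternative may still apply; after it, a missing closing quote is a hard error.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for a hard expectation failure; not the X3 class.
struct ExpectationFailure : std::runtime_error {
    std::size_t where;  // offset at which the expected token was missing
    ExpectationFailure(std::string const& which, std::size_t where)
        : std::runtime_error(which), where(where) {}
};

// Soft failure (nullopt): the input does not start with '"', so some other
// alternative may still match. Hard failure (throw): the opening quote made
// this unambiguously the right branch, so a missing closing quote is an error.
std::optional<std::string> parse_quoted(std::string_view in) {
    if (in.empty() || in.front() != '"')
        return std::nullopt;                          // not our branch
    auto close = in.find('"', 1);
    if (close == std::string_view::npos)
        throw ExpectationFailure("'\"'", in.size());  // committed, then failed
    assert(close >= 1);                               // search started at offset 1
    return std::string(in.substr(1, close - 1));
}
```

Calling `parse_quoted("bare;")` yields an empty optional, while `parse_quoted("\"oops")` throws with `where` equal to 5, the end of the input.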

With two minor get_info specializations for prettier messages¹, this could lead to decent error messages even when manually catching the exception:

Live On Coliru

    int main() {
        using It = std::string::const_iterator;

        for (std::string const input : {
                "   -89 0038  ",
                "   \"-89 0038\"  ",
                "   something123123      ;",
                // undecidable
                "",
                // violate expectations, no successful parse
                "   -89 oops  ",   // not an integer
                "   \"-89 0038  ", // missing "
                "   bareword ",    // missing ;
                // trailing debris, successful "main"
                "   -89 3.14  ",   // followed by .14
            })
        {
            std::cout << "====== " << std::quoted(input) << "\n";

            It iter = input.begin(), end = input.end();
            try {
                if (parse(iter, end, square::peg::entry_point)) {
                    std::cout << "Parsed successfully\n";
                } else {
                    std::cout << "Parsing failed\n";
                }
            } catch (x3::expectation_failure<It> const& ef) {
                // offset of the failed expectation within the input
                auto pos = std::distance(input.begin(), ef.where());
                std::cout << "Expect " << ef.which() << " at "
                    << "\n\t" << input
                    << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^\n";
            }
        }
    }

Prints

    ====== "   -89 0038  "
    Parsed successfully
    ====== "   \"-89 0038\"  "
    Parsed successfully
    ====== "   something123123      ;"
    Parsed successfully
    ====== ""
    Expect quoted string, bare string or integer number pair at

        ^
    ====== "   -89 oops  "
    Expect integral number at
           -89 oops
        -------^
    ====== "   \"-89 0038  "
    Expect '"' at
           "-89 0038
        --------------^
    ====== "   bareword "
    Expect ';' at
           bareword
        ------------^
    ====== "   -89 3.14  "
    Expect eoi at
           -89 3.14
        --------^
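The caret lines in this output come from a small iostream idiom: padding an empty string to a given width with '-' as the fill character. A standalone sketch of just that idiom follows; `format_caret` is a hypothetical helper name introduced here, not something from the answer or from X3.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Builds the two-line "input, then dashes and caret" display.
// `pos` is the offset of the failure within `input`.
std::string format_caret(std::string const& input, std::size_t pos) {
    assert(pos <= input.size());
    std::ostringstream oss;
    oss << "\t" << input << "\n\t"
        // Writing "" with width `pos` and fill '-' produces `pos` dashes;
        // the width resets afterwards, so "^" is written unpadded.
        << std::setfill('-') << std::setw(static_cast<int>(pos)) << ""
        << "^";
    return oss.str();
}
```

For example, `format_caret("abc", 2)` produces a tab, `abc`, a newline, a tab, and then `--^`.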

This is already beyond what most people expect from their parsers.

But: Automate That, Also, More Flexible

We might not be content reporting just the one expectation and bailing out. Indeed, you can report and continue parsing as if there were just a regular mismatch: this is where on_error comes in.

Let's create a tag base:

    struct with_error_handling {
        template<typename It, typename Ctx>
            x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const&) const {
                std::string s(f,l);
                auto pos = std::distance(f, ef.where());

                std::cout << "Expecting " << ef.which() << " at "
                    << "\n\t" << s
                    << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^\n";

                return error_handler_result::fail;
            }
    };

Now, all we have to do is derive our rule ID from with_error_handling and BAM!, we don't have to write any exception handlers; rules will simply "fail" with the appropriate diagnostics. What's more, some inputs can lead to multiple (hopefully helpful) diagnostics:
    auto const eh = [](auto p) {
        struct _ : with_error_handling {};
        return rule<_> {} = p;
    };

    const auto quoted_string = eh(lexeme['"' > *(print - '"') > '"']);
    const auto bare_string   = eh(lexeme[alpha > *alnum] > ';');
    const auto two_ints      = eh(int_ > int_);

    const auto main          = quoted_string | bare_string | two_ints;
    using main_type = std::remove_cv_t<decltype(main)>;

    const auto entry_point   = skip(space)[ eh(expect[main] > eoi) ];

Now, main becomes just:

Live On Coliru

    for (std::string const input : {
            "   -89 0038  ",
            "   \"-89 0038\"  ",
            "   something123123      ;",
            // undecidable
            "",
            // violate expectations, no successful parse
            "   -89 oops  ",   // not an integer
            "   \"-89 0038  ", // missing "
            "   bareword ",    // missing ;
            // trailing debris, successful "main"
            "   -89 3.14  ",   // followed by .14
        })
    {
        std::cout << "====== " << std::quoted(input) << "\n";

        It iter = input.begin(), end = input.end();
        if (parse(iter, end, square::peg::entry_point)) {
            std::cout << "Parsed successfully\n";
        } else {
            std::cout << "Parsing failed\n";
        }
    }

And the program prints:

    ====== "   -89 0038  "
    Parsed successfully
    ====== "   \"-89 0038\"  "
    Parsed successfully
    ====== "   something123123      ;"
    Parsed successfully
    ====== ""
    Expecting quoted string, bare string or integer number pair at

        ^
    Parsing failed
    ====== "   -89 oops  "
    Expecting integral number at
           -89 oops
        -------^
    Expecting quoted string, bare string or integer number pair at
           -89 oops
        ^
    Parsing failed
    ====== "   \"-89 0038  "
    Expecting '"' at
           "-89 0038
        --------------^
    Expecting quoted string, bare string or integer number pair at
           "-89 0038
        ^
    Parsing failed
    ====== "   bareword "
    Expecting ';' at
           bareword
        ------------^
    Expecting quoted string, bare string or integer number pair at
           bareword
        ^
    Parsing failed
    ====== "   -89 3.14  "
    Expecting eoi at
           -89 3.14
        --------^
    Parsing failed

Attribute Propagation, on_success

Parsers aren't very useful when they don't actually parse anything, so let's add some constructive value handling, also showcasing on_success:

Defining some AST types to receive the attributes:

    struct quoted : std::string {};
    struct bare   : std::string {};
    using  two_i  = std::pair<int, int>;
    using Value = boost::variant<quoted, bare, two_i>;

Make sure we can print Values:

    static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
        struct {
            std::ostream& _os;
            void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             }
            void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            }
            void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; }
        } vis{os};

        boost::apply_visitor(vis, v);
        return os;
    }
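For readers more used to the standard library, the same printing technique can be sketched with std::variant and std::visit. This is an assumption-laden variation, not the answer's code: it swaps boost::variant for std::variant purely to illustrate the visitor, and it makes no claim about whether std::variant can serve directly as an X3 attribute type.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

struct quoted : std::string {};
struct bare   : std::string {};
using two_i = std::pair<int, int>;
// Assumption for this sketch only: std::variant instead of boost::variant.
using Value = std::variant<quoted, bare, two_i>;

std::ostream& operator<<(std::ostream& os, Value const& v) {
    std::visit([&os](auto const& x) {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, quoted>)
            // std::quoted re-adds the surrounding quotes and escapes inner ones
            os << "quoted(" << std::quoted(static_cast<std::string const&>(x)) << ")";
        else if constexpr (std::is_same_v<T, bare>)
            os << "bare(" << static_cast<std::string const&>(x) << ")";
        else
            os << "two_i(" << x.first << ", " << x.second << ")";
    }, v);
    return os;
}
```

Streaming `Value{two_i{-89, 38}}` through this operator writes `two_i(-89, 38)`, matching the format used by the boost::variant visitor.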

Now, use the old as<> trick to coerce attribute types, this time with error handling (the as<> definition appears in the full listing at the end).

As icing on the cake, let's demonstrate on_success as well in with_error_handling:

    template<typename It, typename Ctx>
        void on_success(It f, It l, two_i const& v, Ctx const&) const {
            std::cout << "Parsed " << std::quoted(std::string(f,l)) << " as integer pair " << v.first << ", " << v.second << "\n";
        }

With a largely unmodified main program (it just prints the result value as well):

Live On Coliru

    It iter = input.begin(), end = input.end();
    Value v;
    if (parse(iter, end, square::peg::entry_point, v)) {
        std::cout << "Result value: " << v << "\n";
    } else {
        std::cout << "Parsing failed\n";
    }

Prints

    ====== "   -89 0038  "
    Parsed "-89 0038" as integer pair -89, 38
    Result value: two_i(-89, 38)
    ====== "   \"-89 0038\"  "
    Result value: quoted("-89 0038")
    ====== "   something123123      ;"
    Result value: bare(something123123)
    ====== ""
    Expecting quoted string, bare string or integer number pair at

        ^
    Parsing failed
    ====== "   -89 oops  "
    Expecting integral number at
           -89 oops
        -------^
    Expecting quoted string, bare string or integer number pair at
           -89 oops
        ^
    Parsing failed
    ====== "   \"-89 0038  "
    Expecting '"' at
           "-89 0038
        --------------^
    Expecting quoted string, bare string or integer number pair at
           "-89 0038
        ^
    Parsing failed
    ====== "   bareword "
    Expecting ';' at
           bareword
        ------------^
    Expecting quoted string, bare string or integer number pair at
           bareword
        ^
    Parsing failed
    ====== "   -89 3.14  "
    Parsed "-89 3" as integer pair -89, 3
    Expecting eoi at
           -89 3.14
        --------^
    Parsing failed

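Really Overdoing Things

I don't know about you, but I hate doing side effects, let alone printing to the console from a parser. Let's use x3::with instead.

We want to append to the diagnostics via the Ctx& argument instead of writing to std::cout in the on_error handler: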

    struct with_error_handling {
        struct diags;

        template<typename It, typename Ctx>
            x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
                std::string s(f,l);
                auto pos = std::distance(f, ef.where());

                std::ostringstream oss;
                oss << "Expecting " << ef.which() << " at "
                    << "\n\t" << s
                    << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^";

                x3::get<diags>(ctx).push_back(oss.str());

                return error_handler_result::fail;
            }
    };

And at the call site, we can pass the context:

    std::vector<std::string> diags;

    if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
        std::cout << "Result value: " << v;
    } else {
        std::cout << "Parsing failed";
    }

    std::cout << " with " << diags.size() << " diagnostics messages: \n";
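The shape of this design (turn a hard expectation failure into an ordinary mismatch, and record the message in a caller-owned container rather than printing it) can also be sketched without X3. Everything below is hypothetical and library-free: `ExpectationFailure`, `Diags`, `Parser`, `guarded` and `bare_word` are invented names for this illustration, and the wrapper only mimics the effect of returning error_handler_result::fail from on_error.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names are X3 API.
struct ExpectationFailure : std::runtime_error {
    std::size_t where;
    ExpectationFailure(std::string const& which, std::size_t where)
        : std::runtime_error(which), where(where) {}
};

using Diags  = std::vector<std::string>;
using Parser = std::function<bool(std::string_view)>;

// Wraps a parser so that a hard failure is appended to the caller-owned
// sink and then reported as a plain "no match" instead of propagating.
Parser guarded(Parser p, Diags& sink) {
    return [p = std::move(p), &sink](std::string_view in) -> bool {
        try {
            return p(in);
        } catch (ExpectationFailure const& ef) {
            sink.push_back("Expecting " + std::string(ef.what()) +
                           " at offset " + std::to_string(ef.where));
            return false;  // no console output, no exception for the caller
        }
    };
}

// Toy rule: one or more letters followed by a mandatory ';'.
bool bare_word(std::string_view in) {
    std::size_t i = 0;
    while (i < in.size() && std::isalpha(static_cast<unsigned char>(in[i])))
        ++i;
    if (i == 0)
        return false;                        // soft: not this branch
    if (i == in.size() || in[i] != ';')
        throw ExpectationFailure("';'", i);  // hard: letters committed us
    return true;
}
```

A caller would create a `Diags` vector, build `guarded(bare_word, diags)`, run it, and inspect the vector afterwards; for the input "word" the parser returns false and the vector holds the single message "Expecting ';' at offset 4".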

The full program also prints the diagnostics:

Live On Wandbox²

Full Listing

    //#define BOOST_SPIRIT_X3_DEBUG
    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/home/x3.hpp>
    #include <iostream>
    #include <iomanip>
    #include <sstream>   // std::ostringstream
    #include <string>
    #include <vector>

    namespace x3 = boost::spirit::x3;

    struct quoted : std::string {};
    struct bare   : std::string {};
    using  two_i  = std::pair<int, int>;
    using Value = boost::variant<quoted, bare, two_i>;

    static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
        struct {
            std::ostream& _os;
            void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             }
            void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            }
            void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; }
        } vis{os};

        boost::apply_visitor(vis, v);
        return os;
    }

    namespace square::peg {
        using namespace x3;

        struct with_error_handling {
            struct diags;

            template<typename It, typename Ctx>
                x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
                    std::string s(f,l);
                    auto pos = std::distance(f, ef.where());

                    std::ostringstream oss;
                    oss << "Expecting " << ef.which() << " at "
                        << "\n\t" << s
                        << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^";

                    x3::get<diags>(ctx).push_back(oss.str());

                    return error_handler_result::fail;
                }
        };

        template <typename T = x3::unused_type> auto const as = [](auto p) {
            struct _ : with_error_handling {};
            return rule<_, T> {} = p;
        };

        const auto quoted_string = as<quoted>(lexeme['"' > *(print - '"') > '"']);
        const auto bare_string   = as<bare>(lexeme[alpha > *alnum] > ';');
        const auto two_ints      = as<two_i>(int_ > int_);

        const auto main          = quoted_string | bare_string | two_ints;
        using main_type = std::remove_cv_t<decltype(main)>;

        const auto entry_point   = skip(space)[ as<Value>(expect[main] > eoi) ];
    } // namespace square::peg

    namespace boost::spirit::x3 {
        template <> struct get_info<int_type> {
            typedef std::string result_type;
            std::string operator()(int_type const&) const { return "integral number"; }
        };
        template <> struct get_info<square::peg::main_type> {
            typedef std::string result_type;
            std::string operator()(square::peg::main_type const&) const { return "quoted string, bare string or integer number pair"; }
        };
    }

    int main() {
        using It = std::string::const_iterator;
        using D = square::peg::with_error_handling::diags;

        for (std::string const input : {
                "   -89 0038  ",
                "   \"-89 0038\"  ",
                "   something123123      ;",
                // undecidable
                "",
                // violate expectations, no successful parse
                "   -89 oops  ",   // not an integer
                "   \"-89 0038  ", // missing "
                "   bareword ",    // missing ;
                // trailing debris, successful "main"
                "   -89 3.14  ",   // followed by .14
            })
        {
            std::cout << "====== " << std::quoted(input) << "\n";

            It iter = input.begin(), end = input.end();
            Value v;
            std::vector<std::string> diags;

            if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
                std::cout << "Result value: " << v;
            } else {
                std::cout << "Parsing failed";
            }

            std::cout << " with " << diags.size() << " diagnostics messages: \n";

            for(auto& msg: diags) {
                std::cout << " - " << msg << "\n";
            }
        }
    }

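¹ You could use rules with their names instead, obviating this more complex trick.

² On older versions of the library you might have to fight to get reference semantics on the with<> data: Live On Coliru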