Why is std::cmatch slower than std::smatch here?

Question

I first generate a long random string:

#include <random>
#include <regex>
#include <string>

const int length = 100000;
std::string a; // the string that both benchmarks search
std::uniform_int_distribution<int> distribution(0, 2);
std::default_random_engine engine{1}; // set 1 as seed

// Just for test usage, not optimal.
for(int i = 0; i < length; i++) // random abc
    a.push_back('a' + distribution(engine));
std::regex r{ "abc" };

Then I search it with std::smatch and with std::cmatch, and benchmark both versions:

std::smatch m;
std::string a0 = a;
int result = 0; // to disable optimization.

while (std::regex_search(a0, m, r))
{
    a0 = m.suffix();
    result += static_cast<int>(a0[0]);
}
return result;

std::cmatch m;
const char* currBegin = a.c_str();
int result = 0;

while (std::regex_search(currBegin, m, r))
{
    // For practical use in the future.
    std::string_view v(m[0].first, m[0].second - m[0].first);
    currBegin = m.suffix().first;
    result += static_cast<int>(*currBegin);
}
return result;

The cmatch version is about five times slower than the smatch version. Why?

Note that I benchmark with Catch2's BENCHMARK, using MSVC 19.29 in Release mode with the C++ standard set to C++20.

Answer 1

o_o*_*tle 6

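Oh, I read the source code of std::regex_search and found that passing a const char* to std::regex_search first triggers a strlen-like operation. So after changing the following line, I got the expected result: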

while (std::regex_search(currBegin, m, r))

to

while (std::regex_search(currBegin, currEnd, m, r))

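where currEnd = currBegin + a.size(), computed once before the loop from the initial currBegin so that it keeps pointing at the end of the string. With the end pointer supplied, the strlen-like operation is skipped, and I get a 50% speedup with std::cmatch. Counting the remaining characters over and over on every call drags down the whole process.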
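For reference, here is a minimal sketch that puts the two loops side by side. The function names count_matches_cstr and count_matches_range are hypothetical and exist only for this illustration; the bodies use only the two std::regex_search overloads discussed above. Both versions assume the pattern never matches an empty string, otherwise the loops would not advance.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: searches through a bare const char*.
// This overload has to find the terminating '\0' on every call,
// so each iteration re-scans the remainder of the string.
std::size_t count_matches_cstr(const std::string& text, const std::regex& re) {
    std::cmatch m;
    const char* cur = text.c_str();
    std::size_t count = 0;
    while (std::regex_search(cur, m, re)) {
        cur = m.suffix().first; // continue right after the match
        ++count;
    }
    return count;
}

// Hypothetical helper: same loop, but the end pointer is computed once
// and passed explicitly, so no length scan is needed per call.
std::size_t count_matches_range(const std::string& text, const std::regex& re) {
    std::cmatch m;
    const char* cur = text.c_str();
    const char* const end = cur + text.size(); // fixed for the whole loop
    std::size_t count = 0;
    while (std::regex_search(cur, end, m, re)) {
        cur = m.suffix().first; // continue right after the match
        ++count;
    }
    return count;
}
```

Both functions should return the same count for the same input; only the amount of repeated length scanning differs. How large the difference is in practice depends on the standard library implementation and on the input.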
