最优化的字符串连接方式

Bhu*_*ant 63 c++ string concatenation

我们每天都遇到很多情况,我们必须在代码中执行繁琐且非常多的字符串操作.我们都知道字符串操作是昂贵的操作.我想知道哪些是最便宜的版本.

最常见的操作是连接(这是我们可以在某种程度上控制的).在C++中连接std :: strings的最佳方法是什么,以及加速连接的各种解决方法?

我的意思是,

std::string l_czTempStr;

1).l_czTempStr = "Test data1" + "Test data2" + "Test data3";

2). l_czTempStr =  "Test data1"; 
    l_czTempStr += "Test data2";
    l_czTempStr += "Test data3";

3). using << operator

4). using append()
Run Code Online (Sandbox Code Playgroud)

另外,我们是否可以使用CString而不是std :: string?

Jes*_*ood 65

这是一个小型测试套件:

#include <iostream>
#include <string>
#include <chrono>
#include <sstream>

int main ()
{
    typedef std::chrono::high_resolution_clock clock;
    typedef std::chrono::duration<float, std::milli> mil;
    std::string l_czTempStr;
    std::string s1="Test data1";
    auto t0 = clock::now();
    #if VER==1
    for (int i = 0; i < 100000; ++i)
    {
        l_czTempStr = s1 + "Test data2" + "Test data3";
    }
    #elif VER==2
    for (int i = 0; i < 100000; ++i)
    {
        l_czTempStr =  "Test data1"; 
        l_czTempStr += "Test data2";
        l_czTempStr += "Test data3";
    }
    #elif VER==3
    for (int i = 0; i < 100000; ++i)
    {
        l_czTempStr =  "Test data1"; 
        l_czTempStr.append("Test data2");
        l_czTempStr.append("Test data3");
    }
    #elif VER==4
    for (int i = 0; i < 100000; ++i)
    {
        std::ostringstream oss;
        oss << "Test data1";
        oss << "Test data2";
        oss << "Test data3";
        l_czTempStr = oss.str();
    }
    #endif
    auto t1 = clock::now();
    std::cout << l_czTempStr << '\n';
    std::cout << mil(t1-t0).count() << "ms\n";
}
Run Code Online (Sandbox Code Playgroud)

coliru:

编译如下:

clang ++ -std = c ++ 11 -O3 -DVER = 1 -Wall -pedantic -pthread main.cpp

21.6463ms

-DVER = 2

6.61773ms

-DVER = 3

6.7855ms

-DVER = 4

102.015ms

看起来2),+=是赢家.

(也有或没有编译-pthread似乎影响时间)

  • 遗憾的是,该测试可能不具代表性.问题在于,通过在循环中不包含`l_czTempStr`声明,版本2和3反复重用相同的缓冲区,而版本1每次都创建一个新的缓冲区`std :: string {""}.您的基准测试表明,重复使用相同的缓冲区而不是分配/解除分配提供了5倍的加速(加速很明显,因素取决于碎片的长度以及如果不保留所有内容,会发生多少次重新分配-面前).不过,我不确定OP是否意味着重新使用相同的缓冲区. (27认同)
  • 你的`stringstream`基准测试包括(令人难以置信的)构建流的慢速时间. (3认同)
  • @MatthieuM.:+1好点.我更新了代码,所以初始字符串是在版本1中预先创建的,但是`operator +`仍然会在内部进行大量的分配/解除分配. (2认同)
  • _"它看起来像2),+ =是胜利者."_我不确定你的结果是否与`.append()`有任何统计差异.而且他们也不应该:其中一个是以另一个来实现的.我不明白为什么他们会有显着的不同. (2认同)

sya*_*yam 32

除了其他答案......

我前段时间对这个问题进行了广泛的基准测试,得出的结论是,所有用例中最有效的解决方案(Linux x86/x64/ARM上的GCC 4.7和4.8)首先reserve()是结果字符串,有足够的空间容纳所有用例连接的字符串,然后只有append()它们(或使用operator +=(),没有区别).

不幸的是,我似乎删除了那个基准,所以你只有我的话(但你可以轻松地调整Mats Petersson的基准来自己验证这个,如果我的话还不够).

简而言之:

const string space = " ";
string result;
result.reserve(5 + space.size() + 5);
result += "hello";
result += space;
result += "world";
Run Code Online (Sandbox Code Playgroud)

根据确切的用例(连接字符串的数量,类型和大小),有时这种方法是最有效的,有时它与其他方法相同,但它永远不会更糟.


问题是,提前计算所需的总大小真是太痛苦了,特别是在混合字符串文字时std::string(我相信上面的例子很清楚).只要修改其中一个文字或添加另一个要连接的字符串,这些代码的可维护性就非常可怕.

一种方法是用来sizeof计算文字的大小,但恕我直言,它创造的混乱比它解决的多,可维护性仍然很糟糕:

#define STR_HELLO "hello"
#define STR_WORLD "world"

const string space = " ";
string result;
result.reserve(sizeof(STR_HELLO)-1 + space.size() + sizeof(STR_WORLD)-1);
result += STR_HELLO;
result += space;
result += STR_WORLD;
Run Code Online (Sandbox Code Playgroud)

一个可用的解决方案(C++ 11,可变参数模板)

我最终选择了一组可变参数模板,它们可以根据需要有效地计算字符串大小(例如,在编译时确定字符串文字的大小),reserve()然后连接所有内容.

在这里,希望这是有用的:

namespace detail {

  template<typename>
  struct string_size_impl;

  template<size_t N>
  struct string_size_impl<const char[N]> {
    static constexpr size_t size(const char (&) [N]) { return N - 1; }
  };

  template<size_t N>
  struct string_size_impl<char[N]> {
    static size_t size(char (&s) [N]) { return N ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<const char*> {
    static size_t size(const char* s) { return s ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<char*> {
    static size_t size(char* s) { return s ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<std::string> {
    static size_t size(const std::string& s) { return s.size(); }
  };

  template<typename String> size_t string_size(String&& s) {
    using noref_t = typename std::remove_reference<String>::type;
    using string_t = typename std::conditional<std::is_array<noref_t>::value,
                                              noref_t,
                                              typename std::remove_cv<noref_t>::type
                                              >::type;
    return string_size_impl<string_t>::size(s);
  }

  template<typename...>
  struct concatenate_impl;

  template<typename String>
  struct concatenate_impl<String> {
    static size_t size(String&& s) { return string_size(s); }
    static void concatenate(std::string& result, String&& s) { result += s; }
  };

  template<typename String, typename... Rest>
  struct concatenate_impl<String, Rest...> {
    static size_t size(String&& s, Rest&&... rest) {
      return string_size(s)
           + concatenate_impl<Rest...>::size(std::forward<Rest>(rest)...);
    }
    static void concatenate(std::string& result, String&& s, Rest&&... rest) {
      result += s;
      concatenate_impl<Rest...>::concatenate(result, std::forward<Rest>(rest)...);
    }
  };

} // namespace detail

template<typename... Strings>
std::string concatenate(Strings&&... strings) {
  std::string result;
  result.reserve(detail::concatenate_impl<Strings...>::size(std::forward<Strings>(strings)...));
  detail::concatenate_impl<Strings...>::concatenate(result, std::forward<Strings>(strings)...);
  return result;
}
Run Code Online (Sandbox Code Playgroud)

就公共接口而言,唯一有趣的部分是最后一个template<typename... Strings> std::string concatenate(Strings&&... strings)模板.用法很简单:

int main() {
  const string space = " ";
  std::string result = concatenate("hello", space, "world");
  std::cout << result << std::endl;
}
Run Code Online (Sandbox Code Playgroud)

启用优化后,任何体面的编译器都应该能够将concatenate调用扩展为与我手动编写所有内容的第一个示例相同的代码.就GCC 4.7和4.8而言,生成的代码与性能完全相同.


Mat*_*son 18

最糟糕的情况是使用普通旧strcat(或sprintf),因为strcat需要一个C字符串,并且必须"计数"以找到结束.对于长串,这是一个真正的表现受害者.C++样式字符串要好得多,性能问题可能与内存分配有关,而不是计算长度.但话又说回来,字符串几何形状增长(每次需要增长时都会增加一倍),所以它并不那么糟糕.

我非常怀疑所有上述方法最终都具有相同或至少非常相似的性能.如果有的话,我希望这stringstream会慢一点,因为支持格式化的开销 - 但我也怀疑它是微不足道的.

由于这种事情是"有趣的",我将回到基准...

编辑:

请注意,这些结果适用于运行x86-64 Linux的MY机器,使用g ++ 4.6.3编译.其他OS,编译器和C++运行时库实现可能会有所不同.如果性能对您的应用程序很重要,那么使用您使用的编译器对您至关重要的系统进行基准测试.

这是我为测试这个而编写的代码.它可能不是真实场景的完美表现,但我认为这是一个代表性场景:

#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
#include <cstring>

using namespace std;

static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}

string build_string_1(const string &a, const string &b, const string &c)
{
    string out = a + b + c;
    return out;
}

string build_string_1a(const string &a, const string &b, const string &c)
{
    string out;
    out.resize(a.length()*3);
    out = a + b + c;
    return out;
}

string build_string_2(const string &a, const string &b, const string &c)
{
    string out = a;
    out += b;
    out += c;
    return out;
}

string build_string_3(const string &a, const string &b, const string &c)
{
    string out;
    out = a;
    out.append(b);
    out.append(c);
    return out;
}


string build_string_4(const string &a, const string &b, const string &c)
{
    stringstream ss;

    ss << a << b << c;
    return ss.str();
}


char *build_string_5(const char *a, const char *b, const char *c)
{
    char* out = new char[strlen(a) * 3+1];
    strcpy(out, a);
    strcat(out, b);
    strcat(out, c);
    return out;
}



template<typename T>
size_t len(T s)
{
    return s.length();
}

template<>
size_t len(char *s)
{
    return strlen(s);
}

template<>
size_t len(const char *s)
{
    return strlen(s);
}



void result(const char *name, unsigned long long t, const string& out)
{
    cout << left << setw(22) << name << " time:" << right << setw(10) <<  t;
    cout << "   (per character: " 
         << fixed << right << setw(8) << setprecision(2) << (double)t / len(out) << ")" << endl;
}

template<typename T>
void benchmark(const char name[], T (Func)(const T& a, const T& b, const T& c), const char *strings[])
{
    unsigned long long t;

    const T s1 = strings[0];
    const T s2 = strings[1];
    const T s3 = strings[2];
    t = rdtsc();
    T out = Func(s1, s2, s3);
    t = rdtsc() - t; 

    if (len(out) != len(s1) + len(s2) + len(s3))
    {
        cout << "Error: out is different length from inputs" << endl;
        cout << "Got `" << out << "` from `" << s1 << "` + `" << s2 << "` + `" << s3 << "`";
    }
    result(name, t, out);
}


void benchmark(const char name[], char* (Func)(const char* a, const char* b, const char* c), 
               const char *strings[])
{
    unsigned long long t;

    const char* s1 = strings[0];
    const char* s2 = strings[1];
    const char* s3 = strings[2];
    t = rdtsc();
    char *out = Func(s1, s2, s3);
    t = rdtsc() - t; 

    if (len(out) != len(s1) + len(s2) + len(s3))
    {
        cout << "Error: out is different length from inputs" << endl;
        cout << "Got `" << out << "` from `" << s1 << "` + `" << s2 << "` + `" << s3 << "`";
    }
    result(name, t, out);
    delete [] out;
}


#define BM(func, size) benchmark(#func " " #size, func, strings ## _ ## size)


#define BM_LOT(size) BM(build_string_1, size); \
    BM(build_string_1a, size); \
    BM(build_string_2, size); \
    BM(build_string_3, size); \
    BM(build_string_4, size); \
    BM(build_string_5, size);

int main()
{
    const char *strings_small[]  = { "Abc", "Def", "Ghi" };
    const char *strings_medium[] = { "abcdefghijklmnopqrstuvwxyz", 
                                     "defghijklmnopqrstuvwxyzabc", 
                                     "ghijklmnopqrstuvwxyzabcdef" };
    const char *strings_large[]   = 
        { "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz", 

          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"

          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc", 

          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
        };

    for(int i = 0; i < 5; i++)
    {
        BM_LOT(small);
        BM_LOT(medium);
        BM_LOT(large);
        cout << "---------------------------------------------" << endl;
    }
}
Run Code Online (Sandbox Code Playgroud)

以下是一些代表性的结果:

build_string_1 small   time:      4075   (per character:   452.78)
build_string_1a small  time:      5384   (per character:   598.22)
build_string_2 small   time:      2669   (per character:   296.56)
build_string_3 small   time:      2427   (per character:   269.67)
build_string_4 small   time:     19380   (per character:  2153.33)
build_string_5 small   time:      6299   (per character:   699.89)
build_string_1 medium  time:      3983   (per character:    51.06)
build_string_1a medium time:      6970   (per character:    89.36)
build_string_2 medium  time:      4072   (per character:    52.21)
build_string_3 medium  time:      4000   (per character:    51.28)
build_string_4 medium  time:     19614   (per character:   251.46)
build_string_5 medium  time:      6304   (per character:    80.82)
build_string_1 large   time:      8491   (per character:     3.63)
build_string_1a large  time:      9563   (per character:     4.09)
build_string_2 large   time:      6154   (per character:     2.63)
build_string_3 large   time:      5992   (per character:     2.56)
build_string_4 large   time:     32450   (per character:    13.87)
build_string_5 large   time:     15768   (per character:     6.74)
Run Code Online (Sandbox Code Playgroud)

相同的代码,以32位运行:

build_string_1 small   time:      4289   (per character:   476.56)
build_string_1a small  time:      5967   (per character:   663.00)
build_string_2 small   time:      3329   (per character:   369.89)
build_string_3 small   time:      3047   (per character:   338.56)
build_string_4 small   time:     22018   (per character:  2446.44)
build_string_5 small   time:      3026   (per character:   336.22)
build_string_1 medium  time:      4089   (per character:    52.42)
build_string_1a medium time:      8075   (per character:   103.53)
build_string_2 medium  time:      4569   (per character:    58.58)
build_string_3 medium  time:      4326   (per character:    55.46)
build_string_4 medium  time:     22751   (per character:   291.68)
build_string_5 medium  time:      2252   (per character:    28.87)
build_string_1 large   time:      8695   (per character:     3.72)
build_string_1a large  time:     12818   (per character:     5.48)
build_string_2 large   time:      8202   (per character:     3.51)
build_string_3 large   time:      8351   (per character:     3.57)
build_string_4 large   time:     38250   (per character:    16.35)
build_string_5 large   time:      8143   (per character:     3.48)
Run Code Online (Sandbox Code Playgroud)

由此,我们可以得出结论:

  1. 最好的选择是一次添加一点(out.append()out +=),"链式"方法合理地接近.

  2. 预分配字符串没有用.

  3. 使用stringstream是非常糟糕的想法(慢2-4倍).

  4. char *用途new char[].在调用函数中使用局部变量使其最快 - 但稍微不公平地比较它.

  5. 组合短字符串有相当大的开销 - 只需复制数据每个字节最多一个周期[除非数据不适合缓存].

EDIT2

根据评论添加:

string build_string_1b(const string &a, const string &b, const string &c)
{
    return a + b + c;
}
Run Code Online (Sandbox Code Playgroud)

string build_string_2a(const string &a, const string &b, const string &c)
{
    string out;
    out.reserve(a.length() * 3);
    out += a;
    out += b;
    out += c;
    return out;
}
Run Code Online (Sandbox Code Playgroud)

这给出了以下结果:

build_string_1 small   time:      3845   (per character:   427.22)
build_string_1b small  time:      3165   (per character:   351.67)
build_string_2 small   time:      3176   (per character:   352.89)
build_string_2a small  time:      1904   (per character:   211.56)

build_string_1 large   time:      9056   (per character:     3.87)
build_string_1b large  time:      6414   (per character:     2.74)
build_string_2 large   time:      6417   (per character:     2.74)
build_string_2a large  time:      4179   (per character:     1.79)
Run Code Online (Sandbox Code Playgroud)

(32位运行,但64位显示非常相似的结果).


Mik*_*our 8

与大多数微观优化一样,您需要测量每个选项的效果,首先通过测量确定这确实是一个值得优化的瓶颈.没有确定的答案.

append并且+=应该做同样的事情.

+在概念上效率较低,因为你正在创造和摧毁临时工.您的编译器可能会也可能无法将其优化为与追加一样快.

reserve使用总大小调用可能会减少所需的内存分配数量 - 它们可能是最大的瓶颈.

<<(可能是在a stringstream)可能会或可能不会更快; 你需要衡量它.如果你需要格式化非字符串类型,这很有用,但在处理字符串时可能不会特别好或更差.

CString 它的缺点是它不便携,像我这样的Unix黑客无法告诉你它的优点可能是什么,也可能不是.