Initializing std::string and std::wstring from the same hard-coded string literal

Question

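I was writing some unit tests when I ran into a scenario that has bugged me a few times already.

I need to generate some strings to test a JSON writer object. Since the writer supports both UTF-16 and UTF-8 input, I want to test it with both.

Consider the following test: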

#include <map>
#include <string>

// Tag types used only to select the source encoding.
class UTF8;
class UTF16;

template < typename String, typename SourceEncoding >
void writeJson(std::map<String, String> & data)
{
    // Write to file
}

void generateStringData(std::map<std::string, std::string> & data)
{
    data.emplace("Lorem", "Lorem Ipsum is simply dummy text of the printing and typesetting industry.");
    data.emplace("Ipsum", "Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book");
    data.emplace("Contrary", "Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old");
}

void generateStringData(std::map<std::wstring, std::wstring> & data)
{
    data.emplace(L"Lorem", L"Lorem Ipsum is simply dummy text of the printing and typesetting industry.");
    data.emplace(L"Ipsum", L"Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book");
    data.emplace(L"Contrary", L"Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old");
}

template < typename String, typename SourceEncoding >
void testWriter() {
    std::map<String, String> data;
    generateStringData(data);
    writeJson<String, SourceEncoding>(data);
}

int main() {
    testWriter<std::string, UTF8>();
    testWriter<std::wstring, UTF16>();
}

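Apart from the duplicated generateStringData() method, I managed to wrap everything up nicely. Is it possible to merge the two generateStringData() methods into one?

I know I could generate the strings in UTF-8 with a single method and then convert them to UTF-16 with another, but I am trying to find out whether there is any other way.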

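What have I considered/tried?

Using _T(), TCHAR, or #ifdef UNICODE won't help, because I need both flavors on the same Unicode-capable platform (e.g. Win >= 7).
Initializing a std::wstring from something that is not an L"" literal won't work, because it requires wchar_t.
Initializing character by character won't work, because that also requires L''.
Using ""s won't work, because the return type depends on the character type charT.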

Answer 1

Adr*_*thy 3

If you only need simple ASCII encoded as chars and wchar_ts, you can do it with a function template (no specialization needed):

#include <cstring>   // std::strlen
#include <iostream>
#include <map>
#include <string>
#include <utility>

template <typename StringType>
void generateStringData(std::map<StringType, StringType> &data) {
  static const std::pair<const char *, const char *> entries[] = {
    { "Lorem", "Lorem Ipsum is simply dummy text ..."},
    { "Ipsum", "Ipsum has been the industry's standard ..."}
  };
  for (const auto &entry : entries) {
    data.emplace(StringType(entry.first, entry.first + std::strlen(entry.first)),
                 StringType(entry.second, entry.second + std::strlen(entry.second)));
  }
}

int main() {
  std::map<std::string, std::string> ansi;
  generateStringData(ansi);
  std::map<std::wstring, std::wstring> wide;
  generateStringData(wide);

  std::cout << ansi["Lorem"] << std::endl;
  std::wcout << wide[L"Lorem"] << std::endl;
  return 0;
}

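This works because the wchar_t version of any ASCII character is simply the ASCII value widened to the size of wchar_t (16 bits on Windows). If the source strings contain any "interesting" characters, this will not actually convert them to proper UTF-16.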
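To make the ASCII-only caveat concrete, here is a minimal sketch that applies the same iterator-range construction to a non-ASCII input. The helper name widenBytes and the function demoWidening are hypothetical and not part of the answer. The input "\xC3\xA9" is the two-byte UTF-8 encoding of é (U+00E9).

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: builds a wide string by widening each char separately,
// the same way the iterator-range constructor in the answer does.
std::wstring widenBytes(const char *text) {
  return std::wstring(text, text + std::strlen(text));
}

// Illustrative checks; call from main() to run them.
void demoWidening() {
  // Pure ASCII survives the per-character widening unchanged.
  assert(widenBytes("Lorem") == L"Lorem");

  // Each UTF-8 byte becomes its own wchar_t: two elements, not the single
  // code unit U+00E9 that a real UTF-8 to UTF-16 conversion would produce.
  const std::wstring widened = widenBytes("\xC3\xA9");
  assert(widened.size() == 2);
  assert(widened != L"\u00e9");
}
```

The exact numeric values of the two widened elements depend on whether char is signed on the platform, so the sketch only checks their count.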

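Also note that you will almost certainly end up with four copies of the strings in memory: two copies of the ASCII source strings in the executable (from the two instantiations of the function template), plus the char and wchar_t copies on the heap.

But that is probably no worse than the preprocessor version. With the preprocessor, you would likely end up with char and wchar_t versions in the executable, as well as char and wchar_t copies on the heap.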

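What the preprocessor approach can do is help you get around the big "if" at the top of this answer: with the preprocessor, you can use non-ASCII characters.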
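The answer does not show its preprocessor version, so the following is only one possible shape of it, a sketch under stated assumptions. DualLiteral, DUAL_LITERAL, assignFrom, and demoDualLiteral are hypothetical names. The macro pastes an L prefix onto the literal, so each string is written once but compiled as both a narrow and a wide literal.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical pair holding both spellings of one literal.
struct DualLiteral {
  const char *narrow;
  const wchar_t *wide;
};

// Pastes the L prefix onto the literal so it is written only once.
// The argument must be a plain string literal, not another macro.
#define DUAL_LITERAL(s) DualLiteral{s, L##s}

// Overloads pick the member that matches the requested string type.
inline void assignFrom(const DualLiteral &lit, std::string &out) { out = lit.narrow; }
inline void assignFrom(const DualLiteral &lit, std::wstring &out) { out = lit.wide; }

template <typename StringType>
void generateStringData(std::map<StringType, StringType> &data) {
  static const DualLiteral entries[][2] = {
    {DUAL_LITERAL("Lorem"), DUAL_LITERAL("Lorem Ipsum is simply dummy text ...")},
    {DUAL_LITERAL("Caf\u00e9"), DUAL_LITERAL("non-ASCII key")},
  };
  for (const auto &entry : entries) {
    StringType key, value;
    assignFrom(entry[0], key);
    assignFrom(entry[1], value);
    data.emplace(key, value);
  }
}

// Illustrative check; call from main() to run it.
void demoDualLiteral() {
  std::map<std::wstring, std::wstring> wide;
  generateStringData(wide);
  // The wide key contains a real U+00E9 because the compiler encoded it.
  assert(wide.count(L"Caf\u00e9") == 1);
}
```

Here the compiler encodes each literal itself, which is why non-ASCII characters come out right in the wide map. The encoding of the narrow literal depends on the compiler's execution character set, so it is UTF-8 only if the compiler is configured that way.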

[Implementation note: originally the assignments used std::begin(entry.first) and std::end(entry.first), but that included the string terminator as part of the string itself.]
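The terminator issue from the implementation note can be reproduced with a small sketch. It assumes the earlier version stored character arrays, since std::begin and std::end accept arrays but not const char * pointers. The function name demoTerminator is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <iterator>
#include <string>

// Illustrative checks; call from main() to run them.
void demoTerminator() {
  const char text[] = "Lorem";  // array of 6 chars: 5 letters plus '\0'

  // std::end on an array points past the terminator, so '\0' is copied too.
  const std::string withTerminator(std::begin(text), std::end(text));
  assert(withTerminator.size() == 6);
  assert(withTerminator.back() == '\0');

  // Stopping at text + strlen(text) excludes the terminator.
  const std::string withoutTerminator(text, text + std::strlen(text));
  assert(withoutTerminator.size() == 5);
  assert(withoutTerminator == "Lorem");
}
```

A string with a trailing '\0' compares unequal to the same text without it, which would make map lookups such as ansi["Lorem"] miss.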
