Unicode portability

Question

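I'm currently maintaining an application that uses std::string and char for string operations. That works fine on Linux, since Linux is agnostic to Unicode (or so it seems; I don't really know, so please correct me if I'm telling stories here). This style naturally leads to function/class declarations like this: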

std::string doSomethingFunkyWith(const std::string& thisdata)
{
    /* .... */
}

However, if thisdata contains Unicode characters, it will be displayed incorrectly on Windows, since std::string can't hold Unicode characters on Windows.

So I came up with this concept:

namespace MyApplication {
#ifdef UNICODE
    typedef std::wstring  string_type;
    typedef wchar_t       char_type;
#else
    typedef std::string   string_type;
    typedef char          char_type;
#endif

    /* ... */
    string_type doSomethingFunkyWith(const string_type& thisdata)
    {
        /* ... */
    }
}

Is this a good concept for supporting Unicode on Windows?

My current toolchain consists of gcc/clang on Linux, and wine + MinGW for Windows support (cross-testing also happens via wine), if that matters.

Answer 1

cas*_*nca 5

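How you store strings inside your application is entirely up to you. After all, nobody will know as long as the strings stay within your application. The problems start when you read or write strings from or to the outside world (console, files, sockets, etc.), and this is where the OS matters.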

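Linux is not exactly "agnostic" to Unicode. It does recognize Unicode, but the standard library functions assume UTF-8 encoding (at least on a typical modern setup), so Unicode strings fit into ordinary char arrays. Windows, on the other hand, uses UTF-16, so you need a wchar_t array to hold its 16-bit code units.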
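To illustrate how UTF-8 text fits into ordinary char storage, here is a minimal sketch that counts code points in a std::string. The helper name count_code_points is made up for this example, and the function assumes the input is already valid UTF-8; it does not validate anything.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original answer): counts Unicode code
// points in a UTF-8 encoded std::string. Every code point starts with a byte
// that is NOT a continuation byte (continuation bytes look like 10xxxxxx),
// so counting the non-continuation bytes gives the number of code points.
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

The point is that size() reports bytes, not characters: the euro sign occupies three char elements but is a single code point, so code that indexes or truncates a UTF-8 std::string by byte position can split a character.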

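The typedefs you proposed should work fine, but keep in mind that this alone does not make your code portable. For example, if you want to store text in files in a portable way, you should choose one encoding and stick to it on all platforms, which may require converting between encodings on some of them.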
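As a rough sketch of what "converting between encodings on some platforms" can look like, the function below turns UTF-8 (a common choice for the on-disk encoding) into UTF-16 code units. The name utf8_to_utf16 is hypothetical, std::u16string is used instead of std::wstring so the result is the same on every platform, and the validation is deliberately minimal: overlong forms and encoded surrogates are not rejected. A production program would more likely rely on a library or the platform API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 bytes to UTF-16 code units.
// Minimal sketch: rejects bad lead bytes, bad continuation bytes, truncated
// sequences and values above U+10FFFF, but not overlong encodings.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;   // decoded code point
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            // Fits in a single 16-bit code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Needs a surrogate pair: high surrogate first, then low.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Note that a code point outside the Basic Multilingual Plane becomes two 16-bit units, so even with 16-bit wchar_t on Windows one array element is not always one character.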

Answer 2

vz0*_*vz0 5

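Multi-platform issues stem from the fact that there are many encodings, and picking the wrong one leads to encoding problems. Once you have tackled that, you should be able to use std::wstring throughout your program.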

The usual workflow is:

raw_input_data = read_raw_data()
input_encoding = "???" // What is your file or terminal encoding?

unicode_data = convert_to_unicode(raw_input_data, input_encoding)

// Do something with the unicode_data, store in some var, etc.

output_encoding = "???" // Is your terminal output encoding the same as your input?
raw_output_data = convert_from_unicode(unicode_data, output_encoding)

print_raw_data(raw_output_data)

Most Unicode issues come from wrongly detected values of input_encoding and output_encoding. On a modern Linux distribution this is usually UTF-8. On Windows, YMMV.

Standard C++ doesn't know about encodings; you should use a library such as ICU to do the conversion.
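For illustration only, the decode / process / encode workflow can be sketched in plain C++ for one fixed pair of encodings. This is a hand-rolled stand-in, not ICU: the names decode_latin1 and encode_utf8 are hypothetical, the input encoding is simply assumed to be ISO-8859-1 (whose byte values map one-to-one onto code points), and the output encoding is assumed to be UTF-8. For real-world encoding detection and conversion a library remains the better choice.

```cpp
#include <string>

// Step 1 (convert_to_unicode): decode raw bytes into code points.
// ASSUMPTION: the input is ISO-8859-1, where each byte value equals
// its Unicode code point.
std::u32string decode_latin1(const std::string& raw)
{
    std::u32string out;
    out.reserve(raw.size());
    for (unsigned char byte : raw) {
        out.push_back(static_cast<char32_t>(byte));
    }
    return out;
}

// Step 2 (convert_from_unicode): encode code points as UTF-8 bytes.
// Sketch only: surrogates and values above U+10FFFF are not rejected.
std::string encode_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Calling encode_utf8(decode_latin1(raw)) re-encodes Latin-1 input as UTF-8; any processing would happen on the std::u32string in between, where one element is one code point regardless of platform.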
