Suppose I have
    uint32_t a(3084);
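I want to create a string that stores the Unicode character U+3084, which means I should take the value of a and use it as the index of the correct character in the UTF-8 table/character set.
Now, std::to_string() obviously does not work for me. The standard has many functions that convert between numeric values and char, but I cannot find anything that supports UTF-8 and outputs a std::string.
I would like to know whether I have to write this function from scratch or whether something in the C++11 standard can help me. Please note that my compiler (gcc/g++ 4.8.1) does not offer full support for codecvt.
Here is some C++ code that would not be hard to convert to C. It is adapted from an older answer.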
    #include <string>
    // Encode a single code point as a UTF-8 byte sequence.
    // Note: the input is not validated (surrogates and values above 0x10ffff are encoded as-is).
    std::string UnicodeToUTF8(unsigned int codepoint)
    {
        std::string out;
        if (codepoint <= 0x7f)
            // 1 byte: 0xxxxxxx
            out.append(1, static_cast<char>(codepoint));
        else if (codepoint <= 0x7ff)
        {
            // 2 bytes: 110xxxxx 10xxxxxx
            out.append(1, static_cast<char>(0xc0 | ((codepoint >> 6) & 0x1f)));
            out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
        }
        else if (codepoint <= 0xffff)
        {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(1, static_cast<char>(0xe0 | ((codepoint >> 12) & 0x0f)));
            out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f)));
            out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
        }
        else
        {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(1, static_cast<char>(0xf0 | ((codepoint >> 18) & 0x07)));
            out.append(1, static_cast<char>(0x80 | ((codepoint >> 12) & 0x3f)));
            out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f)));
            out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
        }
        return out;
    }
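The encoder above accepts any integer, so it may help to see a variant that rejects values with no UTF-8 form. The following is only a sketch under stated assumptions: the name `encode_utf8`, the choice to throw `std::invalid_argument`, and the helper `check_encode_utf8` are illustrative and not part of the answer. The self-check also contrasts the hexadecimal literal `0x3084` (U+3084) with the decimal literal `3084` (U+0C0C), since the two are easy to confuse.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the answer above): encode one code point as
// UTF-8, rejecting values that are not Unicode scalar values.
std::string encode_utf8(std::uint32_t cp)
{
    // Surrogates (U+D800..U+DFFF) and values above U+10FFFF have no UTF-8 form.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");

    std::string out;
    if (cp <= 0x7F) {
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Small self-check contrasting the hexadecimal and decimal literals.
void check_encode_utf8()
{
    assert(encode_utf8(0x3084) == "\xE3\x82\x84"); // U+3084
    assert(encode_utf8(3084) == "\xE0\xB0\x8C");   // decimal 3084 is U+0C0C
}
```

Calling `check_encode_utf8()` from `main` exercises both cases. Throwing on invalid input is one possible design; returning an empty string or U+FFFD would be equally reasonable choices.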
std::wstring_convert::to_bytes has a single-character overload for exactly this purpose.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <locale>
    #include <codecvt>
    #include <iomanip>
    // utility function for output: print each byte as two hex digits
    void hex_print(const std::string& s)
    {
        std::cout << std::hex << std::setfill('0');
        for(unsigned char c : s)
            std::cout << std::setw(2) << static_cast<int>(c) << ' ';
        std::cout << std::dec << '\n';
    }
    int main()
    {
        uint32_t a(3084); // decimal 3084 == 0x0C0C, i.e. U+0C0C
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv1;
        // the cast selects the single-character overload of to_bytes explicitly
        std::string u8str = conv1.to_bytes(static_cast<char32_t>(a));
        std::cout << "UTF-8 conversion produced " << u8str.size() << " bytes:\n";
        hex_print(u8str);
    }
I get (with libc++):
    $ ./test
    UTF-8 conversion produced 3 bytes:
    e0 b0 8c
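To check which code point the bytes `e0 b0 8c` represent, they can be decoded by hand. The sketch below is a minimal decoder written for illustration only: the name `decode_first_utf8` and its error handling are assumptions, and it does not reject overlong encodings or surrogates. Decoding `e0 b0 8c` yields 0x0C0C, which is decimal 3084, while U+3084 would be encoded as `e3 82 84`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode the first code point of a UTF-8 string.
// Minimal sketch: validates the lead byte, the length, and the continuation
// bytes, but does not reject overlong forms or surrogates.
std::uint32_t decode_first_utf8(const std::string& s)
{
    if (s.empty())
        throw std::invalid_argument("empty input");

    const unsigned char lead = static_cast<unsigned char>(s[0]);
    std::uint32_t cp = 0;
    std::size_t extra = 0; // number of continuation bytes expected

    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else throw std::invalid_argument("invalid lead byte");

    if (s.size() < extra + 1)
        throw std::invalid_argument("truncated sequence");

    for (std::size_t i = 1; i <= extra; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80)
            throw std::invalid_argument("invalid continuation byte");
        cp = (cp << 6) | (c & 0x3F); // append the low six payload bits
    }
    return cp;
}
```

For example, `decode_first_utf8("\xE0\xB0\x8C")` returns 3084 (0x0C0C), so the program output above is consistent with the decimal literal it was given.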