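I noticed that the length method of std::string returns the length in bytes, and that the same method on std::u16string returns the number of 2-byte sequences.
I also noticed that when a character or code point is outside the BMP, length returns 4 rather than 2.
In addition, the Unicode escape sequence is limited to \unnnn, so no code point above U+FFFF can be inserted with an escape sequence.
In other words, there appears to be no support for surrogate pairs or code points outside the BMP.
Given this, is it accepted or recommended practice to use a non-standard string manipulation library that understands UTF-8, UTF-16, surrogate pairs, and so on?
Does my compiler have a bug, or am I using the standard string manipulation methods incorrectly?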
Example:
/*
 * Example with the Unicode code points U+0041, U+4061, U+10196 and U+10197
 */
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    std::string example1 = u8"A䁡𐆖𐆗";
    std::u16string example2 = u"A䁡𐆖𐆗";
    std::cout << "Escape Example: " << "\u0041\u4061\u10196\u10197" << "\n";
    std::cout << "Example: " << example1 << "\n";
    std::cout << "std::string Example length: " << example1.length() << "\n";
    std::cout << "std::u16string Example length: " << …

What is the result of the std::wstring.length() function: the length in wchar_t(s) or the length in symbols? And why?
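#include <iostream>
#include <string>
#include <tchar.h>  // TCHAR; assumes a Windows Unicode build, where TCHAR is wchar_t

int main()
{
    TCHAR r2[3];
    r2[0] = 0xD834; // D834, DD1E - musical G clef (U+1D11E) as a surrogate pair
    r2[1] = 0xDD1E; // low (trailing) surrogate
    r2[2] = 0x0000; // '\0' terminator
    std::wstring r = r2;
    std::cout << "capacity: " << r.capacity() << std::endl;
    std::cout << "length: " << r.length() << std::endl;
    std::cout << "size: " << r.size() << std::endl;
    std::cout << "max_size: " << r.max_size() << std::endl;
}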
Output>
capacity: 351
length: 2
size: 2
max_size: 2147483646
In the program below, I am trying to measure the length of a string that contains non-ASCII characters.
However, I am not sure why size() does not print the correct length when non-ASCII characters are used.
#include <iostream>
#include <string>

int main()
{
    std::string s1 = "Hello";
    std::string s2 = "??????"; // non-ASCII string
    std::cout << "Size of " << s1 << " is " << s1.size() << std::endl;
    std::cout << "Size of " << s2 << " is " << s2.size() << std::endl;
}
Output:
Size of Hello is 5
Size of ?????? is 18
Live demo on Wandbox.