Rud*_*lis 0 c++ unicode utf-8 visual-studio-2017 c++-experimental
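I was glad to see that Visual Studio 2017 added support for std::experimental::filesystem, but I have just run into a Unicode problem. I somewhat blindly assumed that I could use UTF-8 strings everywhere, but that failed: when a std::experimental::filesystem::path is constructed from a char* pointing to a UTF-8 encoded string, no conversion takes place (even though the header uses the _To_wide and _To_byte functions internally). I wrote a simple test example: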
#include <cstdio>
#include <cstring>
#include <cwchar>
#include <string>
#include <experimental/filesystem>
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
// Converts a null-terminated UTF-16 string to UTF-8.
static inline std::string FromUtf16(const wchar_t* pUtf16String)
{
    int nUtf16StringLength = static_cast<int>(wcslen(pUtf16String));
    // First call: query the required size of the UTF-8 buffer in bytes.
    int nUtf8StringLength = ::WideCharToMultiByte(CP_UTF8, 0, pUtf16String, nUtf16StringLength, NULL, 0, NULL, NULL);
    std::string sUtf8String(nUtf8StringLength, '\0');
    // Second call: convert, writing through &s[0] rather than casting away const on c_str().
    ::WideCharToMultiByte(CP_UTF8, 0, pUtf16String, nUtf16StringLength, &sUtf8String[0], nUtf8StringLength, NULL, NULL);
    return sUtf8String;
}

static inline std::string FromUtf16(const std::wstring& sUtf16String)
{
    return FromUtf16(sUtf16String.c_str());
}

// Converts a null-terminated UTF-8 string to UTF-16.
static inline std::wstring ToUtf16(const char* pUtf8String)
{
    int nUtf8StringLength = static_cast<int>(strlen(pUtf8String));
    // First call: query the required size of the UTF-16 buffer (the last argument is an int, so pass 0).
    int nUtf16StringLength = ::MultiByteToWideChar(CP_UTF8, 0, pUtf8String, nUtf8StringLength, NULL, 0);
    std::wstring sUtf16String(nUtf16StringLength, L'\0');
    // Second call: perform the conversion into the string's own buffer.
    ::MultiByteToWideChar(CP_UTF8, 0, pUtf8String, nUtf8StringLength, &sUtf16String[0], nUtf16StringLength);
    return sUtf16String;
}

static inline std::wstring ToUtf16(const std::string& sUtf8String)
{
    return ToUtf16(sUtf8String.c_str());
}

int main()
{
    std::string sTest(u8"Kaķis");
    std::wstring sWideTest(ToUtf16(sTest));
    wchar_t pWideTest[1024] = {};
    char pByteTest[1024] = {};
    // Path1 is built from the UTF-8 bytes, Path2 from the equivalent UTF-16 string.
    std::experimental::filesystem::path Path1(sTest), Path2(sWideTest);
    // _To_wide and _To_byte are the internal helpers of the Visual Studio 2017 header.
    std::experimental::filesystem::v1::_To_wide(sTest.c_str(), pWideTest);
    bool bWideEqual = sWideTest == pWideTest;
    std::experimental::filesystem::v1::_To_byte(pWideTest, pByteTest);
    bool bUtf8Equal = sTest == pByteTest;
    bool bPathsEqual = Path1 == Path2;
    printf("wide equal: %d, utf-8 equal: %d, paths equal: %d\n", bWideEqual, bUtf8Equal, bPathsEqual);
}
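But as I said, I had just blindly assumed that UTF-8 would work. Looking at std::experimental::filesystem::path on cppreference.com, the section on constructors actually states:
- If the source character type is char, the encoding of the source is assumed to be the native narrow encoding (so no conversion takes place on POSIX systems)
- If the source character type is char16_t, conversion from UTF-16 to the native filesystem encoding is used.
- If the source character type is char32_t, conversion from UTF-32 to the native filesystem encoding is used.
- If the source character type is wchar_t, the input is assumed to be the native wide encoding (so no conversion takes place on Windows)
I am not sure how to interpret the first line. First, it only says something about POSIX systems (and even there I do not understand what the native narrow encoding is; does this mean that UTF-8 will not work on POSIX either?). Second, it says nothing about Windows, and MSDN is silent on this as well. So how do I initialize a std::experimental::filesystem::path from Unicode characters in a way that is safe across platforms?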
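The "narrow" (8-bit) encoding of filesystem::path depends on the environment and the host OS. On many POSIX systems it may be UTF-8, but it may also be something else. If you want to use UTF-8, you should do so explicitly, via std::filesystem::u8path() to construct a path and std::filesystem::path::u8string() to read one back.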
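As a minimal sketch of that advice, assuming a C++17 compiler that ships the non-experimental <filesystem> header: the two helpers below (PathFromUtf8 and PathToUtf8 are hypothetical names, not part of any library) wrap std::filesystem::u8path() and path::u8string() so that the rest of the program can treat std::string as UTF-8 regardless of the native narrow encoding.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: builds a path from bytes that are known to be UTF-8.
// u8path converts to the native path encoding where necessary (for example
// to UTF-16 on Windows) instead of assuming the native narrow encoding.
inline fs::path PathFromUtf8(const std::string& sUtf8)
{
    return fs::u8path(sUtf8);
}

// Hypothetical helper: returns the path as UTF-8 bytes in a std::string.
// u8string() yields std::string in C++17 and std::u8string in C++20, so the
// result is copied element by element to cover both cases.
inline std::string PathToUtf8(const fs::path& Path)
{
    const auto sUtf8 = Path.u8string();
    return std::string(sUtf8.begin(), sUtf8.end());
}
```

A call such as PathFromUtf8("Ka\xC4\xB7is") then describes the same file name on Windows and on POSIX systems, and PathToUtf8 gives the UTF-8 bytes back. Note that u8path is deprecated as of C++20, where a path can be constructed directly from a char8_t string instead.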