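On Linux, I have some code that reads a Unicode file, shown below.
However, special characters (such as the Danish letters æ, ø and å) are not handled correctly. For the line "abcæøåabc", the output is only "abc". In the debugger I can see that the contents of wline are also just a\000b\000c\000.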
#include <fstream>
#include <string>

std::wifstream wif("myfile.txt");
if (wif.is_open())
{
    //set proper position compared to byteorder
    wif.seekg(2, std::ios::beg);
    std::wstring wline;
    while (wif.good())
    {
        std::getline(wif, wline);
        if (!wif.eof())
        {
            std::wstring convert;
            for (auto c : wline)
            {
                if (c != '\0')
                    convert += c;
            }
        }
    }
}
wif.close();
Can anyone tell me how to read the whole line?
Thanks and best regards
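You have to use the imbue() method to tell the wifstream that the file is encoded as UTF-16, and let it consume the BOM for you. You do not need to seekg() past the BOM manually. For example: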
#include <fstream>
#include <string>
#include <locale>
#include <codecvt>

// open as a byte stream
std::wifstream wif("myfile.txt", std::ios::binary);
if (wif.is_open())
{
    // apply BOM-sensitive UTF-16 facet
    wif.imbue(std::locale(wif.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    std::wstring wline;
    while (std::getline(wif, wline))
    {
        std::wstring convert;
        for (auto c : wline)
        {
            if (c != L'\0')
                convert += c;
        }
    }
    wif.close();
}
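
As a side note, std::codecvt_utf16 and the rest of <codecvt> were deprecated in C++17. If that is a concern, the bytes can be decoded by hand instead. The following is only a minimal sketch of that alternative, not part of the approach above: the names decode_utf16 and read_utf16_file are made up for illustration, big-endian is assumed when no BOM is present, and unpaired surrogates are passed through unchanged rather than reported as errors.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: decode a UTF-16 byte sequence into UTF-32 code points.
// A leading BOM (FF FE or FE FF) selects the byte order and is skipped.
// Without a BOM, big-endian is assumed (an assumption of this sketch).
std::u32string decode_utf16(const std::string& bytes)
{
    std::size_t pos = 0;
    bool little_endian = false;
    if (bytes.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if (b0 == 0xFF && b1 == 0xFE) { little_endian = true; pos = 2; }
        else if (b0 == 0xFE && b1 == 0xFF) { pos = 2; }
    }

    // Read one 16-bit code unit starting at byte offset i.
    auto unit_at = [&](std::size_t i) -> std::uint32_t {
        const std::uint32_t first = static_cast<unsigned char>(bytes[i]);
        const std::uint32_t second = static_cast<unsigned char>(bytes[i + 1]);
        return little_endian ? (second << 8) | first : (first << 8) | second;
    };

    std::u32string out;
    while (pos + 1 < bytes.size()) {
        std::uint32_t cp = unit_at(pos);
        pos += 2;
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && pos + 1 < bytes.size()) {
            const std::uint32_t low = unit_at(pos);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                pos += 2;
            }
        }
        out.push_back(static_cast<char32_t>(cp));
    }
    return out;
}

// Hypothetical helper: read a whole file as raw bytes and decode it.
std::u32string read_utf16_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return decode_utf16(bytes);
}
```

Calling read_utf16_file("myfile.txt") would return the whole file as one std::u32string, which can then be split into lines on U'\n'. The result is char32_t rather than wchar_t, so it would still need converting if the rest of the program expects std::wstring.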