How to use ReadAllText when the file encoding is unknown

Question


I'm reading a file with ReadAllText:

    String[] values = File.ReadAllText(@"c:\c\file.txt").Split(';');

    int i = 0;

    foreach (String s in values)
    {
        System.Console.WriteLine("output: {0} {1} ", i, s);
        i++;
    }


When I read some files, I sometimes get wrong characters (for ÖÜÄ and so on). The output shows '?' because of some problem with the encoding:

    output: 0 TEST
    output: 1 A??O?


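One solution is to set the encoding in ReadAllText, for example ReadAllText(@"c:\c\file.txt", Encoding.UTF8), which can fix the problem. But what if I still get '?' as output? What if I don't know the file's encoding? What if every file has a different encoding? What is the best way to do this in C#? Thanks.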
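
The '?' characters can be reproduced by decoding the same bytes with different encodings. The sketch below assumes the file was written in a legacy single-byte code page; Encoding.Latin1 (available from .NET 5 on) stands in for it, and the byte values and the class name WrongEncodingDemo are illustrative only.

```csharp
using System.Text;

public static class WrongEncodingDemo
{
    // "ÄÖÜ" as a legacy Latin-1 / Windows-1252 file would store it (one byte per character).
    public static readonly byte[] LegacyBytes = { 0xC4, 0xD6, 0xDC };

    // Decode the same raw bytes with whichever encoding the caller guesses.
    public static string Decode(Encoding encoding) => encoding.GetString(LegacyBytes);
}
```

With Encoding.Latin1 the bytes decode to "ÄÖÜ". Encoding.ASCII replaces every byte above 0x7F with '?', and Encoding.UTF8 (the default for File.ReadAllText) treats them as invalid sequences and substitutes U+FFFD replacement characters, which many consoles also display as '?'.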

Answer 1

Rom*_*ain 7

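The only reliable way to do this is to look for a byte order mark (BOM) at the start of the text file. (This blob more generally indicates the byte order of the character encoding used, but also the encoding itself, e.g. UTF-8, UTF-16, UTF-32.) Unfortunately, this approach only works for Unicode-based encodings, not for anything that predates them (for those, much less reliable methods have to be used).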
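
Byte order marks are short, fixed byte prefixes. One way to see them, assuming the .NET Encoding classes, is to ask each encoding for its preamble with Encoding.GetPreamble(); the helper name PreambleHex is only for illustration.

```csharp
using System;
using System.Text;

public static class BomTable
{
    // Formats the byte order mark (preamble) an encoding writes, e.g. "EF-BB-BF".
    // An empty string means the encoding has no BOM at all.
    public static string PreambleHex(Encoding encoding) =>
        BitConverter.ToString(encoding.GetPreamble());
}
```

This gives "EF-BB-BF" for Encoding.UTF8, "FF-FE" for Encoding.Unicode (UTF-16 little-endian), "FE-FF" for Encoding.BigEndianUnicode and "FF-FE-00-00" for Encoding.UTF32, but an empty string for Encoding.ASCII, which is why non-Unicode files cannot be identified this way. Note that the UTF-32 little-endian mark begins with the UTF-16 little-endian one, so a detector has to test the longer pattern first.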

The StreamReader type can detect these marks to determine the encoding; you just need to pass a flag as a parameter:

    new System.IO.StreamReader("path", true)


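You can then check streamReader.CurrentEncoding to find out which encoding the file uses. Note, however, that CurrentEncoding is only updated after the first read from the stream (for example Peek() or ReadToEnd()), and that if no byte order mark is present, this constructor falls back to UTF-8 (Encoding.UTF8).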
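
A minimal sketch of the StreamReader approach, assuming C# 8 or later for the `using var` declaration; the class and method names are hypothetical. The call to Peek() is there because StreamReader only inspects the byte order mark on its first read, so CurrentEncoding is not meaningful before that.

```csharp
using System.IO;
using System.Text;

public static class BomDetection
{
    // Opens the file with BOM detection enabled and returns the encoding StreamReader settled on.
    public static Encoding DetectWithStreamReader(string path)
    {
        using var reader = new StreamReader(path, detectEncodingFromByteOrderMarks: true);
        reader.Peek(); // detection happens on the first read, so force one
        return reader.CurrentEncoding;
    }
}
```

Comparing the returned CodePage (1200 for UTF-16 little-endian, 1201 for UTF-16 big-endian, 65001 for UTF-8) is a simple, unambiguous way to act on the result. A file without a BOM reports 65001 here whatever its real encoding is.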

See the CodeProject solution for detecting encodings.

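If no byte order mark is present, CurrentEncoding will be Encoding.UTF8, **not** Encoding.Default. "The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the UTF8Encoding is used." [From the documentation](http://msdn.microsoft.com/en-us/library/9y86s1a9.aspx) (3 votes)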
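
Because of that silent UTF-8 fallback, BOM-less files written in a legacy code page still come out garbled. StreamReader also has a constructor that takes an Encoding together with detectEncodingFromByteOrderMarks: a BOM still wins, and the supplied encoding is only used when there is none. File.ReadAllText(path, encoding) is documented to behave the same way. The sketch below assumes Encoding.Latin1 (.NET 5 and later) as the fallback; which legacy encoding your files really use is an assumption you have to make yourself, and the helper name is hypothetical.

```csharp
using System.IO;
using System.Text;

public static class FallbackReading
{
    // Hypothetical helper: honours a BOM if the file has one, otherwise decodes with 'fallback'.
    public static string ReadAllTextWithFallback(string path, Encoding fallback)
    {
        using var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true);
        return reader.ReadToEnd(); // a detected BOM is not included in the returned text
    }
}
```

For example, ReadAllTextWithFallback(path, Encoding.Latin1) returns "ÄÖÜ" both for the raw bytes C4 D6 DC and for a UTF-8 file that starts with a BOM.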

Answer 2

Md *_*ker 6

You have to check the file's encoding first. Try this:

    using System.IO;
    using System.Text;

    // filePath: path of the file to inspect (defined elsewhere)
    Encoding enc = null;
    FileStream file = new FileStream(filePath,
        FileMode.Open, FileAccess.Read, FileShare.Read);
    if (file.CanSeek)
    {
        byte[] bom = new byte[4]; // Get the byte-order mark, if there is one
        int n = file.Read(bom, 0, 4); // a short file may yield fewer than 4 bytes

        // Check the longer UTF-32 marks first: UTF-32 LE starts with the UTF-16 LE mark.
        if (n >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0)
            enc = Encoding.UTF32;                  // UTF-32 little-endian
        else if (n >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff)
            enc = new UTF32Encoding(true, true);   // UTF-32 big-endian
        else if (n >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf)
            enc = Encoding.UTF8;                   // UTF-8
        else if (n >= 2 && bom[0] == 0xff && bom[1] == 0xfe)
            enc = Encoding.Unicode;                // UTF-16 little-endian
        else if (n >= 2 && bom[0] == 0xfe && bom[1] == 0xff)
            enc = Encoding.BigEndianUnicode;       // UTF-16 big-endian
        else
            enc = Encoding.ASCII; // No BOM. Beware: ASCII decodes ÖÜÄ etc. as '?'

        // Now reposition the file cursor back to the start of the file
        file.Seek(0, SeekOrigin.Begin);
    }
    else
    {
        // The file cannot be randomly accessed, so you need to decide what to set the default to
        // based on the data provided. If you're expecting data from a lot of older applications,
        // default your encoding to Encoding.ASCII. If you're expecting data from a lot of newer
        // applications, default your encoding to Encoding.Unicode. Also, since binary files are
        // single-byte based, you will want to use Encoding.ASCII, even though you'll probably
        // never need the encoding then, since the Encoding classes are really meant to get
        // strings from the byte array that is the file.

        enc = Encoding.ASCII;
    }
    // Remember to dispose 'file' once you have finished reading from it.
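
The snippet above falls back to ASCII whenever there is no BOM, which turns ÖÜÄ into '?'. A commonly used heuristic for BOM-less files, not part of this answer, is to check whether the bytes are valid UTF-8 and otherwise assume a legacy single-byte encoding. A minimal sketch, with a hypothetical helper name and Encoding.Latin1 (.NET 5+) assumed as the legacy fallback; it is a guess rather than proof, since a short legacy text can occasionally happen to be valid UTF-8.

```csharp
using System.Text;

public static class Utf8Heuristic
{
    // Hypothetical helper: returns UTF-8 if 'bytes' decode as strictly valid UTF-8,
    // otherwise the supplied legacy fallback encoding. This is a heuristic guess.
    public static Encoding GuessEncoding(byte[] bytes, Encoding legacyFallback)
    {
        // throwOnInvalidBytes: true makes invalid sequences throw instead of becoming U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException)
        {
            return legacyFallback;
        }
    }
}
```

A possible usage is File.ReadAllText(path, Utf8Heuristic.GuessEncoding(File.ReadAllBytes(path), Encoding.Latin1)); because ReadAllText still honours a BOM, a UTF-16 file with a BOM is read correctly even though it fails the UTF-8 check.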


Archived: 13 years, 9 months ago
Views: 12,337
Last activity: 10 years, 1 month ago