Convert a string's character encoding from windows-1252 to UTF-8

Var*_*554 14 c# asp.net

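I converted a Word document (docx) to HTML, and the converted HTML uses windows-1252 as its character encoding. In .NET, with this 1252 encoding, all the special characters are displayed as "�". The HTML is shown in a Rad Editor, which displays it correctly if the HTML is in UTF-8.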
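The "�" in this symptom is U+FFFD, the replacement character a decoder emits for bytes that are invalid in the encoding it was told to use. Below is a minimal sketch of that effect. It assumes .NET Core 3.0 or later, where code page 1252 has to be registered through `CodePagesEncodingProvider`; on .NET Framework that line is unnecessary. The sample text is made up for illustration:

```csharp
using System;
using System.Text;

// .NET Core 3.0+ / .NET 5+ ship only a few encodings by default;
// registering this provider makes code page 1252 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding(1252);

// In windows-1252, é is 0xE9 and the curly quotes are 0x93 and 0x94
string original = "caf\u00E9 \u201Cquoted\u201D";
byte[] win1252Bytes = win1252.GetBytes(original);

// Wrong decoder: 0xE9, 0x93 and 0x94 are not valid UTF-8 in this position,
// so each one is replaced by U+FFFD ("�")
string wrong = Encoding.UTF8.GetString(win1252Bytes);

// Right decoder: the encoding the bytes were actually written in
string right = win1252.GetString(win1252Bytes);

Console.WriteLine(wrong);
Console.WriteLine(right);
```

Only decoding with the encoding the bytes were actually written in gives the original text back, so the fix has to happen at the point where the bytes are turned into a string.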

I have tried the following code, but in vain:

```csharp
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
string utf8String = new string(utf8Chars);
```

Any suggestions on how to convert the HTML to UTF-8?
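The attempt in the question cannot repair the text, because `strHtml` is already a .NET string (UTF-16 internally). Encoding it to windows-1252, converting to UTF-8 and decoding again just returns the same characters, and any "�" produced by an earlier wrong decode has no windows-1252 byte at all. A small sketch, where `RoundTrip` is a hypothetical helper wrapping the question's steps, the replacement fallbacks are chosen so unmappable characters deterministically become `?`, and .NET Core 3.0+ is again assumed for the provider registration:

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// Replacement fallbacks: characters with no windows-1252 byte become '?'
Encoding win1252 = Encoding.GetEncoding(1252,
    EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);
Encoding utf8 = Encoding.UTF8;

// Hypothetical helper: the same steps as the code in the question
string RoundTrip(string s)
{
    byte[] win1252Bytes = win1252.GetBytes(s);
    byte[] utf8Bytes = Encoding.Convert(win1252, utf8, win1252Bytes);
    return utf8.GetString(utf8Bytes);
}

// Text that windows-1252 can represent comes back unchanged: nothing was fixed
string unchanged = RoundTrip("caf\u00E9");

// Text that was already mis-decoded contains U+FFFD; it becomes '?',
// and the original character cannot be recovered from the string
string damaged = RoundTrip("caf\uFFFD");

Console.WriteLine(unchanged);
Console.WriteLine(damaged);
```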

sco*_*udy 14

This should do it:

```csharp
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
```
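This version replaces the `GetCharCount` / `GetChars` / `new string` sequence from the question with a single `GetString` call. The two produce the same string, as this sketch with a made-up sample suggests, so the behavior matches the question's code and the input is still taken from `strHtml`:

```csharp
using System;
using System.Text;

Encoding utf8 = Encoding.UTF8;
byte[] utf8Bytes = utf8.GetBytes("caf\u00E9 \u201Cquoted\u201D");

// Question's version: count the chars, allocate, fill a char[], build a string
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
string viaChars = new string(utf8Chars);

// Answer's version: a single call
string viaGetString = utf8.GetString(utf8Bytes);

Console.WriteLine(viaChars == viaGetString);
```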


Var*_*554 7

Actually, the problem was here:

```csharp
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
```

We should not take the bytes from the HTML string. I tried the code below, and it works:

```csharp
using System.IO;
using System.Text;

Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
// Read the raw bytes from the file instead of encoding an already-decoded string
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);


public static byte[] ReadFile(string filePath)
{
    byte[] buffer;
    // "using" closes the stream even if an exception is thrown
    using (FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        int length = (int)fileStream.Length;  // get file length
        buffer = new byte[length];            // create buffer
        int count;                            // actual number of bytes read
        int sum = 0;                          // total number of bytes read

        // read until Read returns 0 (end of the stream has been reached)
        while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
            sum += count;  // sum is the buffer offset for the next read
    }
    return buffer;
}
```
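The same conversion can be sketched with the built-in `File` helpers. `File.ReadAllBytes` does what the `ReadFile` helper in this answer does. Decoding with code page 1252 directly gives the same string as `Encoding.Convert` followed by `Encoding.UTF8.GetString`, and UTF-8 only matters again when the text is written back out as bytes. The file names are hypothetical, the sample file is created first so the sketch runs on its own, and .NET Core 3.0+ is assumed for the provider registration:

```csharp
using System;
using System.IO;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding(1252);

// Hypothetical file names, for illustration only
string inputPath = Path.Combine(Path.GetTempPath(), "word-export-1252.html");
string outputPath = Path.Combine(Path.GetTempPath(), "word-export-utf8.html");

// Create a sample windows-1252 file so the sketch is self-contained
File.WriteAllBytes(inputPath, win1252.GetBytes("<p>caf\u00E9 \u201Cquoted\u201D</p>"));

// 1. Read the raw bytes from the file, never from an already-decoded string
byte[] win1252Bytes = File.ReadAllBytes(inputPath);

// 2. Decode them with the encoding they were written in
string html = win1252.GetString(win1252Bytes);

// Same result as the answer's Encoding.Convert + UTF8.GetString pair
string viaConvert = Encoding.UTF8.GetString(
    Encoding.Convert(win1252, Encoding.UTF8, win1252Bytes));

// 3. Write the text back out as UTF-8 (without a byte order mark)
File.WriteAllText(outputPath, html, new UTF8Encoding(false));
string roundTripped = File.ReadAllText(outputPath, Encoding.UTF8);

Console.WriteLine(html);
```

Writing with `new UTF8Encoding(false)` omits the byte order mark; whether a BOM is wanted depends on what consumes the file.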

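  • I think he means that the system locale affects the bytes, so the string will never be encoded correctly; you have to read the real source to get the real bytes and then convert them. (2 upvotes)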