Unable to download UTF-8 web page content

Jus*_*ath 3 c# webclient utf-8

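I have some simple code that fetches the response from a Vietnamese website, http://vnexpress.net, but there is a small problem. The first time it downloads fine, but after that the content contains unknown symbols like this: \b\0\0\0\0\0 \0 \a`I %&/ m .... What is the problem?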

    string address = "http://vnexpress.net";
    WebClient webClient = new WebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);

Jim*_*hel 9

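You'll find that the response is GZip-compressed. There doesn't appear to be a way to download that with WebClient unless you create a derived class and modify the underlying HttpWebRequest to allow automatic decompression.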

Here's how you do that:

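    using System;
    using System.Net;

    public class MyWebClient : WebClient
    {
        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest req = base.GetWebRequest(address);
            // Non-HTTP requests (e.g. file://) are not HttpWebRequest, so guard the cast.
            var httpReq = req as HttpWebRequest;
            if (httpReq != null)
            {
                // Advertise gzip support and decompress the response transparently.
                httpReq.AutomaticDecompression = DecompressionMethods.GZip;
            }
            return req;
        }
    }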

And to use it:

    string address = "http://vnexpress.net";
    MyWebClient webClient = new MyWebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);
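
The `\b\0\0\0\0\0` run in the question is consistent with a GZip header: after the two magic bytes 0x1F 0x8B come the compression method 0x08 (`\b`), a flags byte, and a four-byte timestamp, all of which can be zero. If subclassing WebClient is not an option, one alternative is to download the raw bytes and decompress them yourself. The following is only a minimal sketch under that assumption. `GzipSketch`, `LooksLikeGzip`, `DecodeUtf8`, and `Compress` are hypothetical names introduced for illustration, and `Compress` exists only to build test input without touching the network.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipSketch
{
    // True if the buffer starts with the GZip magic bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Decompresses the buffer if it looks like GZip, then decodes it as UTF-8.
    // Buffers without the magic bytes are decoded as UTF-8 unchanged.
    public static string DecodeUtf8(byte[] data)
    {
        if (!LooksLikeGzip(data))
            return Encoding.UTF8.GetString(data);

        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    // Illustration-only helper: GZip-compresses a string encoded as UTF-8.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream first so the GZip trailer is flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }
}
```

With a plain WebClient this would be used roughly as `string html = GzipSketch.DecodeUtf8(webClient.DownloadData(address));`. The derived-class approach above is usually simpler, since `AutomaticDecompression` also sends the matching `Accept-Encoding` header for you.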