C# WebClient does not return UTF-8

koi*_*oin 2 c# webclient utf-8

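Hey :) I'm trying to get WebClient to return UTF-8. However, when sub should contain a character like Ä, what I get back looks more like an E, or at least that is how it seems to me.

I have tried a lot of workarounds, but none of them work.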

private string translate(string input, string languagePair)
{
    string url = String.Format("https://translate.google.com/?hl=en&ie=UTF8&text={0}&langpair={1}", input, languagePair);
    WebClient wc = new WebClient();
    wc.Headers.Add(HttpRequestHeader.AcceptCharset, "UTF-8");
    wc.Encoding = Encoding.UTF8;
    var data = wc.DownloadData(url);
    var result = Encoding.UTF8.GetString(data);
    //string result = wc.DownloadString(url);
    int start = result.IndexOf("result_box");
    string sub = result.Substring(start);
    sub = sub.Substring(0, sub.IndexOf("</span>"));
    start = sub.LastIndexOf(">");
    sub = sub.Substring(start + 1);
    return sub;
}

Ňuf*_*Ňuf 5

As you can see from the abbreviated response below, Google simply ignores the encoding sent in the AcceptCharset header and returns the response in ISO-8859-1:

HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
Content-Language: en
Content-Length: 64202

<!DOCTYPE html><html><head><meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
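
The effect of this charset mismatch can be reproduced without any network call. The following is only a minimal sketch: MojibakeDemo and Transcode are hypothetical names made up for illustration, and the sample word is arbitrary. It encodes a string as ISO-8859-1, the way the server does, and then decodes the bytes with a different encoding, the way the client does.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Hypothetical helper: encodes text with the "wire" encoding (what the server sent)
    // and decodes the resulting bytes with the "assumed" encoding (what the client expects).
    public static string Transcode(string text, Encoding wire, Encoding assumed)
    {
        byte[] bytes = wire.GetBytes(text);
        return assumed.GetString(bytes);
    }
}
```

With latin1 = Encoding.GetEncoding("ISO-8859-1"), calling MojibakeDemo.Transcode("\u00C4pfel", latin1, Encoding.UTF8) should replace the Ä with the replacement character U+FFFD, because the single byte 0xC4 is not a complete UTF-8 sequence. Passing latin1 for both parameters round-trips the text unchanged.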

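So when you decode the response as UTF-8, you get invalid characters. If you just want to get it working quickly, I found that Google returns the response in UTF-8 when a User-Agent header is added to the request, and you can leave the rest of your code unchanged: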

private static string translate(string input, string languagePair)
{
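    // Escape the query values so spaces, '&' and non-ASCII text survive in the URL.
    string url = String.Format("https://translate.google.com/?hl=en&ie=UTF8&text={0}&langpair={1}", Uri.EscapeDataString(input), Uri.EscapeDataString(languagePair));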
    WebClient wc = new WebClient();
    wc.Headers.Add(HttpRequestHeader.AcceptCharset, "utf-8");
    wc.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/55.0");
    wc.Encoding = Encoding.UTF8;
    string result = wc.DownloadString(url);
    int start = result.IndexOf("result_box");
    string sub = result.Substring(start);
    sub = sub.Substring(0, sub.IndexOf("</span>"));
    start = sub.LastIndexOf(">");
    sub = sub.Substring(start + 1);
    return sub;
}

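A better solution is to detect the encoding used for the response and decode with that. WebClient does not have this detection built in, so you can either use one of the solutions described here, or use HttpClient instead, which does it for you automatically: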

private static async Task<string> translate(string input, string languagePair)
{
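    // Escape the query values so spaces, '&' and non-ASCII text survive in the URL.
    string url = String.Format("https://translate.google.com/?hl=en&ie=UTF8&text={0}&langpair={1}", Uri.EscapeDataString(input), Uri.EscapeDataString(languagePair));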
    using (var hc = new HttpClient())
    {
        var result = await hc.GetStringAsync(url).ConfigureAwait(false);
        int start = result.IndexOf("result_box");
        string sub = result.Substring(start);
        sub = sub.Substring(0, sub.IndexOf("</span>"));
        start = sub.LastIndexOf(">");
        sub = sub.Substring(start + 1);
        return sub;
    }
}
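
If you would rather stay with WebClient, the detection mentioned above can be approximated by reading the charset from the Content-Type response header yourself. This is a rough sketch, not a complete solution: CharsetDetection, FromContentType and DownloadWithDetectedEncoding are hypothetical names, the UTF-8 fallback is my own assumption, and the code only looks at the HTTP header, not at any meta tag inside the HTML.

```csharp
using System;
using System.Net;
using System.Net.Mime;
using System.Text;

static class CharsetDetection
{
    // Hypothetical helper: returns the encoding named in a Content-Type header value,
    // or the fallback when the header is missing, malformed, or names an unknown charset.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return fallback;
        try
        {
            string charset = new ContentType(contentType).CharSet;
            return string.IsNullOrEmpty(charset) ? fallback : Encoding.GetEncoding(charset);
        }
        catch (FormatException) { return fallback; }   // header could not be parsed
        catch (ArgumentException) { return fallback; } // charset name not recognised
    }

    // Downloads the raw bytes first, then decodes them with whatever the server declared.
    public static string DownloadWithDetectedEncoding(string url)
    {
        using (var wc = new WebClient())
        {
            byte[] data = wc.DownloadData(url);
            string contentType = wc.ResponseHeaders[HttpResponseHeader.ContentType];
            return FromContentType(contentType, Encoding.UTF8).GetString(data);
        }
    }
}
```

For the response shown earlier, FromContentType("text/html; charset=ISO-8859-1", Encoding.UTF8) would select ISO-8859-1, so Ä is decoded correctly no matter which charset Google decides to send.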

Also note that Google has a Translation API, which is probably a better option than parsing the translation out of an HTML page.