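I am trying to read a web page whose content includes the registered trademark symbol, ®. However, when I use QuickWatch to inspect sb in the example below, I see a diamond with a question mark instead of ®. The same problem occurs if I serialize sb and display it in another web page via JavaScript. Is this just how the character is rendered in my QuickWatch window, or am I reading/decoding the page incorrectly? The code is as follows: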
const int bufSize = 4096;
const int maxBytesToGet = 5000000;
byte[] buf = new byte[bufSize];
StringBuilder sb = new StringBuilder(bufSize);
int bytesToGet; // number of bytes returned by each Read call
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    using (Stream responseStream = response.GetResponseStream())
    {
        while ((bytesToGet = responseStream.Read(buf, 0, buf.Length)) != 0)
        {
            // Each chunk is decoded as UTF-8, whatever encoding the server actually used.
            sb.Append(Encoding.UTF8.GetString(buf, 0, bytesToGet));
            if (sb.Length > maxBytesToGet) break;
        }
    }
}
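You are assuming the response is UTF-8. You need to look at the response headers to see what the actual encoding is. The diamond with a question mark is the replacement character (U+FFFD) that the decoder emits for bytes that are not valid in the encoding it was told to use. It would also be easier to use a StreamReader instead of Encoding.GetString.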
string responseText;
using (var response = (HttpWebResponse)request.GetResponse())
{
    using (Stream responseStream = response.GetResponseStream())
    {
        // CharacterSet is taken from the charset of the response's Content-Type header.
        var encoding = Encoding.GetEncoding(response.CharacterSet);
        using (var reader = new StreamReader(responseStream, encoding))
        {
            responseText = reader.ReadToEnd();
        }
    }
}
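To see why the wrong encoding produces that diamond, here is a minimal sketch. It assumes, purely for illustration, that the page was served as ISO-8859-1, where ® is the single byte 0xAE. The question does not say which encoding the server actually used. The ResolveEncoding helper and the EncodingMismatchDemo class are hypothetical names, not part of any library. The helper shows one way to fall back to a default when the charset value is missing or unrecognized.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Hypothetical helper: map a charset name (for example the value of
    // HttpWebResponse.CharacterSet) to an Encoding, using the supplied
    // fallback when the name is empty or not recognized.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset)) return fallback;
        try
        {
            // Some servers quote the charset value, so strip quotes first.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for unknown names.
            return fallback;
        }
    }

    public static void Main()
    {
        // Assumption for illustration: the server sent ISO-8859-1,
        // in which the registered trademark sign is the single byte 0xAE.
        byte[] body = { 0xAE };

        // 0xAE on its own is not a valid UTF-8 sequence, so the decoder
        // substitutes the replacement character U+FFFD.
        string decodedAsUtf8 = Encoding.UTF8.GetString(body);

        // Decoding with the encoding the bytes were written in gives U+00AE.
        string decodedAsLatin1 = ResolveEncoding("ISO-8859-1", Encoding.UTF8).GetString(body);

        Console.WriteLine(decodedAsUtf8 == "\uFFFD");
        Console.WriteLine(decodedAsLatin1 == "\u00AE");
    }
}
```

So the problem is in the decoding step, not in how QuickWatch displays the string: the replacement character is already stored in sb once the bytes have been decoded with the wrong encoding.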