kam*_*001 70 c# asp.net screen-scraping
How can I get the content of a web page using ASP.NET? I need to write a program that fetches a page's HTML and stores it in a string variable.
dhi*_*esh 109
You can use WebClient:
using System.Net;

using var client = new WebClient(); // WebClient is IDisposable
string downloadString = client.DownloadString("http://www.google.com");
Sco*_*ott 70
I have run into problems with WebClient.DownloadString before. If you do too, you can try this:
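using System.IO;
using System.Net;

WebRequest request = WebRequest.Create("http://www.google.com");
string html = String.Empty;
// Dispose the response and its stream so the underlying connection is released.
using (WebResponse response = request.GetResponse())
using (Stream data = response.GetResponseStream())
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}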
use*_*674 24
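I'd advise against using WebClient.DownloadString. This is because (at least in .NET 3.5) DownloadString is not smart enough to use or strip the BOM when one is present. As a result, the BOM (U+FEFF) incorrectly shows up as part of the string when UTF-8 data is returned (at least when no charset is specified). Ick!
Instead, this slight variation handles the BOM correctly: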
using System.IO;
using System.Net;
using System.Text;

string ReadTextFromUrl(string url) {
    // WebClient is still convenient
    // Assume UTF-8, but detect a BOM; could also honor the response charset, I suppose
    using (var client = new WebClient())
    using (var stream = client.OpenRead(url))
    using (var textReader = new StreamReader(stream, Encoding.UTF8, true)) {
        return textReader.ReadToEnd();
    }
}
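To see the BOM effect described above without any network access, here is a minimal sketch (the `BomDemo` class and its method names are hypothetical, made up for illustration). It decodes the same UTF-8 bytes, which begin with the BOM bytes `EF BB BF`, in two ways: with `Encoding.UTF8.GetString`, which keeps the BOM as a leading U+FEFF character, and through a `StreamReader` with BOM detection turned on, as `ReadTextFromUrl` does, which strips it.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class for illustration only.
static class BomDemo
{
    // Naive decoding: GetString does not strip the preamble, so a UTF-8 BOM
    // survives as a leading '\uFEFF' character in the result.
    public static string DecodeNaively(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // StreamReader with BOM detection consumes the BOM before returning text,
    // the same approach ReadTextFromUrl applies to the response stream.
    public static string DecodeWithBomDetection(byte[] bytes)
    {
        using var reader = new StreamReader(
            new MemoryStream(bytes), Encoding.UTF8, detectEncodingFromByteOrderMarks: true);
        return reader.ReadToEnd();
    }
}
```

For example, given the bytes `{ 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' }`, `DecodeNaively` returns a three-character string that starts with U+FEFF, while `DecodeWithBomDetection` returns `"hi"`.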
Anonymous 9
using System.Net;

using var client = new WebClient(); // the class name is WebClient, not Webclient
string content = client.DownloadString(url);
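Pass in the URL of the page you want to fetch. You can then parse the result with HtmlAgilityPack.
I had been using WebClient, but as of this post (with .NET 6 available), WebClient has been marked obsolete.
The preferred approach is: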
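using System.Net.Http;

HttpClient client = new HttpClient(); // reuse one instance rather than creating one per request
string content = await client.GetStringAsync(url); // must run inside an async method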
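Because `GetStringAsync` has to be awaited and an `HttpClient` is meant to be created once and reused, one way to package the call is a small static helper. This is a minimal sketch under those assumptions: the `PageFetcher` class, the `GetHtmlAsync` name, and the 30-second timeout are hypothetical choices for illustration, not part of the original answer.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for illustration; these names are not from any library.
public static class PageFetcher
{
    // One HttpClient for the whole application: creating a new instance per
    // request can exhaust sockets under load.
    private static readonly HttpClient SharedClient = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30) // assumed value; the default is 100 seconds
    };

    // Convenience overload that uses the shared client.
    public static Task<string> GetHtmlAsync(string url) => GetHtmlAsync(SharedClient, url);

    // Takes the client as a parameter so callers (or tests) can supply their own.
    public static async Task<string> GetHtmlAsync(HttpClient client, string url)
    {
        // GetStringAsync throws HttpRequestException for non-success status codes.
        return await client.GetStringAsync(url);
    }
}
```

Call it from an `async` method, e.g. `string html = await PageFetcher.GetHtmlAsync("http://www.google.com");`. Accepting the `HttpClient` as a parameter also lets a caller pass a client built around a custom `HttpMessageHandler`, for example to stub responses in tests.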