How to get the HTML content of a web page from ASP.NET

Gop*_*ath 5 html c# asp.net asp.net-mvc httpwebrequest

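I want to scrape some content from a dynamic web page (it appears to be built with MVC).

The scraping logic is done with HTML Agility Pack, but the problem is that the HTML returned when the URL is requested from a browser differs from the response I get for the same URL through an ASP.NET web request.

Specifically, the browser response contains the dynamic data I need (rendered based on the values passed in the query string), but the WebResponse result does not.

Could you help me get the actual content of the dynamically rendered page using WebRequest?

Here is the code I used to read it: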

WebRequest request = WebRequest.Create(sURL);
request.Method = "GET";
//Get the response
WebResponse response = request.GetResponse();
//Read the stream from the response
StreamReader reader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8);

Ayd*_*din 12

Use HttpWebRequest to get the content of any web page:

// Namespaces this snippet needs
using System;
using System.IO;
using System.Net;

// We will store the html response of the request here
string siteContent = string.Empty;

// The url you want to grab
string url = "http://google.com";

// Here we're creating our request, we haven't actually sent the request to the site yet...
// we're simply building our HTTP request to shoot off to google...
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.AutomaticDecompression = DecompressionMethods.GZip;

// Right now... this is what our HTTP request has been built into...
/*
    GET http://google.com/ HTTP/1.1
    Host: google.com
    Accept-Encoding: gzip
    Connection: Keep-Alive
*/


// Wrap everything that can be disposed in using blocks... 
// They dispose of objects and prevent them from lying around in memory...
using(HttpWebResponse response = (HttpWebResponse)request.GetResponse())  // Go query google
using(Stream responseStream = response.GetResponseStream())               // Load the response stream
using(StreamReader streamReader = new StreamReader(responseStream))       // Load the stream reader to read the response
{
    siteContent = streamReader.ReadToEnd(); // Read the entire response and store it in the siteContent variable
}

// magic...
Console.WriteLine (siteContent);
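If the server still returns different HTML than the browser does, one possible cause is that it varies its output on request headers or cookies. The sketch below extends the snippet above by sending browser-style headers and keeping cookies between requests. The class name `BrowserLikeFetcher`, its method names, and every header value are illustrative assumptions, not something taken from the question. It will not help if the missing data is filled in by client-side JavaScript, because HttpWebRequest does not execute scripts. On newer .NET versions `WebRequest.Create` is marked obsolete but still works.

```csharp
using System;
using System.IO;
using System.Net;

public static class BrowserLikeFetcher
{
    // Builds (but does not send) a GET request with browser-style headers.
    // All header values here are assumptions chosen for illustration.
    public static HttpWebRequest BuildRequest(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.Headers[HttpRequestHeader.AcceptLanguage] = "en-US,en;q=0.5";

        // Transparently decompress gzip or deflate responses
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        // Reuse the same container across calls so session cookies are sent back
        request.CookieContainer = cookies;
        request.AllowAutoRedirect = true;
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string Fetch(string url, CookieContainer cookies)
    {
        HttpWebRequest request = BuildRequest(url, cookies);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader streamReader = new StreamReader(responseStream))
        {
            return streamReader.ReadToEnd();
        }
    }
}
```

Calling `BrowserLikeFetcher.Fetch(sURL, new CookieContainer())` would then return the HTML as a string that can be passed to the existing HTML Agility Pack logic. Comparing the headers the browser sends (visible in its developer tools) with the ones set here is a reasonable way to find out which one the server depends on.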

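  • If you're going to downvote... at least leave a comment explaining why, so that I can improve the answer and others don't get the impression that this solution doesn't work (3 upvotes)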