bal*_*569 7 c# html-agility-pack
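I am trying to read a URL and collect the images on the page. If the page returns a 404, I need to skip it so that I don't pick up images from the 404 error page. How can I do this with HtmlAgilityPack? Here is my code: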
var document = new HtmlWeb().Load(completeurl);
var urls = document.DocumentNode.Descendants("img")
    .Select(e => e.GetAttributeValue("src", null))
    .Where(s => !String.IsNullOrEmpty(s)).ToList();
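You need to register a PostRequestHandler event on the HtmlWeb instance. It is raised after each document is downloaded and gives you access to the HttpWebResponse object, which has a StatusCode property.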
HtmlWeb web = new HtmlWeb();
HttpStatusCode statusCode = HttpStatusCode.OK;
// Record the status code of each response as it arrives.
web.PostRequestHandler += (request, response) =>
{
    if (response != null)
    {
        statusCode = response.StatusCode;
    }
};
var doc = web.Load(completeUrl);
if (statusCode == HttpStatusCode.OK)
{
    // received a real document
}
Looking at the HtmlAgilityPack code on GitHub, it is even simpler: HtmlWeb has a StatusCode property that holds the status of the last request:
var web = new HtmlWeb();
var document = web.Load(completeurl);
if (web.StatusCode == HttpStatusCode.OK)
{
    var urls = document.DocumentNode.Descendants("img")
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !String.IsNullOrEmpty(s)).ToList();
}
The AgilityPack API has since been updated. The trick is still the same:
var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;
htmlWeb.PostResponse = (request, response) =>
{
    if (response != null)
    {
        lastStatusCode = response.StatusCode;
    }
};
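For completeness, here is a rough sketch of how the callback above might be combined with the load and the image extraction from the question. ImageScraper, GetImageUrls and ExtractImageSources are hypothetical names made up for this illustration, and the sketch assumes that your HtmlAgilityPack version invokes PostResponse for error responses such as 404.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper class; the names are illustrative, not part of HtmlAgilityPack.
public static class ImageScraper
{
    // Collects the non-empty src attributes of all <img> elements in a parsed document.
    public static List<string> ExtractImageSources(HtmlDocument document)
    {
        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null))
            .Where(s => !String.IsNullOrEmpty(s))
            .ToList();
    }

    // Loads the page and returns its image URLs, or an empty list
    // when the last observed status code is anything other than 200 OK.
    public static List<string> GetImageUrls(string url)
    {
        var htmlWeb = new HtmlWeb();

        // Starts as null so that "callback never fired" is treated as a failure
        // (an assumption of this sketch; the answer above starts from OK instead).
        HttpStatusCode? lastStatusCode = null;
        htmlWeb.PostResponse = (request, response) =>
        {
            if (response != null)
            {
                lastStatusCode = response.StatusCode;
            }
        };

        var document = htmlWeb.Load(url);
        if (lastStatusCode != HttpStatusCode.OK)
        {
            return new List<string>();
        }
        return ExtractImageSources(document);
    }
}
```

Calling ImageScraper.GetImageUrls(completeurl) would then give an empty list for a 404 page instead of the images from the error page. If the callback turns out not to fire in your setup, initialising lastStatusCode to HttpStatusCode.OK, as in the snippet above, restores the original behaviour.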