Tags: c#, asp.net, web-crawler
I read about why Request.Browser.Crawler is always false in C# (http://www.digcode.com/default.aspx?page=ed51cde3-d979-4daf-afae-fa6192562ea9&article=bc3a7a4f-f53e-4f88-8e9c-c9337f6c05a0).
Does anyone use a method to update the list of crawlers dynamically, so that Request.Browser.Crawler becomes genuinely useful?
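One way to keep a crawler list current without redeploying is to hold the pattern in memory and rebuild it whenever the list changes. This is a minimal sketch, assuming the crawler names come from some external source (a text file or a database table) that you re-read yourself. `CrawlerList`, `Update` and `IsCrawler(string)` are hypothetical names. The sketch matches the raw User-Agent string instead of relying on `Request.Browser.Crawler`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading;

// Hypothetical helper: a crawler-name pattern that can be swapped at runtime,
// e.g. after re-reading a list from a file or database (the source is an assumption).
public static class CrawlerList
{
    // "(?!)" never matches, so nothing counts as a crawler until Update is called.
    private static Regex _pattern = new Regex("(?!)");

    // Rebuilds the pattern from plain crawler names; Regex.Escape keeps each name literal.
    public static void Update(IEnumerable<string> names)
    {
        var alternatives = names
            .Where(n => !string.IsNullOrWhiteSpace(n))
            .Select(n => Regex.Escape(n.Trim()));
        string joined = string.Join("|", alternatives);
        Regex rebuilt = joined.Length == 0
            ? new Regex("(?!)")
            : new Regex(joined, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
        // Publish the new instance so other request threads see it.
        Volatile.Write(ref _pattern, rebuilt);
    }

    // Pass Request.UserAgent here; a missing header (null or empty) is treated as "not a crawler".
    public static bool IsCrawler(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent)) return false;
        return Volatile.Read(ref _pattern).IsMatch(userAgent);
    }
}
```

Swapping the whole `Regex` reference, rather than editing a shared list in place, means concurrent requests always see either the old pattern or the new one, never a half-updated list.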
You could check Request.UserAgent against a regular expression.
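A minimal sketch of that check, assuming a small token list: many crawlers put "bot", "crawler" or "spider" in their User-Agent, and Yahoo's crawler uses "Slurp". `UserAgentCheck` and `LooksLikeCrawler` are hypothetical names, and the tokens are an assumption you would tune. A short substring such as `bot` can also match some non-crawler agents.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UserAgentCheck
{
    // Assumed token list; extend it for the bots you care about.
    private static readonly Regex CrawlerPattern = new Regex(
        "bot|crawler|spider|slurp",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // In ASP.NET, pass Request.UserAgent; it is null when the header is missing,
    // and Regex.IsMatch throws on null input, so guard first.
    public static bool LooksLikeCrawler(string userAgent) =>
        !string.IsNullOrEmpty(userAgent) && CrawlerPattern.IsMatch(userAgent);
}
```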
Peter Bromberg wrote a good article on building an ASP.NET request logger and crawler killer.
Here is the method he uses in his Logger class:
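using System.Text.RegularExpressions;
using System.Web;

public class Logger
{
    // Returns true if the request appears to come from a crawler.
    public static bool IsCrawler(HttpRequest request)
    {
        // To use this method to deny only certain bots instead, change the next line to: bool isCrawler = false;
        bool isCrawler = request.Browser.Crawler;
        // Microsoft doesn't properly detect several crawlers, so fall back to the User-Agent header.
        // Request.UserAgent is null when the header is missing, and Regex.Match throws on null input.
        if (!isCrawler && !string.IsNullOrEmpty(request.UserAgent))
        {
            // Put any additional known crawlers in the Regex below.
            // You can also use this list to deny certain bots instead, if desired:
            // set "bool isCrawler = false;" as the first line of the method
            // and list only the bots you want to deny in the following Regex.
            Regex regEx = new Regex("Slurp|slurp|ask|Ask|Teoma|teoma");
            isCrawler = regEx.Match(request.UserAgent).Success;
        }
        return isCrawler;
    }
}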
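The comments in the method describe a second mode: start from `false` and list only the bots you want to block. Below is a minimal sketch of that deny-list variant. It works on the raw User-Agent string, so it does not depend on `System.Web`. `BotDenyList` and `ShouldDeny` are hypothetical names. The `\b` word boundaries and the case-insensitive option are additions not present in the original method. With them, `ask` no longer matches inside unrelated words such as `TaskRunner`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BotDenyList
{
    // Only the bots you want to block; \b prevents matches inside longer words.
    private static readonly Regex Denied = new Regex(
        @"\b(slurp|teoma|ask)\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

    // Deny mode: ignores Request.Browser.Crawler entirely (the "bool isCrawler = false;" variant).
    public static bool ShouldDeny(string userAgent) =>
        !string.IsNullOrEmpty(userAgent) && Denied.IsMatch(userAgent);
}
```

In an ASP.NET application you might, for example, call `ShouldDeny(Request.UserAgent)` early in the request and respond with 403 Forbidden when it returns true.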