nginx,分离机器人访问日志和人工访问日志

Met*_*eor 7 nginx log-files

我试图将机器人访问日志和人工访问日志分开,所以我使用以下配置:

    http {
....
    map $http_user_agent $ifbot {
        default 0;
        "~*rogerbot"        3;
        "~*ChinasoSpider"       3;
        "~*Yahoo"           1;
        "~*Bot"         1;
        "~*Spider"          1;
        "~*archive"         1;
        "~*search"          1;
        "~*Yahoo"           1;
        "~Mediapartners-Google" 1;
        "~*bingbot"         1;
        "~*YandexBot"           1;
        "~*Feedly"  2;
        "~*Superfeedr"  2;
        "~*QuiteRSS"    2;
        "~*g2reader"    2;
        "~*Digg"    2;
        "~*trendiction"     3;
        "~*AhrefsBot"           3;
        "~*curl"            3;
        "~*Ruby"            3;
        "~*Player"          3;
        "~*Go\ http\ package"   3;
        "~*Lynx"            3;
        "~*Sleuth"          3;
        "~*Python"          3;
        "~*Wget"            3;
        "~*perl"            3;
        "~*httrack"         3;
        "~*JikeSpider"          3;
        "~*PHP"         3;
        "~*WebIndex"            3;
        "~*magpie-crawler"      3;
        "~*JUC"         3;
        "~*Scrapy"          3;
        "~*libfetch"            3;
        "~*WinHTTrack"      3;
        "~*htmlparser"      3;
        "~*urllib"          3;
        "~*Zeus"            3;
        "~*scan"            3;
        "~*Indy\ Library"       3;
        "~*libwww-perl"     3;
        "~*GetRight"            3;
        "~*GetWeb!"         3;
        "~*Go!Zilla"            3;
        "~*Go-Ahead-Got-It"     3;
        "~*Download\ Demon" 3;
        "~*TurnitinBot"     3;
        "~*WebscanSpider"       3;
        "~*WebBench"        3;
        "~*YisouSpider"     3;
        "~*check_http"      3;
        "~*webmeup-crawler"     3;
        "~*omgili"      3;
        "~*blah"        3;
        "~*fountainfo"      3;
        "~*MicroMessenger"      3;
        "~*QQDownload"      3;
        "~*shoulu.jike.com"     3;
        "~*omgilibot"       3;
        "~*pyspider"        3;
    }
....
}
Run Code Online (Sandbox Code Playgroud)

在服务器部分,我正在使用:

    if ($ifbot = "1") {
    set $spiderbot 1;
}
if ($ifbot = "2") {
    set $rssbot 1;
}
if ($ifbot = "3") {
    return 403;
    access_log /web/log/badbot.log  main;
}

access_log /web/log/location_access.log  main;
    access_log /web/log/spider_access.log main if=$spiderbot;
    access_log /web/log/rssbot_access.log main if=$rssbot;
Run Code Online (Sandbox Code Playgroud)

不过貌似nginx会在location_access.log和spider_access.log中写入一些robot日志。

如何分离机器人的日志?

还有一个问题是有些机器人日志没有写入spider_access.log,而是存在于location_access.log中。我的地图似乎不起作用。我定义“地图”时有什么问题吗?

Mar*_*erg 0

您正在突破条件 Nginx 的限制if,它的目的是最小化使用

考虑使用 Rsyslog 来跟踪您的 Nginx 访问日志。Rsyslog 具有强大的选项,用于匹配日志字符串的内容并将其结果发送到不同的日志。然后您就可以获得您正在寻找的三个单独的日志。