如何使用Nagios监视日志文件

Ken*_*130 23 nagios logfiles

我们正在使用Nagios来监控我们的网络并取得巨大成功.但是,我们有一个关键应用程序错误的系统日志,当我设置check_log时,它看起来不像监视设备那样好.

问题是:

  • 它只显示最后一个条目
  • 似乎没有办法确认严重错误并使监视器恢复到良好状态

nagios是错误的工具,还是我们只是没有设置服务监控权?

这是我的参赛作品

# log file
define command{
        command_name    check_log
        command_line    $USER1$/check_log -F /var/log/applications/appcrit.log -O /tmp/appcrit.log -q ?
}


# Define the log monitering service
define service{
        name                            logfile-check           ;
        use                             generic-service         ;
        check_period                    24x7                    ;
        max_check_attempts              1                       ;
        normal_check_interval           5                       ;
        retry_check_interval            1                       ;
        contact_groups                  admins                  ;
        notification_options            w,u,c,r                 ;
        notification_period             24x7                    ;
        register                        0                       ;
        }

define service{
        use                             logfile-check
        host_name                       localhost
        service_description             CritLogFile
        check_command                   check_log
}
Run Code Online (Sandbox Code Playgroud)

Arc*_*hie 28

对于使用Nagios监视日志,通常日志检查器将仅在每次调用时为新发现的错误消息返回警告(因此它必须保留某些状态以便在后续运行时忽略它们).因此我通常设置:

max_check_attempts              1
is_volatile                     1
Run Code Online (Sandbox Code Playgroud)

这导致Nagios发出警报,但只发出一次,然后恢复正常.

我最喜欢的日志检查器是logwarn,但我有偏见,因为我没有找到任何我喜欢的现有的自己写的.logwarn包中包含Nagios插件.

  • 圣洁晚了比从不回应! (3认同)

小智 3

由于实现目标的方法有很多,Consol 还提供了一个不错的插件: https ://labs.consol.de/lang/en/nagios/check_logfiles/

  • 支持正则表达式
  • 支持日志轮转

要使用它,您需要一个cfg文件,这是oracle数据库的示例

@searches = ({
  tag => 'oraalerts',
options => 'sticky=28800',
  logfile => '/u01/app/oracle/diag/rdbms/davmdkp/DAVMDKP1/trace/alert_DAVMDKP1.log',
  criticalpatterns => [
      'ORA\-0*204[^\d]',        # error in reading control file
      'ORA\-0*206[^\d]',        # error in writing control file
      'ORA\-0*210[^\d]',        # cannot open control file
      'ORA\-0*257[^\d]',        # archiver is stuck
      'ORA\-0*333[^\d]',        # redo log read error
      'ORA\-0*345[^\d]',        # redo log write error
      'ORA\-0*4[4-7][0-9][^\d]',# ORA-0440 - ORA-0485 background process failure
      'ORA\-0*48[0-5][^\d]',
      'ORA\-0*6[0-3][0-9][^\d]',# ORA-6000 - ORA-0639 internal errors
      'ORA\-0*1114[^\d]',        # datafile I/O write error
      'ORA\-0*1115[^\d]',        # datafile I/O read error
      'ORA\-0*1116[^\d]',        # cannot open datafile
      'ORA\-0*1118[^\d]',        # cannot add a data file
      'ORA\-0*1122[^\d]',       # database file 16 failed verification check
      'ORA\-0*1171[^\d]',       # datafile 16 going offline due to error advancing checkpoint
      'ORA\-0*1201[^\d]',       # file 16 header failed to write correctly
      'ORA\-0*1208[^\d]',       # data file is an old version - not accessing current version
      'ORA\-0*1578[^\d]',        # data block corruption
      'ORA\-0*1135[^\d]',        # file accessed for query is offline
      'ORA\-0*1547[^\d]',        # tablespace is full
      'ORA\-0*1555[^\d]',        # snapshot too old
      'ORA\-0*1562[^\d]',        # failed to extend rollback segment
      'ORA\-0*162[89][^\d]',     # ORA-1628 - ORA-1632 maximum extents exceeded
      'ORA\-0*163[0-2][^\d]',
      'ORA\-0*165[0-6][^\d]',    # ORA-1650 - ORA-1656 tablespace is full
      'ORA\-16014[^\d]',      # log cannot be archived, no available destinations
      'ORA\-16038[^\d]',      # log cannot be archived
      'ORA\-19502[^\d]',      # write error on datafile
      'ORA\-27063[^\d]',         # number of bytes read/written is incorrect
      'ORA\-0*4031[^\d]',        # out of shared memory.
      'No space left on device',
      'Archival Error',
  ],
  warningpatterns => [
      'ORA\-0*3113[^\d]',        # end of file on communication channel
      'ORA\-0*6501[^\d]',         # PL/SQL internal error
      'ORA\-0*1140[^\d]',         # follows WARNING: datafile #20 was not in online backup mode
      'Archival stopped, error occurred. Will continue retrying',
  ]
});
Run Code Online (Sandbox Code Playgroud)