Logstash: Received an event that has a different character encoding

use*_*857 10 logstash

When using Logstash, I see the following error:

Received an event that has a different character encoding than you configured. {:text=>"2014-06-22T11:49:57.832631+02:00 10.17.22.37 date=2014-06-22 time=11:49:55 device_id=LM150D9L23000422 log_id=0312318759 type=statistics pri=information session_id=\\\"s617nnE2019973-s617nnE3019973\\\" client_name=\\\"[<IP address>]\\\" dst_ip=\\\"<ip address>\\\" from=\\\"machin@machin.fr\\\" to=\\\"truc@machin.fr\\\" polid=\\\"0:1:1\\\" domain=\\\"machin.fr\\\" subject=\\\"\\xF0\\xCC\\xC1\\xD4\\xC9 \\xD4\\xCF\\xCC\\xD8\\xCB\\xCF \\xDA\\xC1 \\xD0\\xD2\\xCF\\xC4\\xC1\\xD6\\xC9!\\\" mailer=\\\"mta\\\" resolved=\\\"OK\\\" direction=\\\"in\\\" virus=\\\"\\\" disposition=\\\"Quarantine\\\" classifier=\\\"FortiGuard AntiSpam\\\" message_length=\\\"1024\\\"", :expected_charset=>"UTF-8", :level=>:warn}

My logstash.conf is:

input {
    file {
        path => "/var/log/fortimail.log"
    }
}

filter {
    grok {
        # grok-parsing for logs
    }
}

output {
    elasticsearch {
        host => "10.0.10.62"
        embedded => true
        cluster => "Mastertest"
        node_name => "MasterNode"
        protocol => "http"
    }
}

I don't know which codec I should use to handle these events correctly. The problem is in the subject field.

Sid*_*ige 6

This happens because the default charset is UTF-8, and the incoming message contains bytes that are not valid UTF-8.
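To see concretely why the warning fires, here is a small Python sketch that tests whether a byte string is valid UTF-8. It is only an analogy: Logstash performs its own check in Ruby, and the helper name is_valid_utf8 is hypothetical. The subject bytes are copied from the \xNN escapes in the warning in the question.

```python
# Hypothetical helper (not part of Logstash): strict UTF-8 validation,
# analogous to the check behind the "different character encoding" warning.
def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")  # strict mode raises on malformed byte sequences
        return True
    except UnicodeDecodeError:
        return False


# Subject field from the warning, with each \xNN escape as a raw byte.
SUBJECT = b"\xF0\xCC\xC1\xD4\xC9 \xD4\xCF\xCC\xD8\xCB\xCF \xDA\xC1 \xD0\xD2\xCF\xC4\xC1\xD6\xC9!"

print(is_valid_utf8(b"mailer=mta"))  # plain ASCII is always valid UTF-8
print(is_valid_utf8(SUBJECT))        # 0xF0 starts a 4-byte sequence, but 0xCC is not a continuation byte
```

The ASCII parts of the log line pass, while the subject bytes fail, which is exactly the field the question points at.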

To fix this, set the charset in the input section by using a codec with the correct charset. For example:

file {
    # the file input expects absolute paths
    path => "/var/log/http/access_log"
    type => "apache_access_log"
    codec => plain {
        charset => "ISO-8859-1"
    }
    stat_interval => 60
}

http://logstash.net/docs/1.3.3/codecs/plain

  • But how can you tell what the correct charset is? (5 upvotes)
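One practical way to narrow it down is to take the raw bytes from the warning, decode them with a few candidate charsets, and see which result is readable. The Python sketch below does this for the subject field; the candidate list and the helper try_decodings are illustrative assumptions, not part of Logstash.

```python
from __future__ import annotations

# Subject field from the warning, with each \xNN escape as a raw byte.
SUBJECT = b"\xF0\xCC\xC1\xD4\xC9 \xD4\xCF\xCC\xD8\xCB\xCF \xDA\xC1 \xD0\xD2\xCF\xC4\xC1\xD6\xC9!"

# Assumed candidates for illustration; extend with encodings you suspect.
CANDIDATES = ["utf-8", "iso-8859-1", "cp1251", "koi8_r"]


def try_decodings(raw: bytes, candidates: list[str]) -> dict[str, str | None]:
    """Decode raw with each candidate; None means the bytes are invalid in it."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)  # strict decoding
        except UnicodeDecodeError:
            results[name] = None
    return results


for name, text in try_decodings(SUBJECT, CANDIDATES).items():
    print(f"{name:>10}: {text!r}")
```

With these bytes, utf-8 fails outright, iso-8859-1 and cp1251 decode without error but produce gibberish, and only koi8_r yields readable text: "Плати только за продажи!" (Russian for "Pay only for sales!"). For this particular log, that suggests trying charset => "KOI8-R" in the plain codec rather than ISO-8859-1. ISO-8859-1 maps every possible byte, so it would silence the warning while still storing garbled text.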