_grokparsefailure tag on all parsed logs when using multiple grok filters

Moh*_*min 2 logstash logstash-grok logstash-configuration elastic-stack

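I'm trying to parse Minecraft logs with the Elastic Stack, and I've run into a very strange problem (strange to me, at least!).

Every line of my log is parsed correctly, but every event also carries the _grokparsefailure tag.

My Logstash pipeline configuration is: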

input {
  file {
    path => [ "/path/to/my/log" ]
    #start_position => "beginning"
    tags => ["minecraft"]
  }
}

filter {
  if "minecraft" in [tags] {

#    mutate {
#      gsub => [
#        "message", "\\n", ""
#      ]
#    }



    #############################
    #           Num 1           #
    #############################
    grok {
      match => [ "message", "\[%{TIME:timestamp}] \[(?<originator>[^\/]+)?/%{LOGLEVEL:level}]: %{GREEDYDATA:message}" ]
      overwrite => [ "message" ]
      break_on_match => false
    }


    #############################
    #           Num 2           #
    #############################
    grok {
      match => [ "message", "UUID of player %{USERNAME} is %{UUID}" ]
      add_tag => [ "player", "uuid" ]
      break_on_match => true
    }


    #############################
    #           Num 3           #
    #############################
    grok {
      match => [ "message",  "\A(?<player>[a-zA-Z0-9_]+)\[/%{IPV4:ip_address}:%{POSINT}\] logged in with entity id %{POSINT:entity_id} at \(\[(?<world>[a-zA-Z]+)\](?<pos>[^\)]+)\)\Z" ]
      add_tag => [ "player", "join" ]
      break_on_match => true
    }
#
#    grok {
#      match => [ "message",  "^(?<player>[a-zA-Z0-9_]+) has just earned the achievement \[(?<achievement>[^\[]+)\]$" ]
#      add_tag => [ "player", "achievement" ]
#    }
#
#    grok {
#      match => [ "message", "^(?<player>[a-zA-Z0-9_]+) left the game$" ]
#      add_tag => [ "player", "part" ]
#    }
#
#    grok {
#      match => [ "message", "^<(?<player>[a-zA-Z0-9_]+)> .*$" ]
#      add_tag => [ "player", "chat" ]
#    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:xxxx"]
    user => "xxxx"
    password => "xxxxxx"
    index => "minecraft_s1v15_%{+YYYY.MM.dd}"
  }
}

A sample of my logs:

[11:21:46] [User Authenticator #7/INFO]: UUID of player MyAwsomeUsername is d800b63e-c2d2-3140-83a7-32315d09feca
[11:21:46] [Server thread/INFO]: MyAwsomeUsername joined the game
[11:21:46] [Server thread/INFO]: MyAwsomeUsername[/111.111.111.111:45140] logged in with entity id 6868 at ([world]61.45686149445207, 70.9375, -175.44700729217607)
[11:21:49] [Server thread/INFO]: MyAwsomeUsername issued server command: //efererg
[11:21:52] [Async Chat Thread - #1/INFO]: <MyAwsomeUsername> egerg
[11:21:54] [Async Chat Thread - #1/INFO]: <MyAwsomeUsername> ef
[12:00:19] [Server thread/INFO]: MyAwsomeUsername lost connection: Disconnected
[12:00:19] [Server thread/INFO]: MyAwsomeUsername left the game
[12:00:21] [User Authenticator #8/INFO]: UUID of player MyAwsomeUsername is d800b63e-c2d2-3140-83a7-32315d09feca
[12:00:21] [Server thread/INFO]: MyAwsomeUsername joined the game
[12:00:21] [Server thread/INFO]: MyAwsomeUsername[/111.111.111.111:45470] logged in with entity id 11767 at ([world]61.45686149445207, 70.9375, -175.44700729217607)
[12:00:27] [Server thread/INFO]: MyAwsomeUsername issued server command: /wgergerger
[12:00:29] [Async Chat Thread - #2/INFO]: <MyAwsomeUsername> gerg
[12:00:33] [Async Chat Thread - #2/INFO]: <MyAwsomeUsername> gerger
[12:00:35] [Async Chat Thread - #2/INFO]: <MyAwsomeUsername> rerg
[12:00:37] [Server thread/INFO]: MyAwsomeUsername lost connection: Disconnected
[12:00:37] [Server thread/INFO]: MyAwsomeUsername left the game
[12:00:38] [User Authenticator #8/INFO]: UUID of player MyAwsomeUsername is d800b63e-c2d2-3140-83a7-32315d09feca
[12:00:38] [Server thread/INFO]: MyAwsomeUsername joined the game
[12:00:38] [Server thread/INFO]: MyAwsomeUsername[/111.111.111.111:45476] logged in with entity id 11793 at ([world]62.97573252632079, 71.0, -179.01739415148737)
[12:00:40] [Server thread/INFO]: MyAwsomeUsername lost connection: Disconnected
[12:00:40] [Server thread/INFO]: MyAwsomeUsername left the game
[12:00:51] [User Authenticator #8/INFO]: UUID of player MyAwsomeUsername is d800b63e-c2d2-3140-83a7-32315d09feca
[12:00:51] [Server thread/INFO]: MyAwsomeUsername joined the game
[12:00:51] [Server thread/INFO]: MyAwsomeUsername[/111.111.111.111:45486] logged in with entity id 11805 at ([world]62.97573252632079, 71.0, -179.01739415148737)
[12:00:55] [Server thread/INFO]: MyAwsomeUsername lost connection: Disconnected
[12:00:55] [Server thread/INFO]: MyAwsomeUsername left the game



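Explanation:

I commented out the other groks to keep the problem simple (with them uncommented, the problem is exactly the same).

I tested three cases:

  1. With groks 2 and 3 and the remaining ones commented out, only grok 1 is active. In this case every log line is parsed and no event has _grokparsefailure.
  2. With grok 3 and the remaining ones commented out, groks 1 and 2 are active. In this case the log lines that match grok 2 are parsed without _grokparsefailure, and the others are parsed with _grokparsefailure. That still makes sense!
  3. In the last case I uncommented all three groks (1, 2, and 3 active), and every log line is parsed but has _grokparsefailure! This happens even though break_on_match defaults to true, so a line that matches grok 2 should not be tested against grok 3.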

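I read some other Stack Overflow questions similar to mine (similar question 1) and added a mutate block before the grok filters (because every log line ends with \n), but nothing changed and the problem persists!

One more thing I should mention: I know that adding more groks after grok 2 (grok 3 and the others) will produce this tag, because some logs don't match grok 2 at all, and those groks will have to be wrapped in regex conditions. But for now, at least the logs that match grok 2 should be fine (no _grokparsefailure), and they aren't! (I read this in the Stack Overflow question: similar question 2.)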

lea*_*jmp 6

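In fact, this is the expected behavior; you are slightly confusing how Logstash and grok work.

First, all filters are independent of each other. Setting break_on_match in a grok affects only that grok; it has no effect on other grok filters that come later in the pipeline. break_on_match only matters when a single grok has multiple patterns, which is not your case.

Second, since Logstash is serial and you aren't using any conditionals, each of your grok filters is applied to every message in the pipeline, regardless of whether the message has already been parsed. A line that matches grok 2 fails grok 3, and a line that matches grok 3 fails grok 2, so with both active every line fails at least one of them. This is what gives all your lines the _grokparsefailure tag.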

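To fix this, you need to use conditionals.

You don't need a conditional on the first grok filter; it only extracts the different parts of the log line and overwrites the message field. The second grok is simply your first test. Every grok after the second needs the following configuration.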

if "_grokparsefailure" in [tags] {
  grok {
    match => { "message" => "your pattern" }
    add_tag => [ "your tags" ]
    remove_tag => ["_grokparsefailure"]
  }
}

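This grok is applied only when the event has _grokparsefailure in its tags field. If the message matches your pattern, the tag is removed; if it doesn't, the tag stays and the message can be tested by the next grok.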
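
As a concrete sketch, here is how that could look with groks 1 to 3 from the question. The patterns are copied unchanged from the question; the break_on_match settings are dropped because they have no effect on single-pattern groks. This has not been run against a real pipeline, so treat it as an illustration rather than a tested configuration.

```
filter {
  if "minecraft" in [tags] {

    # Grok 1: splits the line into its parts and overwrites "message".
    grok {
      match => [ "message", "\[%{TIME:timestamp}] \[(?<originator>[^\/]+)?/%{LOGLEVEL:level}]: %{GREEDYDATA:message}" ]
      overwrite => [ "message" ]
    }

    # Grok 2: the first test. Lines that do not match get _grokparsefailure.
    grok {
      match => [ "message", "UUID of player %{USERNAME} is %{UUID}" ]
      add_tag => [ "player", "uuid" ]
    }

    # Grok 3: only tried when an earlier grok failed.
    # On a match, the failure tag left by grok 2 is removed.
    if "_grokparsefailure" in [tags] {
      grok {
        match => [ "message",  "\A(?<player>[a-zA-Z0-9_]+)\[/%{IPV4:ip_address}:%{POSINT}\] logged in with entity id %{POSINT:entity_id} at \(\[(?<world>[a-zA-Z]+)\](?<pos>[^\)]+)\)\Z" ]
        add_tag => [ "player", "join" ]
        remove_tag => [ "_grokparsefailure" ]
      }
    }
  }
}
```

With this layout, a "UUID of player" line is never tested against grok 3, a "logged in" line fails grok 2 but has the tag removed by grok 3, and a line that matches neither keeps _grokparsefailure until a later conditional grok matches it.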

In the end, your grok configuration should look something like this.

grok {
  # your first grok
}

grok {
  # your second grok, can be any of the others
}

if "_grokparsefailure" in [tags] {
  grok {
    # your grok N
    remove_tag => ["_grokparsefailure"]
  }
}

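This is only needed because you are adding different tags for each kind of message. If you moved that tagging logic into mutate filters, for example, you could use just two grok filters, the second one being a multi-pattern grok with break_on_match set to true.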

grok {
  match => { 
    "message" => [ 
      "pattern from grok 2",
      "pattern from grok 3",
      "pattern from grok N"
    ]
  }
  break_on_match => true
}
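
A minimal sketch of that two-grok variant, using the patterns from groks 2 and 3 in the question. The field names `player` (on the USERNAME capture) and `player_uuid` are assumptions added here so that the conditionals have something to test; the original grok 2 captures no fields. The tagging conditionals are one possible way to move the add_tag logic into mutate filters, not the only one.

```
grok {
  match => {
    "message" => [
      # Pattern from grok 2, with hypothetical field names added.
      "UUID of player %{USERNAME:player} is %{UUID:player_uuid}",
      # Pattern from grok 3, unchanged.
      "\A(?<player>[a-zA-Z0-9_]+)\[/%{IPV4:ip_address}:%{POSINT}\] logged in with entity id %{POSINT:entity_id} at \(\[(?<world>[a-zA-Z]+)\](?<pos>[^\)]+)\)\Z"
    ]
  }
  # Stop at the first pattern that matches (this is also the default).
  break_on_match => true
}

# Tag each event according to the fields the matching pattern produced.
if [player_uuid] {
  mutate { add_tag => [ "player", "uuid" ] }
} else if [entity_id] {
  mutate { add_tag => [ "player", "join" ] }
}
```

Here _grokparsefailure is added only when none of the listed patterns match, so no remove_tag is needed.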