I have log lines in the following format and want to extract the fields:
[field1: content1] [field2: content2] [field3: content3] ...
I know neither the field names nor the number of fields.
I tried it with backreferences and with sprintf formatting, but got no results:
match => [ "message", "(?:\[(\w+): %{DATA:\k<-1>}\])+" ] # not working
match => [ "message", "(?:\[%{WORD:fieldname}: %{DATA:%{fieldname}}\])+" ] # not working
This seems to work for a single field, but not for more:
match => [ "message", "(?:\[%{WORD:field}: %{DATA:content}\] ?)+" ]
add_field => { "%{field}" => "%{content}" }
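The kv filter is not suitable either, because the field contents may contain whitespace.
Is there a plugin or strategy that solves this problem?
The Logstash ruby filter plugin can help you. :)
Here is the configuration: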
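input {
  stdin {}
}
filter {
  ruby {
    # This uses the pre-5.0 event API. On Logstash 5.0 and later, use
    # event.get('message') and event.set(name, value) instead of event[...].
    code => "
      fieldArray = event['message'].split('] [')
      for field in fieldArray
        # Drop the surrounding brackets
        field = field.delete '['
        field = field.delete ']'
        # Split into name and content; the limit of 2 keeps any ': ' inside the content
        result = field.split(': ', 2)
        event[result[0]] = result[1]
      end
    "
  }
}
output {
  stdout {
    codec => rubydebug
  }
}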
With your log line:
[field1: content1] [field2: content2] [field3: content3]
this is the output:
{
       "message" => "[field1: content1] [field2: content2] [field3: content3]",
      "@version" => "1",
    "@timestamp" => "2014-07-07T08:49:28.543Z",
          "host" => "abc",
        "field1" => "content1",
        "field2" => "content2",
        "field3" => "content3"
}
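I also tried it with 4 fields, and it works as well.
Note that event in the Ruby code is the Logstash event. You can use it to access all of the event's fields, such as message, @timestamp, etc.
Enjoy!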
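The splitting logic from the filter above can also be tried outside Logstash. The following is only a minimal sketch in plain Ruby: parse_bracket_fields is a hypothetical helper name (not part of any Logstash API), and it assumes that the sequence "] [" never occurs inside a field's content.

```ruby
# Standalone sketch of the split-based parsing, runnable with plain Ruby.
# parse_bracket_fields is a hypothetical helper name, not a Logstash API.
def parse_bracket_fields(line)
  fields = {}
  line.strip.split('] [').each do |chunk|
    # Strip the leading '[' of the first chunk and the trailing ']' of the last.
    chunk = chunk.sub(/\A\[/, '').sub(/\]\z/, '')
    # A limit of 2 keeps any later ': ' inside the content intact.
    name, value = chunk.split(': ', 2)
    fields[name] = value unless name.nil? || value.nil?
  end
  fields
end

sample = '[field1: content1] [field2: some content with spaces] [field3: a: b]'
puts parse_bracket_fields(sample).inspect
```

Inside the ruby filter, each name/value pair of the returned hash would then be written to the event instead of being printed.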
I found another way, using a regular expression:
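ruby {
  code => "
    # Match each 'name: content' that sits between '[' and a ']' followed by a space or end of line
    fields = event['message'].scan(/(?<=\[)\w+: .*?(?=\](?: |$))/)
    for field in fields
      # The limit of 2 keeps any ': ' inside the content
      field = field.split(': ', 2)
      event[field[0]] = field[1]
    end
  "
}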
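A variation on the same idea, sketched in plain Ruby so it can be run without Logstash: with capture groups, String#scan returns the name and the content as a pair, so the extra split step is not needed. FIELD_PATTERN and scan_bracket_fields are hypothetical names used only for illustration. Like the regex above, this assumes a field ends at a "]" that is followed by a space or the end of the line, so content containing "] " would be cut short.

```ruby
# Standalone sketch of the regex-based parsing, runnable with plain Ruby.
# FIELD_PATTERN and scan_bracket_fields are hypothetical names for illustration.
FIELD_PATTERN = /\[(\w+): (.*?)\](?= |$)/

def scan_bracket_fields(line)
  # With two capture groups, scan yields [name, content] pairs,
  # which to_h turns into a hash (a repeated name keeps its last value).
  line.scan(FIELD_PATTERN).to_h
end

sample = '[field1: content1] [field2: content with spaces] [field3: arr[0]=1]'
scan_bracket_fields(sample).each do |name, content|
  puts "#{name} => #{content}"
end
```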