使用 Logstash 解析 XML 文件

Question

使用 Logstash 解析 XML 文件

KAR*_*HAH 2 elasticsearch logstash logstash-configuration

我正在尝试解析 Logstash 中的 XML 文件。我想使用 XPath 来解析 XML 中的文档。所以当我运行我的配置文件时，数据加载到elasticsearch但它不是我想要加载数据的方式。加载进来的数据elasticsearch是xml文档中的每一行

我的 XML 文件的结构

我想要达到的目标：

在弹性搜索中创建存储以下内容的字段

ID =1
Name = "Finch"

Run Code Online (Sandbox Code Playgroud)

我的配置文件：

input{
    file{
        path => "C:\Users\186181152\Downloads\stations.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
        type => "xml"
    }
}
filter{
    xml{
        source => "message"
        store_xml => false
        target => "stations"
        xpath => [
            "/stations/station/id/text()", "station_id",
            "/stations/station/name/text()", "station_name"
        ]
    }
}

output{
    elasticsearch{
        codec => json
        hosts => "localhost"
        index => "xmlns"
    }
    stdout{
        codec => rubydebug
    }
}

Run Code Online (Sandbox Code Playgroud)

Logstash 中的输出：

{
    "station_name" => "%{station_name}",
    "path" => "C:\Users\186181152\Downloads\stations.xml",
    "@timestamp" => 2018-02-09T04:03:12.908Z,
    "station_id" => "%{station_id}",
    "@version" => "1",
    "host" => "BW",
    "message" => "\t\r",
    "type" => "xml"
}

Run Code Online (Sandbox Code Playgroud)

Answer 1

KAR*_*HAH 5

多行过滤器允许将 xml 文件创建为单个事件，我们可以使用 xml-filter 或 xpath 来解析 xml 以在 elasticsearch 中摄取数据。在多行过滤器中，我们提到了一个由 logstash 用来扫描您的 xml 文件的模式（在下面的示例中）。一旦模式匹配之后的所有条目，将被视为单个事件。

以下是我的数据的工作配置文件示例

input {
    file {
        path => "C:\Users\186181152\Downloads\stations3.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
        type => "xml"
        codec => multiline {
            pattern => "<stations>" 
            negate => "true"
            what => "previous"
        }
    }
}

filter {
    xml {
        source => "message"
        store_xml => false
        target => "stations"
        xpath => [
            "/stations/station/id/text()", "station_id",
            "/stations/station/name/text()", "station_name"
        ]
    }
}

output {
    elasticsearch {
        codec => json
        hosts => "localhost"
        index => "xmlns24"
    }
    stdout {
        codec => rubydebug
    }
}

Run Code Online (Sandbox Code Playgroud)

归档时间：	7 年，9 月前
查看次数：	11478 次
最近记录：	6 年，2 月前