How do I parse an Apache log file in Swift?

xTw*_*eDx 4 regex parsing swift

Suppose I have a log file that I have split into an array of strings. For example, I have these lines:

123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"

123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"

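I could parse these with ordinary string manipulation, but I think a regex would be a better way to do it. I tried to follow a similar pattern that someone used in Python, but I can't get it to work. Here is my attempt.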

This is the pattern: ([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"

When I try to use it, I get no matches.

let lines = contents.split(separator: "\n")
let pattern = "([(\\d\\.)]+) - - \\[(.*?)\\] \"(.*?)\" (\\d+) - \"(.*?)\" \"(.*?)\""
let regex = try! NSRegularExpression(pattern: pattern, options: [])
for line in lines {
    let range = NSRange(location: 0, length: line.utf16.count)
    let parsedData = regex.firstMatch(in: String(line), options: [], range: range)
    print(parsedData)
}

Ideally, I would extract the data into a model. The code needs to be performant, because I have to handle thousands of lines.

Expected result

let someResult = (String, String, String, String, String, String) or 
let someObject: LogFile = LogFile(String, String, String...)

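I am looking for each parsed line to be broken down into its individual parts: IP, OS, OS version, browser, browser version, and so on. Any real parsing of the data would be enough.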

Rav*_*h13 6

With the samples you have shown, could you try the following?

^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$

Online demo of the above regex
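To show how the pattern could be applied from Swift, here is a minimal sketch using NSRegularExpression. The names logPattern, captureGroups and sampleLine are illustrative only and do not come from the question. The bracket inside the character class is written as `[^\]]` here for readability; it is intended to match the same text as `[^]]`. Note that the pattern expects the referrer field to be exactly "-", as in the two sample lines.

```swift
import Foundation

// The pattern as a raw string literal, so backslashes and quotes need no escaping.
let logPattern = #"^((?:\d+\.){3}\d+).*?\[([^\]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$"#

/// Returns capture groups 1...6 of the first match, or nil if the line does not match.
/// (Hypothetical helper written for this sketch.)
func captureGroups(in line: String, using regex: NSRegularExpression) -> [String]? {
    // NSRegularExpression works on UTF-16 offsets, so build the NSRange from the String itself.
    let range = NSRange(line.startIndex..<line.endIndex, in: line)
    guard let match = regex.firstMatch(in: line, options: [], range: range) else {
        return nil
    }
    // Group 0 is the whole match; the fields start at group 1.
    return (1..<match.numberOfRanges).compactMap { index in
        Range(match.range(at: index), in: line).map { String(line[$0]) }
    }
}

// The pattern is a constant known to be valid, so try! is tolerable in a sketch.
let regex = try! NSRegularExpression(pattern: logPattern, options: [])

let sampleLine = #"123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36""#

let groups = captureGroups(in: sampleLine, using: regex)
print(groups ?? [])
```

The groups come back in order: IP, timestamp, request line, status code, response size, and user agent.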

Explanation: a detailed breakdown of the above regex.

^(                   ##Start of the 1st capturing group, anchored at the start of the line.
   (?:\d+\.){3}\d+   ##Non-capturing group: 1 or more digits followed by a dot, repeated 3 times, then 1 or more digits.
)                    ##Close the 1st capturing group.
.*?\[                ##Non-greedy match up to the first [.
([^]]*)              ##2nd capturing group: everything up to the next ].
\].*?"               ##Match ] and then non-greedily up to the next ".
([^"]*)              ##3rd capturing group: everything up to the next ".
"\s*                 ##Match " followed by 0 or more whitespace characters.
(\d+)                ##4th capturing group: 1 or more digits.
\s*                  ##Match 0 or more whitespace characters.
(\d+)                ##5th capturing group: 1 or more digits.
\s*"-"\s*"           ##0 or more whitespace characters, then "-", 0 or more whitespace characters, then ".
([^"]*)              ##6th capturing group: everything up to the next ".
"$                   ##Match the closing " at the end of the line.
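Since the question asks for a model and for speed over thousands of lines, the six capture groups could be mapped onto a struct along these lines. This is only a sketch under assumptions: LogEntry, LogParser and their field names are hypothetical, the regex is compiled once and reused for every line, and lines that do not match (for example, ones whose referrer is not "-") are simply skipped. The class `[^\]]` is an escaped spelling intended to match the same text as `[^]]`.

```swift
import Foundation

// Hypothetical model; the field names are illustrative, not from the question.
struct LogEntry {
    let ip: String
    let timestamp: String
    let request: String
    let status: Int
    let bytes: Int
    let userAgent: String
}

struct LogParser {
    // Compiled once and reused, which avoids recompiling the pattern per line.
    private let regex: NSRegularExpression

    init() throws {
        let pattern = #"^((?:\d+\.){3}\d+).*?\[([^\]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$"#
        regex = try NSRegularExpression(pattern: pattern, options: [])
    }

    /// Parses one line, returning nil when the line does not match the pattern.
    func parse(line: String) -> LogEntry? {
        let range = NSRange(line.startIndex..<line.endIndex, in: line)
        guard let match = regex.firstMatch(in: line, options: [], range: range),
              match.numberOfRanges == 7 else { return nil }

        // Extracts capture group i as a String, if it took part in the match.
        func group(_ i: Int) -> String? {
            Range(match.range(at: i), in: line).map { String(line[$0]) }
        }

        guard let ip = group(1),
              let timestamp = group(2),
              let request = group(3),
              let status = group(4).flatMap({ Int($0) }),
              let bytes = group(5).flatMap({ Int($0) }),
              let userAgent = group(6) else { return nil }

        return LogEntry(ip: ip, timestamp: timestamp, request: request,
                        status: status, bytes: bytes, userAgent: userAgent)
    }

    /// Splits the file contents on newlines and keeps only the lines that parse.
    func parse(contents: String) -> [LogEntry] {
        contents.split(separator: "\n").compactMap { parse(line: String($0)) }
    }
}

// Usage with the two sample lines plus one line that should be skipped.
let contents = [
    #"123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36""#,
    #"123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0""#,
    "not a log line"
].joined(separator: "\n")

let parser = try! LogParser()
let entries = parser.parse(contents: contents)
for entry in entries {
    print(entry.ip, entry.status, entry.bytes)
}
```

Splitting the user agent further into OS and browser versions is not covered by this pattern; that would need a second step on the userAgent field.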

  • You are clearly a regex god. This solution works. (2 upvotes)