I need help integrating a regular expression with Go. I want to parse log files and have created a regex that looks pretty good on https://regex101.com/r/p4mbiS/1/
The log lines look like this:
57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"
The regex is:
(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"
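The named groups should produce the following results:
ip: 57.157.87.86
localtime: 06/Feb/2020:00:11:04 +0100
request: parammore=1&customer_id=1&...HTTP/1.1
ref: https://www.somewebsite.com/more/andheresomemore/
agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0)...
The Go code generated by regex101.com does not work for me. I tried to improve it, without success.
The Go code returns only the whole string instead of the groups.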
package main

import (
	"regexp"
	"fmt"
)

func main() {
	var re = regexp.MustCompile(`(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"`)
	var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

	if len(re.FindStringIndex(str)) > 0 {
		fmt.Println(re.FindString(str),"found at index",re.FindStringIndex(str)[0])
	}
}
Single-match solution
Since you have defined capturing groups and need to extract their values, you need to use .FindStringSubmatch; see this Go demo:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)
	var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

	// FindStringSubmatch returns the whole match at index 0,
	// followed by one entry per capturing group, or nil if nothing matched.
	match := re.FindStringSubmatch(str)
	if match == nil {
		fmt.Println("no match")
		return
	}

	// Map each named group to its captured value.
	result := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if i != 0 && name != "" {
			result[name] = match[i]
		}
	}
	fmt.Printf("IP: %s\nLocal Time: %s\nRequest: %s\nRef: %s\nAgent: %s\n", result["ip"], result["localtime"], result["request"], result["ref"], result["agent"])
}
Output:
IP: 57.157.87.86
Local Time: 06/Feb/2020:00:11:04 +0100
Request: parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1
Ref: https://www.somewebsite.com/more/andheresomemore/
Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
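To illustrate why the code in the question printed only the whole line, here is a minimal sketch on a deliberately tiny pattern. The `key`/`value` group names, the helper functions, and the input `a=1` are made up for illustration and are not part of the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// kv is a hypothetical toy pattern with two named groups.
var kv = regexp.MustCompile(`(?P<key>\w+)=(?P<value>\w+)`)

// wholeMatch returns only the text of the overall match; groups are discarded.
func wholeMatch(s string) string {
	return kv.FindString(s)
}

// groups returns the overall match at index 0 followed by each capturing group.
func groups(s string) []string {
	return kv.FindStringSubmatch(s)
}

func main() {
	fmt.Printf("%q\n", wholeMatch("a=1"))  // "a=1"
	fmt.Printf("%q\n", groups("a=1"))      // ["a=1" "a" "1"]
	fmt.Printf("%q\n", kv.SubexpNames())   // ["" "key" "value"]
}
```

The indices of `SubexpNames()` line up with the indices of the slice returned by `FindStringSubmatch`, which is what makes the name-to-value loop in the demo work.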
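Using .+? so often in a pattern is not a good idea, since it reduces performance, so I replaced these dot patterns with negated character classes and tried to make the pattern a bit more verbose.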
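The pattern printed in the demos here still contains lazy dots, so as an illustration of the negated-character-class idea, a variant might look like the sketch below. This is an assumption about what such a pattern could be, not the pattern from the original answer; the `parseLine` helper is hypothetical, and the pattern assumes that the quoted fields never contain an escaped double quote.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRe replaces each lazy dot with a class that excludes the next delimiter:
// [^\[]* runs up to the opening bracket, [^\]]* up to the closing one,
// and [^"]* up to the next double quote.
var lineRe = regexp.MustCompile(`^(?P<ip>\S+)[^\[]*\[(?P<localtime>[^\]]*)\][^"]*"GET\s/\?(?P<request>[^"]*)"[^"]*"(?P<ref>[^"]*)"\s*"(?P<agent>[^"]*)"`)

// parseLine (hypothetical helper) returns the named groups of one log line,
// or false if the line does not match.
func parseLine(line string) (map[string]string, bool) {
	match := lineRe.FindStringSubmatch(line)
	if match == nil {
		return nil, false
	}
	result := make(map[string]string)
	for i, name := range lineRe.SubexpNames() {
		if i != 0 && name != "" {
			result[name] = match[i]
		}
	}
	return result, true
}

func main() {
	line := `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
	if fields, ok := parseLine(line); ok {
		fmt.Println(fields["ip"], fields["localtime"], fields["ref"])
	}
}
```

Because the pattern starts with `^`, it would need the `(?m)` flag to match several lines inside one string; for line-by-line parsing it can be used as is.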
Multi-match solution
Here, you need to use the FindAllStringSubmatch method of the compiled regexp:
See this Go demo:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)
	var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

	result := make([]map[string]string, 0)
	for _, match := range re.FindAllStringSubmatch(str, -1) {
		res := make(map[string]string)
		for i, name := range re.SubexpNames() {
			if i != 0 && name != "" {
				res[name] = match[i]
			}
		}
		result = append(result, res)
	}

	// Displaying the matches
	for i, match := range result {
		fmt.Printf("--------------\nMatch %d:\n", i+1)
		for j, name := range re.SubexpNames() {
			if j != 0 && name != "" {
				fmt.Printf("Group %s: %s\n", name, match[name])
			}
		}
	}
}
Output:
--------------
Match 1:
Group ip: 57.157.87.86
Group localtime: 06/Feb/2020:00:11:04 +0100
Group request: parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1
Group ref: https://www.somewebsite.com/more/andheresomemore/
Group agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0