Regex named groups in golang

Jur*_*ocs 7 regex go

I need help integrating a regular expression into golang. I want to parse log files and created a regular expression that looks quite good on https://regex101.com/r/p4mbiS/1/

A log line looks like this:

57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"

The regular expression is this:

(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"

The named groups should produce the following results:

ip: 57.157.87.86

localtime: 06/Feb/2020:00:11:04 +0100

request: parammore=1&customer_id=1&...HTTP/1.1

ref: https://www.somewebsite.com/more/andheresomemore/

agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0)...

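The golang code generated by regex101.com does not work for me. I tried to improve it, but without success.

The golang code only returns the whole string instead of the groups.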

package main

import (
    "regexp"
    "fmt"
)

func main() {
    var re = regexp.MustCompile(`(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"`)
    var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
    
    if len(re.FindStringIndex(str)) > 0 {
        fmt.Println(re.FindString(str),"found at index",re.FindStringIndex(str)[0])
    }
}

Find the fiddle here: https://play.golang.org/p/e0_8PM-Nv6i

Wik*_*żew 6

Single match solution

Since you defined capturing groups and need to extract their values, you need to use .FindStringSubmatch. See this Go demo:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    var re = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)
    var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
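    // FindStringSubmatch returns the whole match at index 0, then one entry per group; nil means no match
    match := re.FindStringSubmatch(str)
    if match == nil {
        fmt.Println("no match")
        return
    }
    // SubexpNames is parallel to match: names[i] is the name of group i ("" if the group is unnamed)
    result := make(map[string]string)
    for i, name := range re.SubexpNames() {
        if i != 0 && name != "" {
            result[name] = match[i]
        }
    }
    fmt.Printf("IP: %s\nLocal Time: %s\nRequest: %s\nRef: %s\nAgent: %s\n", result["ip"], result["localtime"], result["request"], result["ref"], result["agent"])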
}

Output:

IP: 57.157.87.86
Local Time: 06/Feb/2020:00:11:04 +0100
Request: parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1
Ref: https://www.somewebsite.com/more/andheresomemore/
Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
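When only one or two groups are needed, building a map may be more than necessary. The following is a minimal sketch, assuming Go 1.15 or later, where the regexp package provides SubexpIndex to look up a group's index by name. The shortened pattern, the shortened log line, and the helper name group are illustrative assumptions, not part of the original answer.

```go
package main

import (
    "fmt"
    "regexp"
)

// Shortened, hypothetical pattern used only for illustration:
// it captures just the IP address and the timestamp.
var re = regexp.MustCompile(`(?P<ip>\S+) - - \[(?P<localtime>[^\]]*)\]`)

// group (hypothetical helper) returns the text captured by the named
// group, or "" when there was no match or the name is unknown.
func group(match []string, name string) string {
    i := re.SubexpIndex(name) // -1 if no group has that name
    if match == nil || i < 0 {
        return ""
    }
    return match[i]
}

func main() {
    match := re.FindStringSubmatch(`57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET / HTTP/1.1"`)
    fmt.Println(group(match, "ip"))
    fmt.Println(group(match, "localtime"))
}
```

The helper returns an empty string both for a failed match and for an unknown group name, so callers that must tell these cases apart would need to check match and the index themselves.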

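Using .+? this often in a pattern is not a good idea, since it hurts performance. Replacing those dot patterns with negated character classes (for example, [^"]* for the text up to the next quote) tends to make the pattern both cheaper to match and more explicit about what each part may contain.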
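As a rough sketch of what a pattern built on negated character classes could look like, the variant below swaps each lazy dot for a class that stops at the next delimiter. The pattern itself, the variable name strict, the helper name parse, and the shortened log line are assumptions made for illustration. This variant is also slightly stricter than the pattern above, because it expects the request to follow the timestamp directly.

```go
package main

import (
    "fmt"
    "regexp"
)

// Hypothetical variant: [^\]]* stops at the closing bracket and
// [^"]* stops at the next double quote, instead of using .+? everywhere.
var strict = regexp.MustCompile(`(?P<ip>\S+)[^\[]*\[(?P<localtime>[^\]]*)\]\s+"GET\s/\?(?P<request>[^"]*)"[^"]*"(?P<ref>[^"]*)"\s*"(?P<agent>[^"]*)"`)

// parse (hypothetical helper) returns the named groups of the first
// match in line, or nil if the line does not match.
func parse(line string) map[string]string {
    match := strict.FindStringSubmatch(line)
    if match == nil {
        return nil
    }
    result := make(map[string]string)
    for i, name := range strict.SubexpNames() {
        if i != 0 && name != "" {
            result[name] = match[i]
        }
    }
    return result
}

func main() {
    // Shortened version of the log line from the question.
    line := `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
    groups := parse(line)
    fmt.Println(groups["ip"])
    fmt.Println(groups["localtime"])
    fmt.Println(groups["request"])
    fmt.Println(groups["ref"])
    fmt.Println(groups["agent"])
}
```

Go's regexp package already guarantees matching in time linear in the size of the input, so in Go the main gain from this style is likely a stricter, more predictable match rather than a dramatic speed-up.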

Multiple match solution

Here, you need to use regexp.FindAllStringSubmatch.

See this Go demo:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    var re = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)
    var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
    result := make([]map[string]string, 0)
    for _, match := range re.FindAllStringSubmatch(str, -1) {
        res := make(map[string]string)
        for i, name := range re.SubexpNames() {
            if i != 0 && name != "" {
                res[name] = match[i]
            }
        }
        result = append(result, res)
    }

    // Display the matches, listing the groups in the order they appear in the pattern
    for i, match := range result {
        fmt.Printf("--------------\nMatch %d:\n", i+1)
        for j, name := range re.SubexpNames() {
            if j != 0 && name != "" {
                fmt.Printf("Group %s: %s\n", name, match[name])
            }
        }
    }
}

Output:

--------------
Match 1:
Group ip: 57.157.87.86
Group localtime: 06/Feb/2020:00:11:04 +0100
Group request: parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1
Group ref: https://www.somewebsite.com/more/andheresomemore/
Group agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
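The demo above contains a single log line, so it yields only one match. The sketch below illustrates how the same approach might behave on input with several lines. It assumes a shortened pattern that captures only the IP and timestamp; the names lineRe and parseAll and the second log line are made up for the example. The (?m) flag makes ^ match at the start of every line, and the character classes exclude \n so that a match cannot run into the next line.

```go
package main

import (
    "fmt"
    "regexp"
)

// Shortened, hypothetical pattern: one match per log line.
var lineRe = regexp.MustCompile(`(?m)^(?P<ip>\S+)[^\[\n]*\[(?P<localtime>[^\]\n]*)\]`)

// parseAll (hypothetical helper) returns one map of named groups
// for every match found in input.
func parseAll(input string) []map[string]string {
    var results []map[string]string
    names := lineRe.SubexpNames()
    for _, match := range lineRe.FindAllStringSubmatch(input, -1) {
        groups := make(map[string]string)
        for i, name := range names {
            if i != 0 && name != "" {
                groups[name] = match[i]
            }
        }
        results = append(results, groups)
    }
    return results
}

func main() {
    // The second line is invented sample data.
    input := "57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] \"GET / HTTP/1.1\"\n" +
        "10.0.0.1 - - [06/Feb/2020:00:11:05 +0100] \"GET / HTTP/1.1\"\n"
    for i, groups := range parseAll(input) {
        fmt.Printf("Match %d: ip=%s localtime=%s\n", i+1, groups["ip"], groups["localtime"])
    }
}
```

For very large log files, reading line by line (for instance with bufio.Scanner) and calling FindStringSubmatch on each line would avoid holding the whole file in memory.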