Grep: match and extract

Question

I have a file containing lines like these:

proto=tcp/http  sent=144        rcvd=52 spkt=3 
proto=tcp/https  sent=145        rcvd=52 spkt=3
proto=udp/dns  sent=144        rcvd=52 spkt=3

I need to extract the value of proto, i.e. tcp/http, tcp/https, udp/dns.

So far I have tried grep -o 'proto=[^/]*/', but it only extracts the value as proto=tcp/.

Answer 1

Kus*_*nda 13

With grep -o, you have to match exactly what you want to extract. Since you don't want to extract the proto= string, you should not match it.

An extended regular expression that matches either tcp or udp, followed by a slash and a non-empty alphanumeric string, is

(tcp|udp)/[[:alnum:]]+

Applying this to your data:

$ grep -E -o '(tcp|udp)/[[:alnum:]]+' file
tcp/http
tcp/https
udp/dns
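
To see why the pattern from the question stops short, here is a minimal sketch that rebuilds the sample data in a temporary file and runs both patterns side by side. The variable names (file, attempt, fixed) are made up for illustration. `[^/]*/` can only consume characters up to and including the first slash, so nothing after the slash is ever reported.

```shell
# Recreate the sample data from the question in a temporary file.
file=$(mktemp)
printf '%s\n' \
    'proto=tcp/http  sent=144        rcvd=52 spkt=3' \
    'proto=tcp/https  sent=145        rcvd=52 spkt=3' \
    'proto=udp/dns  sent=144        rcvd=52 spkt=3' > "$file"

# The attempt from the question: keeps "proto=" and stops at the first slash.
attempt=$(grep -o 'proto=[^/]*/' "$file")

# The pattern from this answer: matches only the value itself.
fixed=$(grep -E -o '(tcp|udp)/[[:alnum:]]+' "$file")

printf 'attempt:\n%s\n\nfixed:\n%s\n' "$attempt" "$fixed"
rm -f "$file"
```

Note that `[[:alnum:]]+` would also cut a value short at the first non-alphanumeric character, so a value with a hyphen or dot in it would need a wider bracket expression.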

To make sure that we only do this on lines that start with the string proto=:

grep '^proto=' file | grep -E -o '(tcp|udp)/[[:alnum:]]+'
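
A small sketch of what the extra filtering step buys you. The second input line is invented for this example: it contains tcp/smtp but does not start with proto=, so the unfiltered command reports it and the filtered pipeline does not.

```shell
file=$(mktemp)
printf '%s\n' \
    'proto=tcp/http  sent=144        rcvd=52 spkt=3' \
    'note=tcp/smtp  sent=1        rcvd=1 spkt=1' \
    'proto=udp/dns  sent=144        rcvd=52 spkt=3' > "$file"

# Without the filter, every line with a matching substring contributes output.
unfiltered=$(grep -E -o '(tcp|udp)/[[:alnum:]]+' "$file")

# With the filter, only lines that begin with "proto=" reach the second grep.
filtered=$(grep '^proto=' "$file" | grep -E -o '(tcp|udp)/[[:alnum:]]+')

printf 'unfiltered:\n%s\n\nfiltered:\n%s\n' "$unfiltered" "$filtered"
rm -f "$file"
```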

With sed, removing everything up to and including the first = and everything from the first blank character onward:

$ sed 's/^[^=]*=//; s/[[:blank:]].*//' file
tcp/http
tcp/https
udp/dns
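
The two substitutions can be looked at one at a time. This sketch feeds a single sample line through each step separately; the variable names (line, step1, step2) exist only for illustration.

```shell
line='proto=tcp/https  sent=145        rcvd=52 spkt=3'

# Step 1: delete everything up to and including the first "=".
step1=$(printf '%s\n' "$line" | sed 's/^[^=]*=//')

# Step 2: delete everything from the first blank (space or tab) onward.
step2=$(printf '%s\n' "$step1" | sed 's/[[:blank:]].*//')

printf 'after step 1: %s\nafter step 2: %s\n' "$step1" "$step2"
```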

To make sure that we only do this on lines that start with the string proto=, you could insert the same pre-processing step with grep as above, or you could use

sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' file

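Here, we suppress the default output with the -n option, and then we trigger the substitutions and an explicit print of the line only if the line matches ^proto=.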
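
As a rough illustration of the difference, the sketch below runs the plain and the guarded sed commands on a two-line input. The first line is invented for the example; it contains an = but does not start with proto=.

```shell
file=$(mktemp)
printf '%s\n' \
    '# comment=ignored by the address' \
    'proto=udp/dns  sent=144        rcvd=52 spkt=3' > "$file"

# Plain version: every line is edited and printed, wanted or not.
plain=$(sed 's/^[^=]*=//; s/[[:blank:]].*//' "$file")

# Guarded version: -n silences default printing, p prints only matching lines.
guarded=$(sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' "$file")

printf 'plain:\n%s\n\nguarded:\n%s\n' "$plain" "$guarded"
rm -f "$file"
```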

With awk, using the default field separator, then splitting the first field on = and printing the second piece:

$ awk '{ split($1, a, "="); print a[2] }' file
tcp/http
tcp/https
udp/dns
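
To make the two levels of splitting visible, this sketch prints what awk sees for one sample line: the first whitespace-separated field, the number of pieces split() returns, and the two pieces themselves. The output format is just for illustration.

```shell
line='proto=tcp/http  sent=144        rcvd=52 spkt=3'

result=$(printf '%s\n' "$line" | awk '{
    n = split($1, a, "=")    # split() returns the number of pieces
    printf "field=%s pieces=%d name=%s value=%s\n", $1, n, a[1], a[2]
}')

printf '%s\n' "$result"
```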

To make sure that we only do this on lines that start with the string proto=, you could insert the same pre-processing step with grep as above, or you could use

awk '/^proto=/ { split($1, a, "="); print a[2] }' file
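
A quick sketch of how the /^proto=/ pattern changes the result. The first input line is invented: its first field is sent=144, so the unguarded command prints 144 for it, while the guarded command skips the line. Note that both versions only look at the first field, so a proto= that appears later on a line is not picked up.

```shell
file=$(mktemp)
printf '%s\n' \
    'sent=144  proto=tcp/http  rcvd=52' \
    'proto=udp/dns  sent=144        rcvd=52 spkt=3' > "$file"

# No pattern: the action runs on every line, whatever the first field is.
unguarded=$(awk '{ split($1, a, "="); print a[2] }' "$file")

# With the pattern: the action runs only on lines starting with "proto=".
guarded=$(awk '/^proto=/ { split($1, a, "="); print a[2] }' "$file")

printf 'unguarded:\n%s\n\nguarded:\n%s\n' "$unguarded" "$guarded"
rm -f "$file"
```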

Answer 2

use*_*001 10

If you are on GNU grep (for the -P option), you could use:

$ grep -oP 'proto=\K[^ ]*' file
tcp/http
tcp/https
udp/dns

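Here, we match the proto= string to make sure that we are extracting the correct column, but we then drop it from the output with \K, which resets the start of the reported match.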
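
A minimal sketch of the effect of \K, comparing the same pattern with and without it on one sample line. It assumes a grep built with PCRE support and checks for that first; the variable names are illustrative.

```shell
line='proto=tcp/https  sent=145        rcvd=52 spkt=3'

# Only run the comparison if this grep understands -P.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # Without \K the whole match, including "proto=", is printed.
    without_k=$(printf '%s\n' "$line" | grep -oP 'proto=[^ ]*')
    # With \K everything matched before it is left out of the output.
    with_k=$(printf '%s\n' "$line" | grep -oP 'proto=\K[^ ]*')
    printf 'without: %s\nwith:    %s\n' "$without_k" "$with_k"
else
    echo 'this grep does not support -P' >&2
fi
```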

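The above assumes that the columns are space-separated. If tabs are also valid separators, you would use \S to match non-whitespace characters, so the command would be: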

grep -oP 'proto=\K\S*' file
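
To illustrate the difference, this sketch uses a tab-separated variant of the sample line (an assumption, since the data in the question uses spaces). With `[^ ]*` the match runs across the tabs to the end of the line, while `\S*` stops at the first tab. It again assumes a grep with -P support.

```shell
# A tab-separated version of one sample line (hypothetical input).
line=$(printf 'proto=tcp/http\tsent=144\trcvd=52')

if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # "[^ ]" excludes only the space character, so tabs are swallowed.
    space_only=$(printf '%s\n' "$line" | grep -oP 'proto=\K[^ ]*')
    # "\S" excludes all whitespace, so the match ends at the first tab.
    any_space=$(printf '%s\n' "$line" | grep -oP 'proto=\K\S*')
    printf '[^ ]*: %s\n\\S*:   %s\n' "$space_only" "$any_space"
fi
```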

If you also want to protect against matching fields where proto= is only a substring, such as thisisnotaproto=tcp/https, you can add a word boundary with \b like so:

grep -oP '\bproto=\K\S*' file
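
A short sketch of the word boundary at work, using the thisisnotaproto= example from above as the first input line. It assumes a grep with -P support. Keep in mind that \b only separates word characters from non-word characters, so a field such as x-proto= would still match.

```shell
file=$(mktemp)
printf '%s\n' \
    'thisisnotaproto=tcp/https  sent=1' \
    'proto=udp/dns  sent=2' > "$file"

if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # Without \b, "proto=" is found inside the longer field name too.
    loose=$(grep -oP 'proto=\K\S*' "$file")
    # With \b, "proto" must start at a word boundary.
    strict=$(grep -oP '\bproto=\K\S*' "$file")
    printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
fi
rm -f "$file"
```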

Answer 3

jes*_*e_b 6

Using awk:

awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input

$1 ~ "proto" will ensure we only act on lines with proto in the first column

sub(/proto=/, "") will remove proto= from the input

print $1 prints the remaining column

$ awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input
tcp/http
tcp/https
udp/dns
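
One thing to be aware of: $1 ~ "proto" is an unanchored regular expression match, so it also accepts a first field that merely contains proto. The sketch below compares the command above with an anchored variant. The anchored variant is a suggestion added here, not part of the original command, and the first input line is invented.

```shell
file=$(mktemp)
printf '%s\n' \
    'thisisnotaproto=tcp/https  sent=1' \
    'proto=udp/dns  sent=2' > "$file"

# The command from this answer: matches "proto" anywhere in the first field.
original=$(awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' "$file")

# Anchored variant: the first field has to begin with "proto=".
anchored=$(awk '$1 ~ /^proto=/ { sub(/^proto=/, "", $1); print $1 }' "$file")

printf 'original:\n%s\n\nanchored:\n%s\n' "$original" "$anchored"
rm -f "$file"
```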

Answer 4

Ed *_*ton 1

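Assuming this is related to your previous question, you are going down the wrong path. Rather than cobbling together a bunch of scripts that each do roughly what you want most of the time, and needing a completely different script every time you want to do something slightly different, just create one script that parses your input file into an array (f[] below) that maps your field names (tags) to their values. You can then do whatever you like with the result. For example, given this input file from your previous question: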

$ cat file
Feb             3       0:18:51 17.1.1.1                      id=firewall     sn=qasasdasd "time=""2018-02-03"     22:47:55        "UTC""" fw=111.111.111.111       pri=6    c=2644        m=88    "msg=""Connection"      "Opened"""      app=2   n=2437       src=12.1.1.11:49894:X0       dst=4.2.2.2:53:X1       dstMac=42:16:1b:af:8e:e1        proto=udp/dns   sent=83 "rule=""5"      "(LAN->WAN)"""

We can write an awk script that creates an array of the values indexed by their names/tags:

$ cat tst.awk
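{
    # Start every record with an empty array and no pending tag, so
    # values from an earlier line cannot leak into this one.
    delete f
    tag = ""

    # The first four whitespace-separated fields are an untagged header.
    f["hdDate"] = $1 " " $2
    f["hdTime"] = $3
    f["hdIp"]   = $4
    sub(/^([^[:space:]]+[[:space:]]+){4}/,"")

    # Find each tag= in turn; the text before it is the previous tag's value.
    while ( match($0,/[^[:space:]]+="?/) ) {
        if ( tag != "" ) {
            val = substr($0,1,RSTART-1)
            gsub(/^[[:space:]]+|("")?[[:space:]]*$/,"",val)
            f[tag] = val
        }

        # Strip the surrounding quote and the = to get the bare tag name.
        tag = substr($0,RSTART,RLENGTH-1)
        gsub(/^"|="?$/,"",tag)

        $0 = substr($0,RSTART+RLENGTH)
    }

    # Whatever is left belongs to the last tag on the line.
    val = $0
    gsub(/^[[:space:]]+|("")?[[:space:]]*$/,"",val)
    f[tag] = val
}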

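Given that, you can do whatever you like with your data just by referencing it by field name, e.g. using GNU awk for -e, which makes it easy to mix a script in a file with a command-line script: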

$ awk -f tst.awk -e '{for (tag in f) printf "f[%s]=%s\n", tag, f[tag]}' file
f[fw]=111.111.111.111
f[dst]=4.2.2.2:53:X1
f[sn]=qasasdasd
f[hdTime]=0:18:51
f[sent]=83
f[m]=88
f[hdDate]=Feb 3
f[n]=2437
f[app]=2
f[hdIp]=17.1.1.1
f[src]=12.1.1.11:49894:X0
f[c]=2644
f[dstMac]=42:16:1b:af:8e:e1
f[msg]="Connection"      "Opened"
f[rule]="5"      "(LAN->WAN)"
f[proto]=udp/dns
f[id]=firewall
f[time]="2018-02-03"     22:47:55        "UTC"
f[pri]=6

$ awk -f tst.awk -e '{print f["proto"]}' file
udp/dns

$ awk -f tst.awk -e 'f["proto"] ~ /udp/ {print f["sent"], f["src"]}' file
83 12.1.1.11:49894:X0
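
For the much simpler three-line file in this question, the same idea can be sketched without the quoting logic. This is a simplified illustration, not the script above: it assumes every field is a plain name=value pair with no embedded whitespace, and it uses split("", f) to empty the array for each line.

```shell
file=$(mktemp)
printf '%s\n' \
    'proto=tcp/http  sent=144        rcvd=52 spkt=3' \
    'proto=tcp/https  sent=145        rcvd=52 spkt=3' \
    'proto=udp/dns  sent=144        rcvd=52 spkt=3' > "$file"

out=$(awk '{
    split("", f)                   # empty the array for this record
    for (i = 1; i <= NF; i++) {
        eq = index($i, "=")        # position of the first "=" in the field
        if (eq > 0)
            f[substr($i, 1, eq - 1)] = substr($i, eq + 1)
    }
    # Fields can now be referenced by name.
    if (f["proto"] ~ /^tcp\//)
        print f["proto"], f["sent"]
}' "$file")

printf '%s\n' "$out"
rm -f "$file"
```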

This is awesome, thank you very much :) (2 upvotes)

Archived:	6 years, 5 months ago
Views:	2485
Last activity:	6 years, 5 months ago