Read a file line by line and print the matches from each line on one line

Question

Din*_*mar 6 linux bash shell grep text-processing

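I'm new to shell scripting, and it would be great if I could get some help with the problem below.

I want to read a text file line by line and print all the pattern matches found in each line on a single line of a new text file.

For example: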

$ cat input.txt

SYSTEM ERROR: EU-1C0A  Report error -- SYSTEM ERROR: TM-0401 DEFAULT Test error
SYSTEM ERROR: MG-7688 DEFAULT error -- SYSTEM ERROR: DN-0A00 Error while getting object -- ERROR: DN-0A52 DEFAULT Error -- ERROR: MG-3218 error occured in HSSL
SYSTEM ERROR: DN-0A00 Error while getting object -- ERROR: DN-0A52 DEFAULT Error
SYSTEM ERROR: EU-1C0A  error Failed to fill in test report -- ERROR: MG-7688

The expected output is:

$ cat output.txt

EU-1C0A TM-0401
MG-7688 DN-0A00 DN-0A52 MG-3218
DN-0A00 DN-0A52
EU-1C0A MG-7688

I tried the following code:

while read p; do
    grep -o '[A-Z]\{2\}-[A-Z0-9]\{4\}' | xargs
done < input.txt > output.txt

which produced this output:

EU-1C0A TM-0401 MG-7688 DN-0A00 DN-0A52 MG-3218 DN-0A00 DN-0A52 EU-1C0A MG-7688 .......

Then I also tried this:

while read p; do
    grep -o '[A-Z]\{2\}-[A-Z0-9]\{4\}' | xargs > output.txt
done < input.txt

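But that didn't help :(

Maybe there is another way; I'm open to awk/sed/cut or anything else... :)

Note: a single line can contain any number of error codes (i.e. XX-XXXX, the pattern of interest).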
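
A likely reason both attempts put everything on one line: `grep` is given no file argument, so it reads standard input, which inside the loop is the rest of `input.txt`, and a single `grep` call consumes every remaining line. In the second attempt, `> output.txt` inside the loop would also truncate the file on each iteration. A minimal sketch of a corrected loop, assuming a POSIX shell and a `grep` that supports `-o`; the function name `extract_codes` is hypothetical and the regex is the one from the question:

```shell
#!/bin/sh
# Hypothetical helper name; the regex is the one from the question.
extract_codes() {
    # IFS= and -r keep each line intact; the extra test handles a last line
    # that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Hand grep only the current line instead of letting it read stdin.
        matches=$(printf '%s\n' "$line" |
            LC_ALL=C grep -o '[A-Z]\{2\}-[A-Z0-9]\{4\}')
        # xargs (default command: echo) joins the matches with single spaces.
        if [ -n "$matches" ]; then
            printf '%s\n' "$matches" | xargs
        fi
    done
}

# Usage: redirect once, outside the loop, so output.txt is written in one go.
# extract_codes < input.txt > output.txt
```

Lines without any error code produce no output here; whether that is desirable depends on whether output lines must stay aligned with input lines.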

Answer 1

joe*_*epd 5

% awk 'BEGIN{RS=": "};NR>1{printf "%s%s", $1, ($0~/\n/)?"\n":" "}' input.txt 
EU-1C0A TM-0401
MG-7688 DN-0A00 DN-0A52 MG-3218
DN-0A00 DN-0A52
EU-1C0A MG-7688

Explanation in long form:

awk '
    BEGIN{ RS=": " } # Set the record separator to colon-space
    NR>1 {           # Ignore the first record
        printf("%s%s", # Print two strings:
            $1,      # 1. first field of the record (`$1`)
            ($0~/\n/) ? "\n" : " ")
                     # Ternary expression, read as `if condition (thing
                     # between brackets), then thing after `?`, otherwise
                     # thing after `:`.
                     # So: If the record ($0) matches (`~`) newline (`\n`),
                     # then put a newline. Otherwise, put a space.
    }
' input.txt

Previous answer, written for the question before it was edited:

% awk 'BEGIN{RS=": "};NR>1{printf "%s%s", $1, (NR%2==1)?"\n":" "}' input.txt 
EU-1C0A TM-0401
MG-7688 MG-3218
DN-0A00 DN-0A52
EU-1C0A MG-7688

Edit: guard against `: ` injection (thanks @e0k). Test that the first field after the record separator looks like what we expect.

awk 'BEGIN{RS=": "};NR>1 && $1 ~ /^[A-Z]{2}-[A-Z0-9]{4}$/ {printf "%s%s", $1, ($0~/\n/)?"\n":" "}' input.txt

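This solution relies on every error code in the example being preceded by `: `. If the string `: ` appears for any other reason, it will print something other than an error code (a false positive). It makes no attempt to match the error codes with a regular expression. (4 upvotes)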
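
A regex-driven variant in awk is sketched below. It does not depend on `: ` as a separator and instead pulls every match out of each line with `match()`. This is an illustration rather than the answerer's method: the function name `extract_codes_awk` is hypothetical, and the pattern is spelled out without `{n}` interval expressions on the assumption that some older awk implementations do not support them.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): print every error code found on
# each input line, space-separated, one output line per input line.
extract_codes_awk() {
    LC_ALL=C awk '{
        out = ""
        rest = $0
        # Repeatedly find the leftmost match, record it, and continue after it.
        while (match(rest, /[A-Z][A-Z]-[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]/)) {
            out = out (out == "" ? "" : " ") substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        if (out != "") print out   # skip lines without any code
    }' "$@"
}

# Usage: extract_codes_awk input.txt > output.txt
```

Because the test is on the code itself, a stray `: ` elsewhere in a line does not produce a false positive.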

Answer 2

Sta*_*224 4

There's always Perl! This grabs any number of matches per line.

perl -nle '@matches = /[A-Z]{2}-[A-Z0-9]{4}/g; print(join(" ", @matches)) if (scalar @matches);' input.txt > output.txt

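`-e` supplies the Perl code to run, `-n` runs it once per input line, and `-l` automatically chomps each line and appends a newline on print.

The regular expression matches against `$_` implicitly, so `@matches = $_ =~ //g` would be needlessly verbose.

If a line has no matches, nothing is printed for it.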

Archived:	9 years, 2 months ago
Views:	1528
Last activity:	7 years, 9 months ago