awk and multi-line matching (sub-regex)

Question

I'm trying to use awk to parse multi-line expressions. One of them looks like this:

_begin  hello world !
_attrib0    123
_attrib1    super duper
_attrib1    yet another value
_attrib2    foo
_end

I need to extract the values associated with _begin and _attrib1. So for this example, the awk script should return (one line per expression):

hello world ! super duper yet another value

The delimiter used is the tab (\t) character. Spaces only appear inside the strings.

Answer 1

The following awk script does the job:

#!/usr/bin/awk -f
BEGIN { FS="\t"; }
/^_begin/      { output=$2; }
$1=="_attrib1" { output=output " " $2; }
/^_end/        { print output; }

You didn't specify whether you want a tab (\t) as the output field separator. If you do, let me know and I'll update the answer. (Or you can; it's trivial.)
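If tab-separated output is what is wanted, one way to get it is to join the collected values with a tab instead of a space. The following is only a minimal sketch under that assumption: the function name join_attrib1 and the awk variable sep are made up for illustration, and the sample input is rebuilt with printf so that the separators are real tabs.

```shell
# Hypothetical helper: join the _begin value and every _attrib1 value
# with the separator given as the first argument.
join_attrib1() {
  awk -v sep="$1" '
    BEGIN          { FS = "\t" }                 # input fields are tab-separated
    /^_begin/      { output = $2 }               # start a new record
    $1=="_attrib1" { output = output sep $2 }    # append each _attrib1 value
    /^_end/        { print output }              # emit one line per record
  '
}

# Sample record from the question, written with real tabs.
sample() {
  printf '_begin\thello world !\n'
  printf '_attrib0\t123\n'
  printf '_attrib1\tsuper duper\n'
  printf '_attrib1\tyet another value\n'
  printf '_attrib2\tfoo\n'
  printf '_end\n'
}

# awk expands escape sequences in -v assignments, so a quoted \t becomes a tab.
sample | join_attrib1 '\t'
```

Passing ' ' instead of '\t' gives the same space-joined output as the original script, so the separator stays a one-argument choice.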

Of course, if you want a scary alternative (since we're getting close to Hallowe'en), here's a solution that uses sed:

$ sed -ne '/^_begin./{s///;h;};/^_attrib1[^0-9]/{s///;H;x;s/\n/ /;x;};/^_end/{;g;p;}' input.txt 
hello world ! super duper yet another value

How does this work? Mwaahahaa, I'm glad you asked.

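/^_begin./{s///;h;}; - When we see _begin, strip it off (the empty pattern in s/// reuses the address regex) and store the rest of the line in sed's "hold buffer".
/^_attrib1[^0-9]/{s///;H;x;s/\n/ /;x;}; - When we see _attrib1, strip it off, append the rest to the hold buffer, swap the hold buffer and the pattern space, replace the newline with a space, then swap the hold buffer and the pattern space back again.
/^_end/{;g;p;} - We've reached the end, so pull the hold buffer into the pattern space and print it.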
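The three steps above may be easier to follow when the one-liner is spread over several lines with comments. This is the same set of sed commands, just reformatted as a sketch; the wrapper function name strip_and_join is made up for illustration, and the sample input is generated with printf so that the separators are real tabs.

```shell
# Hypothetical wrapper around the same sed commands, one per line.
strip_and_join() {
  sed -n '
    # _begin line: delete the tag and the separator character after it
    # (the empty regex reuses the address), then overwrite the hold buffer.
    /^_begin./{
      s///
      h
    }
    # _attrib1 line: delete the tag, append the value to the hold buffer,
    # swap buffers, turn the joining newline into a space, swap back.
    /^_attrib1[^0-9]/{
      s///
      H
      x
      s/\n/ /
      x
    }
    # _end line: copy the hold buffer into the pattern space and print it.
    /^_end/{
      g
      p
    }
  '
}

printf '_begin\thello world !\n_attrib0\t123\n_attrib1\tsuper duper\n_attrib1\tyet another value\n_attrib2\tfoo\n_end\n' |
  strip_and_join
```

The [^0-9] after _attrib1 consumes the separator and also keeps a tag such as _attrib10 from matching.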

This assumes that your input field separator is just a single tab.

Pretty simple. Who ever said sed was arcane?!