GREP / SED or AWK: print the entire paragraph of a file on pattern match

Question

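I have a file with hundreds of paragraphs of around 15 lines each. I need to search for a pattern, say Occurrence: 1. If this pattern is found in a paragraph, I need to print the entire paragraph. Note that the paragraphs are separated by 2 newline characters.

I have tried the line of code below, and it obviously prints only the first occurrence in the file. I am somehow unable to use a loop and print all such occurrences.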

sed -n '1,/Occurrence: 1/p' ystdef.txt | tail -9 > ystalarm.txt

Can I use the g (global) flag with sed to make this work? If yes, how?

Note that I am aware of the grep -A/B/C commands, but they won't work on my Cygwin terminal.

Answer 1

Gil*_*il' 10

You can use awk's "paragraph mode", in which input records are separated by sequences of at least two newlines.

awk -v RS= '/Occurrence: 1/' ystdef.txt

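Note that the paragraphs will be printed run together, with only a single newline between their contents. Awk doesn't let you make the output separator match the input separator (apart from some GNU awk extensions), but you can easily normalize the paragraph separator to two newlines.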

awk -v RS= -v ORS='\n\n' '/Occurrence: 1/' ystdef.txt

If you don't want the extra newline at the end:

awk -v RS= '/Occurrence: 1/ {if (not_first) print ""; print; not_first=1}' ystdef.txt
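
The difference between the three variants is easiest to see on a small input. The following is only a sketch: the sample paragraphs are made up, and the variable names (collapsed, separated, lines) are hypothetical.

```shell
# Hypothetical sample data: three short paragraphs separated by blank lines.
tmp=$(mktemp)
printf 'Alarm A\nOccurrence: 1\n\nAlarm B\nOccurrence: 2\n\nAlarm C\nOccurrence: 1\n' > "$tmp"

# Default ORS: matching paragraphs are printed back to back.
collapsed=$(awk -v RS= '/Occurrence: 1/' "$tmp")

# ORS set to two newlines: a blank line follows every matching paragraph.
separated=$(awk -v RS= -v ORS='\n\n' '/Occurrence: 1/' "$tmp")

# Blank line only between paragraphs; count the output lines to show that
# nothing trails the last paragraph.
lines=$(awk -v RS= '/Occurrence: 1/ {if (not_first) print ""; print; not_first=1}' "$tmp" | awk 'END {print NR}')

rm -f "$tmp"
printf '%s\n---\n%s\n---\n%s\n' "$collapsed" "$separated" "$lines"
```

Keep in mind that the pattern is an unanchored regular expression, so on real data it would also match a line such as Occurrence: 10.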

Answer 2

mik*_*erv 8

Here it is in GNU sed:

sed '/./{H;$!d};x;/SEARCH/!d'

Portable/POSIX syntax:

sed -e '/./{H;$!d;}' -e 'x;/SEARCH/!d'

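If a line contains one or more characters, it is appended to the hold space (H), and if it is not (!) the last line ($) it is deleted (d). This means that every non-blank line is stored and removed from the output.

So if a line is not deleted, sed exchanges (x) the contents of the hold and pattern spaces. This leaves only a blank line in the hold space, while the pattern space holds all the lines since the last blank line.

sed then tests the pattern space against /SEARCH/. If the pattern is not (!) found, the pattern space is deleted (d) without being printed; otherwise the paragraph is printed by default.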
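
The steps above can be spelled out as a commented sed script. This is a sketch only: it is the portable one-liner from this answer laid out one command per line, and the sample text and the variable names sample and result are made up for illustration.

```shell
# Hypothetical sample input: three paragraphs, two of which contain SEARCH.
sample=$(printf 'one\nSEARCH here\n\ntwo\nnothing\n\nthree\nSEARCH again\n')

result=$(printf '%s\n' "$sample" | sed '
# Non-blank line: append it to the hold space, then (unless it is the
# last line) delete it and start the next cycle.
/./{
H
$!d
}
# Reached only on a blank line or on the last line: swap the collected
# paragraph into the pattern space.
x
# Delete the paragraph unless it matches; survivors are auto-printed.
/SEARCH/!d
')

printf '%s\n' "$result"
```

Each printed paragraph begins with an empty line, because H puts a newline in front of whatever it appends to the hold space.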

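Here it is as a shell function, with your question as input:

Note: the processed data is commented out below for readability under this site's code highlighting. It works as-is or without the hashes.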

_pgraph() { 
    sed '/./{H;$!d};x;/'"$1"'/!d'
} <<\DATA
#    I have a file with hundreds of paragraphs of
#    around 15 lines each. I need to search for a
#    pattern, say Occurance: 1. If this pattern is
#    found in the para, I need to print the entire
#    paragraph. Note that the paragraps are seperared
#    by 2 new line characters.

#    I have tried the below line of code and this
#    obviously prints the first occurence in the
#    file. I am somehow unable to use a loop and
#    print all such occurances.

#    sed -n '1,/Occurance: 1/p' ystdef.txt | tail -9 >
#    ystalarm.txt Can I use the g (global) flag with
#    sed to make this work? If yes, how?

#    Note that I am aware of the grep -A/B/C commands
#    but they wont work on my cygwin terminal.
DATA

Now I can do this:

_pgraph Note

###OUTPUT

#    I have a file with hundreds of paragraphs of
#    around 15 lines each. I need to search for a
#    pattern, say Occurance: 1. If this pattern is
#    found in the para, I need to print the entire
#    paragraph. Note that the paragraps are seperared
#    by 2 new line characters.

#    Note that I am aware of the grep -A/B/C commands
#    but they wont work on my cygwin terminal.

Or, more specifically:

_pgraph 'Note that I'

#    Note that I am aware of the grep -A/B/C commands
#    but they wont work on my cygwin terminal.

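You can do the same with any file, without embedding the input text in the function itself, by removing everything from <<\DATA through DATA in the function definition and running it like this: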

_pgraph 'PATTERN' </path/to/input.file
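
Because the function splices "$1" directly into the sed script, a pattern containing / would end the regex early. One possible workaround, offered as a sketch and not part of the original answer, is to hand the pattern to awk as a variable. The function name pgraph_awk and the sample data are hypothetical.

```shell
# Hypothetical helper: paragraph search with the pattern passed as an awk
# variable, so a "/" in the pattern needs no escaping.
pgraph_awk() {
    awk -v RS= -v ORS='\n\n' -v pat="$1" '$0 ~ pat'
}

# Made-up input file with two paragraphs.
tmp=$(mktemp)
printf 'path: /usr/bin\nfirst\n\npath: /opt\nsecond\n' > "$tmp"

found=$(pgraph_awk '/usr/bin' < "$tmp")
rm -f "$tmp"
printf '%s\n' "$found"
```

Note that awk interprets backslash escapes in -v values, so a pattern such as \. may need to be written as \\. when passed this way.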

Answer 3

cho*_*oba 4

You can use "paragraph mode" in Perl:

perl -ne 'BEGIN{ $/ = "" } print if /pattern/' input

+1 Or, slightly easier on the eyes, `perl -n00e 'print if /pattern/' input` (4 upvotes)
