Extracting text between two strings on different lines

Question

I have a large email file that contains arbitrary host names like the following:

......
HOSTS: test-host,host2.domain.com,
host3.domain.com,another-testing-host,host.domain.
com,host.anotherdomain.net,host2.anotherdomain.net,
another-local-host, TEST-HOST

DATE: August 11 2015 9:00
.......

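The hosts are always comma-separated, but the list may be wrapped across one, two, or more lines, sometimes in the middle of a host name (unfortunately I have no control over this; it is what the email client does).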

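So I need to extract all the text between the string "HOSTS:" and the string "DATE:", unwrap it (join the wrapped lines), and replace the commas with newlines, like this: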

test-host
host2.domain.com
host3.domain.com
another-testing-host
host.domain.com
host.anotherdomain.net
host2.anotherdomain.net
another-local-host
TEST-HOST

So far I have come up with this, but I lose everything that is on the same line as "HOSTS:":

sed '/HOST/,/DATE/!d;//d' ${file} | tr -d '\n' | sed -E "s/,\s*/\n/g"

Answer 1

and*_*lrc 7

Something like this might work for you:

sed -n '/HOSTS:/{:a;N;/DATE/!ba;s/[[:space:]]//g;s/,/\n/g;s/.*HOSTS:\|DATE.*//g;p}' "$file"
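
A quick way to check the command is to run it against a copy of the sample from the question. The sketch below assumes GNU sed (both `\|` in the pattern and `\n` in the replacement are GNU extensions); the temporary file and the `hosts` variable are only there for illustration.

```shell
# Hypothetical sample file reproducing the layout shown in the question.
file=$(mktemp)
cat > "$file" <<'EOF'
......
HOSTS: test-host,host2.domain.com,
host3.domain.com,another-testing-host,host.domain.
com,host.anotherdomain.net,host2.anotherdomain.net,
another-local-host, TEST-HOST

DATE: August 11 2015 9:00
.......
EOF

# GNU sed assumed: \| (alternation) and \n (newline in replacement) are extensions.
hosts=$(sed -n '/HOSTS:/{:a;N;/DATE/!ba;s/[[:space:]]//g;s/,/\n/g;s/.*HOSTS:\|DATE.*//g;p}' "$file")

# Print one host per line; the name split across two lines is rejoined.
printf '%s\n' "$hosts"

rm -f "$file"
```

Note that the loop stops at the first line containing `DATE`, so a host name that itself contains that string would end the block early; matching `DATE:` instead may be a safer choice.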

Breakdown:

-n                       # Suppress automatic printing of the pattern space
/HOSTS:/ {               # On a line containing the literal HOSTS:
  :a;                    # Label used for branching (goto)
  N;                     # Append the next line to the pattern space
  /DATE/!ba;             # Until the literal DATE is matched, go back to :a
  s/[[:space:]]//g;      # Delete all whitespace, including the embedded newlines
  s/,/\n/g;              # Replace each comma with a newline
  s/.*HOSTS:\|DATE.*//g; # Remove everything up to and including the literal HOSTS:,
                         # and everything from the literal DATE onward
  p                      # Print the pattern space
}
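
As for why the attempt in the question loses the first line: in sed an empty regular expression (`//`) reuses the last regular expression that was applied, so `//d` deletes both boundary lines of the range, and the "HOSTS:" line also carries the first host names. A minimal sketch of a pipeline that strips only the markers instead is shown below. It is an alternative to the one-liner above, not a replacement for it; the function name `extract_hosts` and the sample data are hypothetical, and it assumes a single HOSTS block per file.

```shell
# Hypothetical helper: prints one host per line from the file given as $1.
extract_hosts() {
  sed -n '/HOSTS:/,/DATE:/p' "$1" |    # keep the block, including both marker lines
    tr -d '[:space:]' |                # join wrapped lines and drop stray spaces
    sed 's/^HOSTS://; s/DATE:.*$//' |  # strip the markers rather than their whole lines
    tr ',' '\n'                        # split on commas, one host per line
}

# Usage with a small made-up sample in which a host name is split across lines.
sample=$(mktemp)
printf 'HOSTS: a,b.\nexample,c\n\nDATE: x\n' > "$sample"
extract_hosts "$sample"
rm -f "$sample"
```

Using `tr` for the final split avoids relying on `\n` in a sed replacement, which is not supported by every sed implementation.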
