Text processing an aptly output file

Arr*_*cal 5 sed awk text-processing

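I have a text file made from the output of the repository management tool aptly. It lists my published repositories, and I need to extract information from it.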

The file is formatted as follows:

Published repositories:
 * test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
 * test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
 * test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
...

The last line of the output ends with a newline.

The "Published repositories:" line is not required.

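For each line that starts with "*", I need to strip the extraneous information and leave only the snapshot names. There is no way to do this within aptly itself. The desired output for the first of these lines is: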

test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]

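The square brackets are not required either, so a solution that keeps them or removes them is fine. I would prefer a sed or awk solution, but anything that works would be highly appreciated.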

ter*_*don 3

A Perl approach:

$ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*\[(.+?)\]/g); print "$n @k"' file 
test_repo_one/xenial xenial-main_20190311 xenial-multiverse_20190311 xenial-restricted_20190311 xenial-universe_20190311
test_repo_one/xenial-security xenial-security-main_20190311 xenial-security-multiverse_20190311 xenial-security-restricted_20190311 xenial-security-universe_20190311
test_repo_two/trusty trusty-main_20190312 trusty-multiverse_20190312 trusty-restricted_20190312 trusty-universe_20190312

Explanation

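  • perl -lne: read the input file line by line (-n), strip the trailing newline (-l), and run the script given by -e on each line. The -l also adds a newline to each call to print.
  • next unless /^\s*\*\s*(\S+)/;: find the name of the repository. This is the first stretch of non-whitespace characters (\S+) on a line that starts (^) with 0 or more whitespace characters (\s*), then a * (\*), then again 0 or more whitespace characters. The longest run of non-whitespace after that is what we want. If the line does not match this regular expression, next moves us on to the next line.
  • $n=$1: save whatever the match above captured (the part in parentheses, (\S+), available as $1) as $n.
  • @k=(/\{.+?:\s*\[(.+?)\]/g): find every case where there is a {, then any other characters, then a :, followed by optional whitespace and a [, and capture whatever lies between the [ and the ]. All the captured strings are saved in the array @k.
  • print "$n @k": finally, print the repository name, $n, followed by the array, @k.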

If you want to include the square brackets, you can use:

$ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*(\[.+?\])/g); print "$n @k"' file 
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]