Remove duplicate words between brackets inline

Chr*_*ris 2 shell text-processing regular-expression

Our input looks like this:

2012-04-17  [GBPGBP]
2012-04-13  [GBP GBP]
2012-04-13  [GBP]
2012-04-11  [GBPGBP]
2012-04-11  [GBP GBP]
2012-04-10  [GBPGBP]
2012-04-06  [GBP GBP GBP]
2012-04-17  [GBPGBP]
2012-04-13  [GBP CDN]
2012-04-13  [GBP]
2012-04-11  [GBPCDN]
2012-04-11  [GBP DL DL]
2012-04-10  [PSGBP]
2012-04-06  [PS PS]

We would like output like this:

2012-04-17  [GBP]
2012-04-13  [GBP]
2012-04-13  [GBP]
2012-04-11  [GBP]
2012-04-11  [GBP]
2012-04-10  [GBP]
2012-04-06  [GBP]
2012-04-17  [GBP]
2012-04-13  [GBP CDN]
2012-04-13  [GBP]
2012-04-11  [GBPCDN]
2012-04-11  [GBP DL]
2012-04-10  [PSGBP]
2012-04-06  [PS]

Basically, remove any repeated string inside the brackets. Any suggestions?

Gil*_*il' 5

sed -e ': a' -e 's/\(\[[^][]*\)\([A-Z][A-Z][A-Z]*\)\([^][]*\)\2/\1\2\3/' -e 't a'
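  • : a defines a label named a at the start of the script.
  • s/\(wibble\)\(foo\)\(bar\)\2/\1\2\3/ replaces wibblefoobarfoo with wibblefoobar: \2 is a back-reference that must match the same text as the second group, and the replacement drops that second copy.
  • [A-Z][A-Z][A-Z]* matches two or more uppercase letters.
  • t a branches back to the label a if the preceding s command made a substitution, so the substitution is retried until nothing is left to remove.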
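
To see the label / substitute / branch loop in isolation, here is a minimal sketch on a simpler case than the command above. It only collapses adjacent, space-separated repeats of a whole uppercase word, so it does not handle run-together input such as [GBPGBP]. The function name collapse_adjacent is hypothetical, and the sketch assumes uppercase words appear only inside the brackets, as in the sample data.

```shell
# Hypothetical helper: collapse adjacent repeats of a whole uppercase word.
# Simplified variant for illustration, not the answer's full command.
collapse_adjacent() {
    # :a      define the label "a"
    # s/.../  group 1 is the delimiter before the word ("[" or a space),
    #         group 2 is the word, \2 requires the same word again after one
    #         space, group 3 is the delimiter after it ("]" or a space);
    #         the replacement keeps a single copy of the word
    # ta      jump back to "a" if the s command changed the line
    sed -e ':a' \
        -e 's/\([[ ]\)\([A-Z][A-Z]*\) \2\([] ]\)/\1\2\3/' \
        -e 'ta'
}

printf '%s\n' \
    '2012-04-06  [GBP GBP GBP]' \
    '2012-04-11  [GBP DL DL]' \
    '2012-04-13  [GBP CDN]' | collapse_adjacent
# 2012-04-06  [GBP]
# 2012-04-11  [GBP DL]
# 2012-04-13  [GBP CDN]
```

Each pass removes one repeated word, which is why [GBP GBP GBP] needs two trips around the loop before the s command stops matching and t falls through.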