Unix: remove duplicate blocks of lines, keeping the first

1 unix bash shell awk duplicates

This is an extract of the jobs in my TWS database. Each block begins with:

/^ES2BVE1011 # EM5341CAI000 (jobname)

and ends with:

/^ RECOVERY (can be STOP or CONTINUE)

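I have duplicate blocks and want to keep only the first occurrence of each, to minimize load time. A block counts as a duplicate only if every line in it is identical, because two blocks can share the same job name yet differ in their other lines: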

ES2BVE1011 # EM5341CAI000  
 SCRIPTNAME "/s2ipgm/scripts/current/em5341cai000.sh -scai -eexp"  
 STREAMLOGON us2icai  
 DESCRIPTION "balance sheet errors"  
 UNIX TASKTYPE  
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"  
 RECOVERY STOP  
ES2BVE1011 # ED5237CAI001  
 SCRIPTNAME "/s2ipgm/scripts/current/ed5237com001.sh -scai -eexp"  
 STREAMLOGON us2icai  
 DESCRIPTION "bb / ir account list"  
 UNIX TASKTYPE  
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"  
 RECOVERY STOP  
ES2BVE1011 # CA4305CAI000  
 SCRIPTNAME "/s2ipgm/scripts/current/ea4305com000.sh -scai -ecpt"  
 STREAMLOGON us2icai  
 DESCRIPTION "list op. Fid."  
 UNIX TASKTYPE  
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"  
 RECOVERY STOP  
ES2BVE1011 # CM4622CAI000  
 SCRIPTNAME "/s2ipgm/scripts/current/em4622com000.sh -scai -ecpt"  
 STREAMLOGON us2icai  
 DESCRIPTION "list of debits covered / not c"  
 UNIX TASKTYPE  
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"  
 RECOVERY STOP  
ES2BVE1011 # ED5237CAI001  
 SCRIPTNAME "/s2ipgm/scripts/current/ed5237com001.sh -scai -eexp"  
 STREAMLOGON us2icai  
 DESCRIPTION "bb / ir account list"  
 UNIX TASKTYPE  
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"  
 RECOVERY STOP  
ES2BVE1011 # CJ5326CAI000  
 SCRIPTNAME "/s2ipgm/scripts/current/ej5326cai000.sh -scai -ecpt"  
 STREAMLOGON us2icai  
 DESCRIPTION "daily report"  
 UNIX TASKTYPE  
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"  
 RECOVERY STOP  
ES2BVE1011 # CA4305CAI000  
 SCRIPTNAME "/s2ipgm/scripts/current/ea4305com000.sh -scai -ecpt"  
 STREAMLOGON us2icai  
 DESCRIPTION "list op. Fid."  
 UNIX TASKTYPE  
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"  
 RECOVERY STOP  
ES2BVE1011 # ED5237CAI001  
 SCRIPTNAME "/usr/bin/true"  
 STREAMLOGON us2ipgm  
 DESCRIPTION "bb / ir account list"  
 UNIX TASKTYPE  
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"  
 RECOVERY STOP

Ed *_*ton 6

$ cat tst.awk
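# Append every input line to the block being collected.
{ block = block $0 ORS }
# A RECOVERY line closes the block.
/^ RECOVERY/ {
    # Print the block only the first time this exact text is seen.
    if ( !seen[block]++ ) {
        printf "%s", block
    }
    block = ""    # start collecting the next block
}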


$ awk -f tst.awk file
ES2BVE1011 # EM5341CAI000
 SCRIPTNAME "/s2ipgm/scripts/current/em5341cai000.sh -scai -eexp"
 STREAMLOGON us2icai
 DESCRIPTION "balance sheet errors"
 UNIX TASKTYPE
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"
 RECOVERY STOP
ES2BVE1011 # ED5237CAI001
 SCRIPTNAME "/s2ipgm/scripts/current/ed5237com001.sh -scai -eexp"
 STREAMLOGON us2icai
 DESCRIPTION "bb / ir account list"
 UNIX TASKTYPE
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"
 RECOVERY STOP
ES2BVE1011 # CA4305CAI000
 SCRIPTNAME "/s2ipgm/scripts/current/ea4305com000.sh -scai -ecpt"
 STREAMLOGON us2icai
 DESCRIPTION "list op. Fid."
 UNIX TASKTYPE
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"
 RECOVERY STOP
ES2BVE1011 # CM4622CAI000
 SCRIPTNAME "/s2ipgm/scripts/current/em4622com000.sh -scai -ecpt"
 STREAMLOGON us2icai
 DESCRIPTION "list of debits covered / not c"
 UNIX TASKTYPE
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"
 RECOVERY STOP
ES2BVE1011 # CJ5326CAI000
 SCRIPTNAME "/s2ipgm/scripts/current/ej5326cai000.sh -scai -ecpt"
 STREAMLOGON us2icai
 DESCRIPTION "daily report"
 UNIX TASKTYPE
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"
 RECOVERY STOP
ES2BVE1011 # ED5237CAI001
 SCRIPTNAME "/usr/bin/true"
 STREAMLOGON us2ipgm
 DESCRIPTION "bb / ir account list"
 UNIX TASKTYPE
 SUCCOUTPUTCOND CONDSUCC "(RC = 0)"
 RECOVERY STOP
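The sample in the question shows trailing spaces on most lines but none on the last one. If the real file is similarly inconsistent, two otherwise identical blocks would compare as different. The following is a minimal sketch of a variant that strips trailing spaces, tabs, and carriage returns before comparing. It assumes such whitespace is not significant in the TWS definitions, which the question does not confirm; the function name `dedupe_blocks` is hypothetical.

```shell
# dedupe_blocks (hypothetical name): print each block once, first occurrence wins.
# Assumption: trailing spaces, tabs, and CRs carry no meaning and may be dropped.
dedupe_blocks() {
    awk '
        # Normalize the line, then append it to the block being collected.
        { sub(/[ \t\r]+$/, ""); block = block $0 ORS }
        # A RECOVERY line closes the block.
        /^ RECOVERY/ {
            if ( !seen[block]++ ) {
                printf "%s", block
            }
            block = ""
        }
    ' "$@"
}
```

It can be run as `dedupe_blocks file > file.dedup`. Note that the output has the trailing whitespace removed as well, so it is not byte-identical to the input even for blocks that are kept.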

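  • @thanasisp Yes, that was the fix, thanks. I've updated my answer now. This is one of the reasons I know I should only answer questions that include the expected output: the tool's output can then be compared against it to see whether it works, instead of having to stare at it and guess whether it is correct. Lesson re-learned! (2 upvotes)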