AWK: print the lines between two patterns, but only for the last occurrence of the match

Ama*_*esh 6 awk text-processing

I want to filter a log file and print the lines between two matches, but only for the last occurrence of the match.

Sample file contents:

2023-03-08 11:12:44,306 - Code Deploy - INFO - Received signal
2023-03-08 11:12:44,306 - Code Deploy - INFO - Received message signal
2023-03-08 11:12:44,306 - Code Deploy - INFO - Branch is Testing
2023-03-08 11:12:44,307 - Code Deploy - INFO - Deployment started
2023-03-08 11:13:31,782 - Code Deploy - INFO - Old version2_0_5_12
2023-03-08 11:13:31,783 - Code Deploy - INFO - New version2_0_5_13
2023-03-08 11:13:32,553 - Code Deploy - INFO - Permission fixed
2023-03-08 11:13:32,554 - Code Deploy - INFO - Deployment finished
2023-03-08 11:13:34,900 - Code Deploy - ERROR - !!!!!!!!!! EXCEPTION !!!!!!!!!(535, b'5.7.8     Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials z16-20020a170903019000b0019a97a4324dsm9818181plg.5 - gsmtp')Traceback (most recent call last):
File "/root/code-dployment/server/deploy.py", line 94, in send_email
server.login(gmail_user, gmail_password)
File "/usr/lib/python3.5/smtplib.py", line 729, in login
raise last_exception
File "/usr/lib/python3.5/smtplib.py", line 720, in login
initial_response_ok=initial_response_ok)
File "/usr/lib/python3.5/smtplib.py", line 641, in auth
raise SMTPAuthenticationError(code, resp)
smtplib.SMTPAuthenticationError: (535, b'5.7.8 Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials z16-20020a170903019000b0019a97a4324dsm9818181plg.5 - gsmtp')

2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished
2023-03-09 11:54:00,797 - Code Deploy - ERROR - !!!!!!!!!! EXCEPTION !!!!!!!!!(535, b'5.7.8 Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials k17-20020aa790d1000000b005907716bf8bsm11097506pfk.60 - gsmtp')Traceback (most recent call last):
File "/root/code-dployment/server/deploy.py", line 94, in send_email
server.login(gmail_user, gmail_password)
File "/usr/lib/python3.5/smtplib.py", line 729, in login
raise last_exception
File "/usr/lib/python3.5/smtplib.py", line 720, in login
initial_response_ok=initial_response_ok)
File "/usr/lib/python3.5/smtplib.py", line 641, in auth
raise SMTPAuthenticationError(code, resp)
smtplib.SMTPAuthenticationError: (535, b'5.7.8 Username and Password not accepted. Learn more at\n5.7.8  https://support.google.com/mail/?p=BadCredentials k17-20020aa790d1000000b005907716bf8bsm11097506pfk.60 - gsmtp')

I need to get the content between two patterns.

Pattern1 = 'Received signal'

Pattern2 = 'Deployment finished'

Expected result:

2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

I want an AWK command that I can use in a bash script. I found a solution that filters the content between two patterns with the following command:

# awk '/Received signal/,/Deployment finished/' /tmp/result.log

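It prints the matching block for every occurrence, but I need to filter it so that only the last occurrence of the matching block is printed.

The output of the above command is: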

2023-03-08 11:12:44,306 - Code Deploy - INFO - Received signal
2023-03-08 11:12:44,306 - Code Deploy - INFO - Received message signal
2023-03-08 11:12:44,306 - Code Deploy - INFO - Branch is Testing
2023-03-08 11:12:44,307 - Code Deploy - INFO - Deployment started
2023-03-08 11:13:31,782 - Code Deploy - INFO - Old version2_0_5_12
2023-03-08 11:13:31,783 - Code Deploy - INFO - New version2_0_5_13
2023-03-08 11:13:32,553 - Code Deploy - INFO - Permission fixed
2023-03-08 11:13:32,554 - Code Deploy - INFO - Deployment finished
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

Ed *_*ton 5

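This works with any awk and is very similar to one of the scripts in @terdon's answer, but IMO it is a bit more idiomatically awkish because it uses awk's condition { action } main body structure: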

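$ awk '
    /Received signal/ { f=1; rec="" }       # start line: begin a new block, dropping any earlier one
    f { rec = rec $0 ORS }                  # inside a block: append the current line
    /Deployment finished/ { f=0 }           # end line: stop collecting
    END { if (f=="0") printf "%s", rec }    # print only if an end line cleared the flag (not unset, not 1)
' file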
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

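The subtle functional differences between this and @terdon's answer are:

  1. In case you decide to set ORS to a value different from RS (for example, to convert RS='\r\n' to ORS='\n'), this script produces the desired record terminators, whereas @terdon's reproduces the RS value throughout most of the output, with ORS only at the very end.
  2. If the input file contains no Received signal line, @terdon's prints a blank line, whereas this script produces no output.
  3. This script prints only text that lies between both delimiters, whereas @terdon's prints whatever follows a Received signal line even if no Deployment finished line comes after it.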
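Differences 2 and 3 can be checked with a small sketch. The function names and the two tiny inputs below are made up for illustration; the awk bodies are the flag-based script from this answer and the first script from @terdon's answer.

```shell
# Flag-based script from this answer, wrapped in a function (hypothetical name).
last_complete_block() {
    awk '
        /Received signal/     { f=1; rec="" }
        f                     { rec = rec $0 ORS }
        /Deployment finished/ { f=0 }
        END                   { if (f=="0") printf "%s", rec }
    ' "$@"
}

# First script from the other answer, wrapped the same way (hypothetical name).
last_block_terdon() {
    awk '{
        if (/Received signal/) { k=1; v=$0 }
        else if (k==1) {
            v = v RS $0
            if (/Deployment finished/) { k=0 }
        }
    }
    END { print v }' "$@"
}

# Hypothetical input with no start line at all.
no_start() { printf '%s\n' 'a - Branch is Testing' 'a - Permission fixed'; }

# Hypothetical input whose only block is never closed.
unclosed() { printf '%s\n' 'b - Received signal' 'b - Deployment started'; }

# Count the output lines of each script for each input.
for input in no_start unclosed; do
    for script in last_complete_block last_block_terdon; do
        printf '%s | %s: ' "$input" "$script"
        "$input" | "$script" | awk 'END { print NR " line(s)" }'
    done
done
```

The flag-based script should report 0 lines for both inputs, while the other one should report 1 line (the blank line) for the first input and 2 lines for the second.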

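Regarding awk '/Received signal/,/Deployment finished/' /tmp/result.log in your question: don't use range expressions, use a flag instead; see is-a-start-end-range-expression-ever-useful-in-awk. As you can see in every answer posted so far that uses a range expression, it requires the same condition to be tested twice.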
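As a rough sketch of that advice, here is the range expression next to a flag-based equivalent. The function names and the sample input are assumptions for illustration only. With a flag, dropping the end line from the output is just a matter of reordering the rules, whereas the range form would have to test the end pattern again inside its action.

```shell
# Hypothetical sample input: two complete blocks surrounded by other lines.
sample_log() {
    printf '%s\n' \
        'noise' \
        't1 - Received signal' \
        't1 - Deployment started' \
        't1 - Deployment finished' \
        't1 - ERROR' \
        't2 - Received signal' \
        't2 - Deployment finished' \
        'tail'
}

# Range expression as used in the question: prints every block, delimiters included.
range_blocks() {
    awk '/Received signal/,/Deployment finished/' "$@"
}

# Flag-based equivalent: same output for this input, one test per pattern.
flag_blocks() {
    awk '
        /Received signal/     { f=1 }
        f                     { print }
        /Deployment finished/ { f=0 }
    ' "$@"
}

# Same rules reordered so the end line clears the flag before printing,
# which leaves the "Deployment finished" lines out of the output.
flag_blocks_without_end() {
    awk '
        /Received signal/     { f=1 }
        /Deployment finished/ { f=0 }
        f                     { print }
    ' "$@"
}

sample_log | flag_blocks
sample_log | flag_blocks_without_end
```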


ter*_*don 3

A simple trick is to save each match in a variable, overwriting the previous one, and then print that variable at the end of the script:

$ awk '{ 
         if(/Received signal/){k=1; v=$0} 
         else if(k==1){
           v=v RS $0; 
           if(/Deployment finished/){ k=0 }
         }
       } 
       END{ print v }' result.log
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished

Or you can use tac to reverse the file and then print the first match:

$ tac result.log | 
   awk '/Deployment finished/,/Received signal/{
      print; 
      if(/Received signal/){ exit }
   }' | tac
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Received message signal
2023-03-09 11:52:57,194 - Code Deploy - INFO - Branch is Testing
2023-03-09 11:52:57,195 - Code Deploy - INFO - Deployment started
2023-03-09 11:53:58,246 - Code Deploy - INFO - Old version2_0_5_13
2023-03-09 11:53:58,246 - Code Deploy - INFO - New version2_0_5_14
2023-03-09 11:53:58,498 - Code Deploy - INFO - Permission fixed
2023-03-09 11:53:58,498 - Code Deploy - INFO - Deployment finished
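The tac approach can also be written with a flag instead of a range expression. This is only a sketch: the function name and the sample input are made up, and it assumes tac is available (it ships with GNU coreutils and may be missing on other systems).

```shell
# Reverse the input, print from the first "Deployment finished" line up to the
# next "Received signal" line, stop reading, then restore the original order.
last_block_tac() {
    tac "$@" | awk '
        /Deployment finished/  { f=1 }
        f                      { print }
        f && /Received signal/ { exit }
    ' | tac
}

# Hypothetical sample input: two complete blocks, each followed by an error line.
sample_log() {
    printf '%s\n' \
        't1 - Received signal' \
        't1 - Deployment finished' \
        't1 - ERROR' \
        't2 - Received signal' \
        't2 - Deployment started' \
        't2 - Deployment finished' \
        't2 - ERROR'
}

sample_log | last_block_tac
```

Because lines are skipped until an end line has been seen in the reversed input, an unclosed block at the end of the log is ignored and the last complete block is printed instead.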