Print file content only after the last occurrence of a token?

Question

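I have a long-running program that can restart its internal state. I only want to see the log file entries for the latest state (loaded into vim's quickfix). How can I show all lines after the last occurrence of the string STARTING SESSION?

My current solution (the log files are sometimes gigabytes long, so I never look beyond the last 5000 lines):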

tail -n5000 logfile.log | grep -B5000 -v -e 'STARTING SESSION'> shortened.log

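This works well when a session produces a lot of log output, but if I have shorter logs and several restarts, the result contains multiple sessions.

Essentially, I want something like a --reverse flag that makes grep search from the end of the file instead of the beginning: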

grep --reverse --after-context=5000 --max-count=1 'STARTING SESSION' logfile.log

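Notes:

This question is similar to “Print lines after the nth occurrence of a match”, but I want the last occurrence.

This question is almost the same as “Getting text from last marker to EOF in POSIX.2”, except that I have no POSIX requirement and my files are large. I would prefer an efficient solution using GNU utils (I am using mingw64).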

Answer 1

Ste*_*itt 16

Reverse the file, show it up to the first occurrence, then reverse the output again:

tac logfile.log | sed '/STARTING SESSION/q' | tac

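tac is efficient when given a regular (seekable) file to process, and since sed exits as soon as it sees the starting line, the whole pipeline only processes as much of the end of the log file as necessary (rounded up to tac's, sed's, and the kernel's buffer sizes). This should scale well to large files.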

tac is a GNU utility. On non-GNU systems, you can often use tail -r to do the same thing.
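To make the behaviour concrete, here is a minimal sketch that runs the pipeline on a tiny sample log. The file contents and the variable names (log, last_session) are made up for illustration; only the tac | sed | tac pipeline comes from the answer, and GNU tac is assumed to be available.

```shell
# Build a small sample log with two sessions (hypothetical content).
log=$(mktemp)
printf '%s\n' \
    'STARTING SESSION' \
    'old entry' \
    'STARTING SESSION' \
    'new entry 1' \
    'new entry 2' > "$log"

# Reverse the file, stop at the first marker seen (the last one in the
# original order), then reverse the result back.
last_session=$(tac "$log" | sed '/STARTING SESSION/q' | tac)
printf '%s\n' "$last_session"

rm -f "$log"
```

Only the marker line of the second session and the two entries after it should be printed; the first session is dropped.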

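If the log file has no “STARTING SESSION” line at all, this will not behave the same way as your grep: it will output the complete log file. To avoid this, a variant of Kusalananda's approach can be used instead: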

tac logfile.log | sed -n '/STARTING SESSION/{H;x;p;q;};H' | tail -n +2 | tac

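The sed expression looks for “STARTING SESSION”; on a match it appends the current line to the hold space, swaps the hold space with the pattern space, prints it, and quits. Any other line is appended to the hold space. tail -n +2 is used to skip the first, empty line (appending the pattern space to the hold space adds a leading newline).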
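The difference between the two pipelines only shows up when the marker is missing. The following sketch compares them on a hypothetical log without any marker, then checks the hold-space variant on a log that has one. The sample data and variable names are assumptions made for the example.

```shell
# Sample log without any marker line (hypothetical content).
no_marker=$(mktemp)
printf '%s\n' 'entry 1' 'entry 2' > "$no_marker"

# Simple variant: falls through and prints the whole file.
simple_out=$(tac "$no_marker" | sed '/STARTING SESSION/q' | tac)

# Hold-space variant: prints nothing unless the marker is seen.
strict_out=$(tac "$no_marker" |
    sed -n '/STARTING SESSION/{H;x;p;q;};H' | tail -n +2 | tac)

# Sample log with two sessions, to check the normal case as well.
with_marker=$(mktemp)
printf '%s\n' 'STARTING SESSION' 'a' 'STARTING SESSION' 'b' 'c' > "$with_marker"
strict_marked=$(tac "$with_marker" |
    sed -n '/STARTING SESSION/{H;x;p;q;};H' | tail -n +2 | tac)

printf 'simple: [%s]\nstrict: [%s]\nmarked: [%s]\n' \
    "$simple_out" "$strict_out" "$strict_marked"

rm -f "$no_marker" "$with_marker"
```

Note that the hold-space variant buffers every line it reads until it finds the marker, so on a marker-free file it holds the whole log in memory before printing nothing.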

Answer 2

Kus*_*nda 5

Using sed without tac:

sed \
    -e '/STARTING SESSION/h' \
    -e '//,$ { //!H; }' \
    -e '$!d' \
    -e x logfile.log

Or, using ; between the expressions on a single line,

sed '/STARTING SESSION/h; //,$ { //!H; }; $!d; x' logfile.log

Annotated variant:

# If this line matches our trigger, save buffer in hold-space (overwrites).
/STARTING SESSION/ h

# In the range from the trigger to the end, append buffer to hold-space,
# but only if the current line isn't the trigger.
# (// re-uses the most recent expression)
//,$ { //!H; }

# If we're not at the end, restart with the next line without outputting anything.
$! d

# At the end, swap the hold-space into the buffer.
x

# (buffer is implicitly printed)

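Summary: this sed script saves all lines between the trigger and the end of the document in sed's hold space. Whenever the trigger is found, the hold space is cleared. At the end, the hold space is output.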

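If the trigger is not found, nothing from the log is output (sed prints just a single empty line, because the hold space is still empty when it is swapped in at the end).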

Also note that this will have to read through the whole file.

A similar approach with awk:
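As a quick check of the one-line sed script, the sketch below runs it on a small sample log with two sessions. The sample contents and the variable names are invented for the example.

```shell
# Sample log with two sessions (hypothetical content).
log=$(mktemp)
printf '%s\n' \
    'STARTING SESSION' \
    'old entry' \
    'STARTING SESSION' \
    'new entry 1' \
    'new entry 2' > "$log"

# h restarts the hold space at every marker, H appends the lines that
# follow it, and x prints the collected lines once the last line is read.
sed_out=$(sed '/STARTING SESSION/h; //,$ { //!H; }; $!d; x' "$log")
printf '%s\n' "$sed_out"

rm -f "$log"
```

The output should contain only the second session, starting with its marker line.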

awk '
    # At each trigger, drop what was collected so far and restart at index 1.
    /STARTING SESSION/ { delete hold; i = 1 }

    # Once a trigger has been seen (i is non-zero), collect every line.
    i { hold[i++] = $0 }

    # At the end, output the collected lines in order.
    END { for (j = 1; j < i; ++j) print hold[j] }' logfile.log

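Here, we start collecting data in the hold array as soon as we find the trigger (when i is first set to one). We delete the collected data and reset i to 1 every time the trigger is hit.

At the end, all collected lines are output.

The delete hold statement is not strictly needed.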
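Because this approach reads all of its input, one way to bound the work on a very large log is to feed it only the tail of the file, as the question already does with its 5000-line limit. The sketch below is an assumption-laden illustration, not part of the answer: the sample data, the max_lines variable, and the combination with tail are all introduced here.

```shell
# Sample log with two sessions (hypothetical content).
log=$(mktemp)
printf '%s\n' \
    'STARTING SESSION' \
    'old entry' \
    'STARTING SESSION' \
    'new entry 1' \
    'new entry 2' > "$log"

# Upper bound on how far back to look, taken from the question.
max_lines=5000

# Only the last max_lines lines reach awk, which keeps the lines from
# the last marker onwards and prints them at the end.
awk_out=$(tail -n "$max_lines" "$log" | awk '
    /STARTING SESSION/ { delete hold; i = 1 }
    i { hold[i++] = $0 }
    END { for (j = 1; j < i; ++j) print hold[j] }')
printf '%s\n' "$awk_out"

rm -f "$log"
```

The trade-off is the same as in the question: if the last marker lies further back than max_lines, nothing is printed.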

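It may be worth pointing out that both of these approaches involve reading the complete log file, which could be a problem given that “the log files are sometimes gigabytes long”. (2 upvotes)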
