Bash alternative for trimming newline \n and extra whitespace

Question

I am trying to parse a sentence that spans multiple lines:

You have to go tomorrow by
                                        car.

As you can see, there is a newline followed by spaces, and then "car."

I used this regular expression:

You.have.to.go.tomorrow.by.\n.+

It works fine on regex101, but when I use it in bash it only matches the first line:

parser='You.have.to.go.tomorrow.by.\n.+'

Result:

You have to go tomorrow by

I am using bash, and I want the complete sentence:

"You have to go tomorrow by car."

I am using:

sed -e 's/<[^>]\+>/ /g' | grep -oP $parser

to strip all HTML tags and then grep for the pattern.

Answer 1

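With grep's -z (--null-data) option, input records are delimited by NUL characters instead of newlines, which makes it possible for a pattern to match a newline.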

grep -Pzo \
'You have to go tomorrow by\n\s+car.' text | tr -s '\n ' ' '
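
To make the effect of -z easier to reproduce, here is a minimal sketch that assumes GNU grep built with PCRE support (-P). The sample input and the variable names input, without_z, and with_z are made up for illustration. The extra tr -d '\0' is an addition of this sketch: it drops the NUL byte that grep prints after each match in -z mode, so that the result can be stored in a shell variable cleanly.

```bash
#!/usr/bin/env bash
# Hypothetical sample input modelled on the question; the indentation width is arbitrary.
input=$'You have to go tomorrow by\n                    car.'

# Without -z, grep tests one line at a time, so \n has nothing to match against.
without_z=$(printf '%s\n' "$input" | grep -oP 'You have to go tomorrow by\n\s+car\.' || true)

# With -z the whole input is a single record, so \n can match.
# tr -d '\0' removes the NUL terminator that -z output carries;
# tr -s '\n ' ' ' turns newlines into spaces and squeezes repeated spaces.
with_z=$(printf '%s\n' "$input" \
  | grep -Pzo 'You have to go tomorrow by\n\s+car\.' \
  | tr -d '\0' \
  | tr -s '\n ' ' ')

printf 'without -z: [%s]\n' "$without_z"
printf 'with -z:    [%s]\n' "$with_z"
```

The dot after "car" is escaped here so that it matches only a literal period.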

If you want to do this in pure bash, you will probably need ANSI-C quoting ($'...') so that the pattern contains an actual newline character.
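
As a rough sketch of what the pure-bash route could look like (not necessarily what was originally intended here): the pattern is written with $'...' so that \n becomes a real newline, and it is matched with bash's [[ ... =~ ... ]] operator. The variable names text, pattern, and sentence and the sample input are illustrative. Because =~ takes POSIX extended regular expressions, the sketch uses [[:space:]] rather than \s, and it relies on the extglob shell option to collapse whitespace.

```bash
#!/usr/bin/env bash
# extglob must be enabled before the line that uses +( ... ) is parsed.
shopt -s extglob

# Hypothetical sample text; $'...' turns \n into a real newline.
text=$'Intro line\nYou have to go tomorrow by\n                    car.\nTrailing line'

# The pattern also contains a real newline thanks to ANSI-C quoting.
# [.] matches a literal period.
pattern=$'You have to go tomorrow by\n[[:space:]]+car[.]'

sentence=''
if [[ $text =~ $pattern ]]; then
    # BASH_REMATCH[0] holds the whole match, newline and indentation included.
    sentence=${BASH_REMATCH[0]}
    # Replace every run of whitespace (spaces, tabs, newlines) with one space.
    sentence=${sentence//+([[:space:]])/ }
fi

printf '%s\n' "$sentence"
```

The pattern variable is left unquoted on the right-hand side of =~ on purpose: quoting it would make bash treat it as a literal string rather than a regular expression.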

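Assuming you only want to clean up the lines you mentioned, you can combine the substitutions. For lines matching 'You have to go tomorrow by', we can group several commands inside {...}, separated by semicolons, so that they run only on that match.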

sed -rn '/You have to go tomorrow by/{N; s/\n//; s/ {2,}/ /; s/<[^>]+>//g;p}' text
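
To show what the grouped commands do, here is a small worked example that assumes GNU sed (for -r and for the brace syntax used above). The HTML-wrapped sample input and the variable names input and cleaned are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical input: the sentence wrapped in a tag and broken across two lines.
input=$'<p>You have to go tomorrow by\n                    car.</p>'

# On the matching line:
#   N              appends the next input line to the pattern space
#   s/\n//         deletes the newline that now joins the two lines
#   s/ {2,}/ /     collapses the first run of two or more spaces into one
#   s/<[^>]+>//g   strips the HTML tags
#   p              prints the result (-n suppresses all other output)
cleaned=$(printf '%s\n' "$input" \
  | sed -rn '/You have to go tomorrow by/{N; s/\n//; s/ {2,}/ /; s/<[^>]+>//g;p}')

printf '%s\n' "$cleaned"
```

Because the tags are removed inside the same sed script, the separate tag-stripping sed call from the question may not be needed for these lines.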