Using grep to match the pattern \\\"

4 grep quoting regular-expression escape-characters

I have a JSON string inside JSON. It has been encoded multiple times, and I ended up with many escaped backslashes: \\\".

A much shortened string looks like this:

'[{"testId" : "12345", "message": "\\\"the status is pass\\\" comment \\\\\"this is some weird encoding\\\\\""}]'

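I am trying to grep and get the number of occurrences of the pattern \\\" but not \\\\\"

How can I do that?

Any shell/Python solution is fine. In Python, using the search string

search_string = r"""\\\\\"""

throws an unexpected EOF error.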

Sté*_*las 12

To look for \\\" anywhere on a line:

grep -F '\\\"'

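That is, -F for a fixed-string search instead of a regular expression match (in which backslash is special). And use strong quotes ('...'), inside which backslash is not special.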
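
Since the question also accepts Python, here is a minimal sketch of the same fixed-string idea using plain substring search. The names BS, Q3, Q5, line and fixed_string_hit are made up for this illustration; the sample line is the shortened string from the question without the outer shell quotes. The strings are built from single characters so that Python's own escaping does not get in the way.

```python
BS = "\\"              # one literal backslash
Q3 = BS * 3 + '"'      # the target sequence: \\\"
Q5 = BS * 5 + '"'      # the longer sequence: \\\\\"

# The shortened sample line from the question (illustrative reconstruction).
line = ('[{"testId" : "12345", "message": "' + Q3 + "the status is pass" + Q3
        + " comment " + Q5 + "this is some weird encoding" + Q5 + '"}]')

def fixed_string_hit(text, needle=Q3):
    """Like grep -F: report whether needle occurs literally anywhere in text."""
    return needle in text

print(fixed_string_hit(line))   # True
print(line.count(Q3))           # 4
```

The count is 4 rather than 2 because a plain fixed-string search also finds \\\" inside each \\\\\", which is why excluding the longer sequence needs a regular expression.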

Without -F, you need to double the backslashes:

grep '\\\\\\"'

Or use:

grep '\\\{3\}"'
grep -E '\\{3}"'
grep -E '[\]{3}"'
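
The doubling rule can be checked outside grep as well. This is only a sketch with Python's re module, which is neither BRE nor ERE, so it illustrates the escaping principle rather than grep's exact behaviour. One known difference: POSIX bracket expressions take [\] literally, while Python needs [\\]. All names here (BS, TARGET, DOUBLED, INTERVAL, BRACKET, matches_exactly) are hypothetical.

```python
import re

BS = "\\"                  # one literal backslash
TARGET = BS * 3 + '"'      # the text to find: three backslashes and a quote

# Pattern texts, built from single characters to sidestep Python's own escaping.
DOUBLED = BS * 6 + '"'                  # \\\\\\"   every literal backslash doubled
INTERVAL = BS * 2 + "{3}" + '"'         # \\{3}"    one escaped backslash, three times
BRACKET = "[" + BS * 2 + "]{3}" + '"'   # [\\]{3}"  backslash escaped inside [...]

def matches_exactly(pattern, text=TARGET):
    """True if the whole text is matched by the regular expression."""
    return re.fullmatch(pattern, text) is not None

for pattern in (DOUBLED, INTERVAL, BRACKET):
    print(pattern, matches_exactly(pattern))
```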

Within double quotes, you need another level of backslashes, and to escape the " with a backslash:

#              1
#     1234567890123
grep "\\\\\\\\\\\\\""

Backslash is another shell quoting operator. So you can also quote those backslash and " characters with backslash:

\g\r\e\p \\\\\\\\\\\\\"

I've even quoted the characters of grep above, though that's not necessary, as none of g, r, e, p are special to the shell (except in the Bourne shell if they appear in $IFS). The only character I've not quoted is the space character, as we do need its special meaning in the shell: separating arguments.
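
To see that the three quoting styles hand grep the same argument, one can run them through a shell-like tokenizer. The sketch below uses Python's shlex.split, which only approximates POSIX shell quoting; for these three cases I'd expect it to agree with a real shell, but treat that as an assumption. The variable and function names are made up for the example.

```python
import shlex

BS = "\\"   # one literal backslash

# The three command lines from above, built from single characters.
single_quoted = "grep '" + BS * 6 + '"' + "'"      # grep '\\\\\\"'
double_quoted = 'grep "' + BS * 13 + '"' + '"'     # grep "\\\\\\\\\\\\\""
backslash_only = (BS + "g" + BS + "r" + BS + "e" + BS + "p "
                  + BS * 13 + '"')                 # \g\r\e\p \\\\\\\\\\\\\"

def shell_words(command):
    """Split a command line roughly the way a POSIX shell would."""
    return shlex.split(command)

for command in (single_quoted, double_quoted, backslash_only):
    print(shell_words(command))
```

In each case the second word comes out as six backslashes followed by a double quote, which is the regular expression that matches \\\".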

To look for \\\" provided it's not preceded by another backslash:

grep -e '^\\\\\\"' -e '[^\]\\\\\\"'

That is, look for \\\" at the beginning of the line, or following a character other than backslash.

This time we have to use a regular expression; a fixed-string search won't do.

grep returns the lines that match any of those expressions. You can also write it with one expression per line:

grep '^\\\\\\"
[^\]\\\\\\"'

Or with only one expression:

grep '^\(.*[^\]\)\{0,1\}\\\{3\}"' # BRE
grep -E '^(.*[^\])?\\{3}"'        # ERE equivalent
grep -E '(^|[^\])\\{3}"'

With GNU grep built with PCRE support, you can use a negative look-behind assertion:

grep -P '(?<!\\)\\{3}"'
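
Python's re module accepts the same look-behind syntax, so the pattern carries over unchanged inside a single-quoted raw string. This is a sketch only; PATTERN, find_unprefixed and the reconstructed sample line are names chosen for the example. It may also explain the unexpected EOF from the question: as far as I can tell, even in a raw string a backslash keeps the following quote from ending the literal, so r"""\\\\\""" is left with only two closing quotes.

```python
import re

BS = "\\"
Q3 = BS * 3 + '"'      # \\\"
Q5 = BS * 5 + '"'      # \\\\\"

# The shortened sample line from the question (illustrative reconstruction).
line = ('[{"testId" : "12345", "message": "' + Q3 + "the status is pass" + Q3
        + " comment " + Q5 + "this is some weird encoding" + Q5 + '"}]')

# Three backslashes and a quote, not preceded by another backslash.
PATTERN = re.compile(r'(?<!\\)\\{3}"')

def find_unprefixed(text):
    """Return the start offsets of every match of PATTERN in text."""
    return [m.start() for m in PATTERN.finditer(text)]

print(len(find_unprefixed(line)))   # 2
print(find_unprefixed(Q5))          # []
```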

Get a match count

To get a count of the lines that match the pattern (that is, that have one or more occurrences of \\\"), you'd add the -c option to grep. If however you want the number of occurrences, you can use the GNU-specific -o option (though now also supported by a few other implementations) to print all the matches one per line, and then pipe to wc -l to get a line count:

grep -Po '(?<!\\)\\{3}"' | wc -l

Or standardly/POSIXly, use awk instead:

awk '{n+=gsub(/(^|[^\\])\\{3}"/,"")};END{print 0+n}'

(awk's gsub() substitutes and returns the number of substitutions).
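
For a Python route to the same two numbers, here is a rough sketch that reports both the count of matching lines (what grep -c gives) and the count of occurrences (what grep -o piped to wc -l gives). It assumes the look-behind pattern shown earlier; count_lines_and_occurrences and the sample data are invented for the illustration.

```python
import re

# Three backslashes and a quote, not preceded by another backslash.
PATTERN = re.compile(r'(?<!\\)\\{3}"')

def count_lines_and_occurrences(lines):
    """Return (number of lines with a match, total number of matches)."""
    matching_lines = 0
    occurrences = 0
    for line in lines:
        hits = len(PATTERN.findall(line))
        occurrences += hits
        if hits:
            matching_lines += 1
    return matching_lines, occurrences

BS = "\\"
Q3 = BS * 3 + '"'      # \\\"
Q5 = BS * 5 + '"'      # \\\\\"
sample = [
    '"message": "' + Q3 + "pass" + Q3 + " comment " + Q5 + "weird" + Q5 + '"',
    "no escapes here",
    Q3 + " and " + Q3,
]
print(count_lines_and_occurrences(sample))   # (2, 4)
```

Any iterable of lines works as input, so an open file or sys.stdin can be passed in place of the sample list.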