Cha*_*row 4 regex bash character-class
In cygwin, this does not return a match:
$ echo "aaab" | grep '^[ab]+$'
But this does return a match:
$ echo "aaab" | grep '^[ab][ab]*$'
aaab
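Aren't these two expressions the same? Is there a way to express "one or more characters from the character class" without typing the character class twice (as in the second example)?
According to this link, the two expressions should be the same, but perhaps Regular-Expressions.info doesn't cover bash in cygwin.
grep has several matching "modes". By default it uses basic regular expressions, which do not treat a number of metacharacters as special unless they are backslash-escaped. You can put grep into extended or Perl mode so that + is evaluated as a quantifier.
From man grep: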
Matcher Selection
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression. This is highly experimental and grep -P may warn of unimplemented features.
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.
GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax
error in the regular expression. POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.
Alternatively, you can use egrep instead of grep -E.
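As a rough sketch of how the modes compare, the script below runs the same "one or more of [ab]" test in each syntax. It assumes a GNU-style grep such as the one cygwin ships; the variable names are only illustrative. Note that \+ in a basic regular expression is a GNU extension, while the interval form \{1,\} is the POSIX way to say the same thing.

```shell
#!/bin/sh
# Sketch only: compares ways to write "one or more characters from [ab]".
# Variable names are illustrative, not from the original answer.
input="aaab"

# BRE with the GNU extension \+ (may not work on non-GNU grep)
bre_gnu=$(printf '%s\n' "$input" | grep '^[ab]\+$')

# BRE with a POSIX interval expression
bre_posix=$(printf '%s\n' "$input" | grep '^[ab]\{1,\}$')

# ERE, where an unescaped + is a quantifier
ere=$(printf '%s\n' "$input" | grep -E '^[ab]+$')

# In plain BRE an unescaped + is a literal plus sign,
# so the original pattern matches a line such as "b+"
literal=$(printf '%s\n' 'b+' | grep '^[ab]+$')

echo "$bre_gnu $bre_posix $ere $literal"
```

This also shows why the first command in the question printed nothing: in basic mode the pattern was looking for one character from [ab] followed by a literal +.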
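In basic regular expressions the meta-characters
?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
So use the backslashed version: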
$ echo aaab | grep '^[ab]\+$'
aaab
Or activate the extended syntax:
$ echo aaab | egrep '^[ab]+$'
aaab
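In a script, the extended form is usually spelled grep -E, since newer GNU grep releases treat egrep as obsolescent. A minimal sketch of using the pattern as a yes/no check via the exit status follows; the helper name only_ab is hypothetical.

```shell
#!/bin/sh
# Sketch only: only_ab is a hypothetical helper, not part of grep.
# It succeeds when its argument consists solely of one or more a/b characters.
only_ab() {
    # -E enables extended syntax so + is a quantifier; -q suppresses output
    printf '%s\n' "$1" | grep -Eq '^[ab]+$'
}

if only_ab "aaab"; then r1="match"; else r1="no match"; fi
if only_ab "aaxb"; then r2="match"; else r2="no match"; fi
if only_ab "";     then r3="match"; else r3="no match"; fi

echo "$r1 / $r2 / $r3"
```

The empty string is rejected because + requires at least one character, which is the difference from using * alone.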