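I'm trying to filter thousands of files, looking for those that contain string constants with mixed case. Such strings may be padded with whitespace, but may not contain whitespace themselves. So the following (all of which contain UC chars) are matches: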
" AString " // leading and trailing spaces together allowed
"AString " // trailing spaces allowed
" AString" // leading spaces allowed
"newString03" // numeric chars allowed
"!stringBIG?" // non-alphanumeric chars allowed
"R" // Single UC is a match
But these are not:
"A String" // not a match because it contains an embedded space
"Foo bar baz" // does not match due to multiple whitespace interruptions
"a_string" // not a match because there are no UC chars
I still want to match lines that contain both kinds of string:
"ABigString", "a sentence fragment" // need to catch so I find the first case...
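I want to use Perl regexps, preferably driven by the ack command-line tool. Obviously, \w and \W are not going to work. It seems that \S should match the non-space characters. I can't figure out how to embed the requirement of "at least one uppercase character per string"...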
ack --match '\"\s*\S+\s*\"'
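is the closest I've gotten. I need to replace the \S+ with something that captures the requirement "at least one uppercase (ASCII) character, anywhere in the non-whitespace string".
This is simple to program in C/C++ (and yes, in Perl, procedurally, without resorting to regexes); I'm just trying to figure out whether there is a regex that can do the same job.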
The following pattern passes all the tests:
qr/
" # leading double quote
(?! # filter out strings with internal spaces
[^"]* # zero or more non-quotes
[^"\s] # neither a quote nor whitespace
\s+ # internal whitespace
[^"\s] # another non-quote, non-whitespace character
)
[^"]* # zero or more non-quote characters
[A-Z] # at least one uppercase letter
[^"]* # followed by zero or more non-quotes
" # and finally the trailing quote
/x
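To see what the negative lookahead contributes, here is a minimal sketch that runs the pattern with and without it against two strings from the question. The variable names are illustrative assumptions; only the two regexes are derived from the pattern above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names for this sketch; the regexes come from the pattern above.
# Without the lookahead: any quoted run containing an uppercase letter.
my $without_guard = qr/"[^"]*[A-Z][^"]*"/;
# With the lookahead: additionally reject non-space, spaces, non-space
# appearing before the next quote (that is, internal whitespace).
my $with_guard    = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;

for my $str (q<"A String">, q<" AString ">) {
    printf "%-12s without lookahead: %s, with lookahead: %s\n",
        $str,
        ($str =~ $without_guard ? "match" : "no match"),
        ($str =~ $with_guard    ? "match" : "no match");
}
```

Both strings match without the lookahead; with it, only the padded " AString " still matches, because "A String" has whitespace between two non-space characters.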
Feeding this test program, which uses the above pattern without /x, whitespace, or comments, to ack-grep (as ack is called on Ubuntu)
#! /usr/bin/perl
my @tests = (
[ q<" AString "> => 1 ],
[ q<"AString "> => 1 ],
[ q<" AString"> => 1 ],
[ q<"newString03"> => 1 ],
[ q<"!stringBIG?"> => 1 ],
[ q<"R"> => 1 ],
[ q<"A String"> => 0 ],
[ q<"a_string"> => 0 ],
[ q<"ABigString", "a sentence fragment"> => 1 ],
[ q<" a String "> => 0 ],
[ q<"Foo bar baz"> => 0 ],
);
my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;
for (@tests) {
my($str,$expectMatch) = @$_;
my $matched = $str =~ /$pattern/;
print +($matched xor $expectMatch) ? "FAIL" : "PASS",
": $str\n";
}
produces the following output:
$ ack-grep '"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"' try
[ q<" AString "> => 1 ],
[ q<"AString "> => 1 ],
[ q<" AString"> => 1 ],
[ q<"newString03"> => 1 ],
[ q<"!stringBIG?"> => 1 ],
[ q<"R"> => 1 ],
[ q<"ABigString", "a sentence fragment"> => 1 ],
my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;
print +($matched xor $expectMatch) ? "FAIL" : "PASS",
With the C shell and its derivatives, you have to escape the bang:
% ack-grep '"(?\![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"' ...
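I wish I could preserve the highlighted matches, but that doesn't seem to be allowed.
Note that escaped double quotes (\") will severely confuse this pattern.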
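As a minimal illustration of that caveat, consider a literal whose content is a "B" c. The sample line is made up for this sketch and does not come from the question. The literal contains internal spaces and so should be rejected, yet the pattern treats the escaped quotes as string delimiters:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = qr/"(?![^"]*[^"\s]\s+[^"\s])[^"]*[A-Z][^"]*"/;

# Hypothetical source line: one string literal with escaped quotes inside.
my $line = q<"a \"B\" c">;

# The pattern knows nothing about backslash escapes, so it can start
# matching at the quote character of the first \" sequence.
if ($line =~ /($pattern)/) {
    print "false positive: matched $1\n";
} else {
    print "no match\n";
}
```

Here the match is reported on "B\", the fragment between the two escaped quotes, so the line is flagged even though the whole literal has embedded spaces.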