ojb*_*ass 41 regex maintenance readability
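I am beginning to feel that using regular expressions decreases code maintainability. There is something evil about the terseness and power of regular expressions. Perl compounds this with side effects such as default operators.
I do have a habit of documenting regular expressions with at least one sentence giving the basic intent and at least one example of what would match.
Because regular expressions are built up piece by piece, I feel it is an absolute necessity to comment on the largest components of each element in the expression. Despite this, even my own regular expressions have me scratching my head as though I were reading Klingon.
Do you intentionally dumb down your regular expressions? Do you decompose possibly shorter and more powerful ones into simpler steps? I have given up on nesting regular expressions. Are there regular expression constructs that you avoid because of maintainability issues?
Do not let this example cloud the question.
If the following regex by Michael Ash had some sort of bug in it, would you have any prospect of doing anything other than throwing it away entirely?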
^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
As requested, the exact purpose can be found by following the link to Mr. Ash's page above.
Matches: 01.1.02 | 11-30-2001 | 2/29/2000
Non-matches: 02/29/01 | 13/01/2002 | 11/00/02
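The habit of documenting a regex with examples of what should and should not match can be made executable. The following is only a sketch in Python (not the language of the original post): the pattern is copied unchanged from the question, the sample strings are the ones listed above, and the names `ASH_DATE`, `SHOULD_MATCH`, `SHOULD_NOT_MATCH` and `failing_examples` are made up for illustration.

```python
import re

# Michael Ash's pattern from the question, copied unchanged and split
# at its three top-level alternatives only to keep the lines shorter.
ASH_DATE = re.compile(
    r"^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$"
    r"|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$"
    r"|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$"
)

# The documented examples, kept next to the pattern they describe.
SHOULD_MATCH = ["01.1.02", "11-30-2001", "2/29/2000"]
SHOULD_NOT_MATCH = ["02/29/01", "13/01/2002", "11/00/02"]


def failing_examples():
    """Return every documented example the pattern gets wrong."""
    failures = [s for s in SHOULD_MATCH if not ASH_DATE.match(s)]
    failures += [s for s in SHOULD_NOT_MATCH if ASH_DATE.match(s)]
    return failures
```

Running `failing_examples()` after any change to the pattern gives a quick signal that the documented intent still holds, which is cheaper than re-reading the expression.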
Mit*_*eat 32
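Use Expresso, which gives a hierarchical, English-language breakdown of a regex.
Or
This tip comes from Darren Neimke:
.NET allows regular expression patterns to be written with embedded comments, via the RegexOptions.IgnorePatternWhitespace option and the (?#...) syntax embedded within each line of the pattern string.
This lets you embed pseudo-code-like comments in each line, with the following effect on readability: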
Dim re As New Regex ( _
"(?<= (?# Start a positive lookBEHIND assertion ) " & _
"(\#|@) (?# Find a # or a @ symbol ) " & _
") (?# End the lookBEHIND assertion ) " & _
"(?= (?# Start a positive lookAHEAD assertion ) " & _
" \w+ (?# Find at least one word character ) " & _
") (?# End the lookAHEAD assertion ) " & _
"\w+\b (?# Match multiple word characters leading up to a word boundary)", _
RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace _
)
Here is another .NET example (it requires the RegexOptions.Multiline and RegexOptions.IgnorePatternWhitespace options):
static string validEmail = @"\b # Find a word boundary
(?<Username> # Begin group: Username
[a-zA-Z0-9._%+-]+ # Characters allowed in username, 1 or more
) # End group: Username
@ # The e-mail '@' character
(?<Domainname> # Begin group: Domain name
[a-zA-Z0-9.-]+ # Domain name(s), we include a dot so that
# mail.somewhere is also possible
\.[a-zA-Z]{2,4} # The top level domain is 2 to 4 letters long
# So .info works, .telephone doesn't.
) # End group: Domain name
\b # Ending on a word boundary
";
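If your regex applies to a common problem, another option is to document it and submit it to RegExLib, where it will be rated and commented on. Nothing beats many pairs of eyes...
Another regex tool is The Regulator.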
Jam*_*mes 19
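I usually just try to wrap every regular expression call in its own function, with a meaningful name and some basic comments. I like to think of regular expressions as a write-only language, readable only by the person who wrote them (unless they are really simple). I fully expect that anyone who has to change the intent of an expression will probably need to rewrite it completely, and that is probably for the better, since it keeps their regular expression skills in practice.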
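As a minimal sketch of that approach in Python (the function name, the pattern and the hashtag use case are all invented for illustration and do not come from the answer), the regex stays private to one small function whose name and docstring carry the intent:

```python
import re

# Private to this module: callers never see the pattern itself.
_HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtag names in text, without the leading '#'.

    Example: "ship #regex and #perl today" gives ["regex", "perl"].
    """
    return _HASHTAG.findall(text)
```

If the pattern ever has to be rewritten from scratch, the callers and the docstring example stay the same, so the rewrite can be checked against the stated behaviour.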
cha*_*aos 17
Well, the PCRE /x modifier's entire purpose in life is to let you write regexes more readably, as in this trivial example:
my $expr = qr/
[a-z] # match a lower-case letter
\d{3,5} # followed by 3-5 digits
/x;
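The same idea is available outside Perl. As a sketch, Python's `re.VERBOSE` flag ignores unescaped whitespace in the pattern and treats `#` as the start of a comment, so the trivial example above could be written as follows (the name `PART_CODE` is made up for illustration):

```python
import re

# Equivalent of the Perl /x example, using Python's re.VERBOSE flag.
PART_CODE = re.compile(
    r"""
    [a-z]      # match a lower-case letter
    \d{3,5}    # followed by 3-5 digits
    """,
    re.VERBOSE,
)
```

One thing to keep in mind with this mode is that a literal space or `#` in the pattern has to be escaped or placed inside a character class.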
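Some people use REs for the wrong things (I'm waiting for the first SO question on how to detect a valid C++ program with a single RE).
I usually find that if I can't fit my RE within 60 characters, it's better off as a piece of code, since that will almost always be more readable.
In any case, I always document in the code, in great detail, what the RE is supposed to achieve. That is because I know from bitter experience how hard it is for someone else (or even me, six months later) to come in and try to understand it.
I don't believe they're evil, although I do believe some of the people who use them are (not looking at you, Michael Ash :-). They're a great tool but, like a chainsaw, they will cut your legs off if you don't know how to use them properly.
UPDATE: Actually, I've just followed the link to that monstrosity, and it validates m/d/y format dates between the years 1600 and 9999. That is a classic case where full-blown code would be more readable and maintainable.
You just split the input into three fields and check the individual values. If one of my minions brought this to me, I'd almost consider it an offence worthy of termination. I'd certainly send them back to write it properly.
Here is the same regex broken down into digestible pieces. Besides being more readable, some of the sub-regexes could be useful on their own. It is also significantly easier to change the allowed separators.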
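A rough sketch of the "split it into three fields and check the individual values" approach, written in Python. The names `MDY_SHAPE`, `is_leap_year` and `is_valid_mdy` are hypothetical, and reading two-digit years as 19xx is an assumption made here for the leap-year check rather than something the answer specifies.

```python
import re

# Days per month, with February given 29; leap years are checked separately.
DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Shape only: 1-2 digit month and day, the same separator twice,
# and a 4-digit or 2-digit year. Range checks happen in plain code.
MDY_SHAPE = re.compile(r"([0-9]{1,2})([/.-])([0-9]{1,2})\2([0-9]{4}|[0-9]{2})")


def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_valid_mdy(text):
    """Validate an m/d/y date with a 2-digit year or a year in 1600-9999."""
    m = MDY_SHAPE.fullmatch(text)
    if m is None:
        return False
    month, day, year = int(m.group(1)), int(m.group(3)), int(m.group(4))
    if len(m.group(4)) == 2:
        year += 1900  # ASSUMPTION: two-digit years are read as 19xx
    elif not 1600 <= year <= 9999:
        return False
    if not 1 <= month <= 12:
        return False
    if not 1 <= day <= DAYS_IN_MONTH[month - 1]:
        return False
    if month == 2 and day == 29 and not is_leap_year(year):
        return False
    return True
```

The regex that remains only describes the shape of the input; each rule about months, days and leap years is a separate line that can be read, tested and changed on its own.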
#!/usr/local/ActivePerl-5.10/bin/perl
use 5.010; #only 5.10 and above
use strict;
use warnings;
my $sep = qr{ [/.-] }x; #allowed separators
my $any_century = qr/ 1[6-9] | [2-9][0-9] /x; #match the century
my $any_decade = qr/ [0-9]{2} /x; #match any decade or 2 digit year
my $any_year = qr/ $any_century? $any_decade /x; #match a 2 or 4 digit year
#match the 1st through 28th for any month of any year
my $start_of_month = qr/
(?: #match
0?[1-9] | #Jan - Sep or
1[0-2] #Oct - Dec
)
($sep) #the separator
(?:
0?[1-9] | # 1st - 9th or
1[0-9] | #10th - 19th or
2[0-8] #20th - 28th
)
\g{-1} #and the separator again
/x;
#match 29th - 31st for any month but Feb for any year
my $end_of_month = qr/
(?:
(?: 0?[13578] | 1[02] ) #match Jan, Mar, May, Jul, Aug, Oct, Dec
($sep) #the separator
31 #the 31st
\g{-1} #and the separator again
| #or
(?: 0?[13-9] | 1[0-2] ) #match all months but Feb
($sep) #the separator
(?:29|30) #the 29th or the 30th
\g{-1} #and the separator again
)
/x;
#match any non-leap year date and the first part of Feb in leap years
my $non_leap_year = qr/ (?: $start_of_month | $end_of_month ) $any_year/x;
#match 29th of Feb in leap years
#BUG: 00 is treated as a non leap year
#even though 2000, 2400, etc are leap years
my $feb_in_leap = qr/
0?2 #match Feb
($sep) #the separator
29 #the 29th
\g{-1} #the separator again
(?:
$any_century? #any century
(?: #and decades divisible by 4 but not 100
0[48] |
[2468][048] |
[13579][26]
)
|
(?: #or match centuries that are divisible by 4
16 |
[2468][048] |
[3579][26]
)
00
)
/x;
my $any_date = qr/$non_leap_year|$feb_in_leap/;
my $only_date = qr/^$any_date$/;
say "test against garbage";
for my $date (qw(022900 foo 1/1/1)) {
say "\t$date ", $date ~~ $only_date ? "matched" : "didn't match";
}
say '';
#comprehensive test
my @code = qw/good unmatch month day year leap/;
for my $sep (qw( / - . )) {
say "testing $sep";
my $i = 0;
for my $y ("00" .. "99", 1600 .. 9999) {
say "\t", int $i/8500*100, "% done" if $i++ and not $i % 850;
for my $m ("00" .. "09", 0 .. 13) {
for my $d ("00" .. "09", 1 .. 31) {
my $date = join $sep, $m, $d, $y;
my $re = $date ~~ $only_date || 0;
my $code = not_valid($date);
unless ($re == !$code) {
die "error $date re $re code $code[$code]\n"
}
}
}
}
}
sub not_valid {
state $end = [undef, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
my $date = shift;
my ($m,$d,$y) = $date =~ m{([0-9]+)[-./]([0-9]+)[-./]([0-9]+)};
return 1 unless defined $m; #if $m is set, the rest will be too
#components are in roughly the right ranges
return 2 unless $m >= 1 and $m <= 12;
return 3 unless $d >= 1 and $d <= $end->[$m];
return 4 unless ($y >= 0 and $y <= 99) or ($y >= 1600 and $y <= 9999);
#handle the non leap year case
return 5 if $m == 2 and $d == 29 and not leap_year($y);
return 0;
}
sub leap_year {
my $y = shift;
$y = "19$y" if $y < 1600;
return 1 if 0 == $y % 4 and 0 != $y % 100 or 0 == $y % 400;
return 0;
}
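The decomposition technique is not tied to Perl's `qr//`. As a partial sketch in Python, covering only the "1st through 28th of any month" piece, named string fragments can be combined with f-strings. All constant names here are made up for illustration, and a named group with `(?P=sep)` stands in for Perl's relative backreference `\g{-1}`.

```python
import re

# Small named pieces, each readable and reusable on its own.
SEP = r"[/.-]"                                  # allowed separators
CENTURY = r"(?:1[6-9]|[2-9][0-9])"              # 16 through 99
YEAR = rf"(?:{CENTURY}?[0-9]{{2}})"             # 2- or 4-digit year
MONTH = r"(?:0?[1-9]|1[0-2])"                   # Jan through Dec
DAY_1_TO_28 = r"(?:0?[1-9]|1[0-9]|2[0-8])"      # 1st through 28th

# Month, separator, day, the same separator again, then the year.
START_OF_MONTH = re.compile(
    rf"{MONTH}(?P<sep>{SEP}){DAY_1_TO_28}(?P=sep){YEAR}"
)
```

Changing the allowed separators means editing `SEP` only. One limitation of this sketch is that Python does not allow the same group name twice in one pattern, so combining several such pieces into a single alternation would need a distinct name for each separator group.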