Asked by DVK · Tags: regex, perl, readability
I have an array of strings of the form:
@source = (
"something,something2,third"
,"something,something3 ,third"
,"something,something4"
,"something,something 5" # Note the space in the middle of the word
);
I need a regex that will extract the second comma-separated word, without trailing spaces, and put those second words into an array:
@expected_result = ("something2","something3","something4","something 5");
What's the most readable way to achieve this?
I have 3 possibilities, none of which is optimal readability-wise:
1. Pure regex, then capture $1:
@result = map { (/[^,]+,([^,]*[^, ]) *(,|$)/ )[0] } @source;
2. Split on commas (this is not CSV, so no real parsing is needed), then trim:
@result = map { my @s = split(","); $s[1] =~ s/ *$//; $s[1] } @source;
3. Put the split and the trim into nested maps:
@result = map { s/ *$//; $_ } map { (split(","))[1] } @source;
Which one is better? Is there some other, more readable alternative that I haven't thought of?
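To compare the three candidates on equal footing, here is a small harness that runs each one over the sample data and checks it against @expected_result. It is only a sketch: it assumes `use strict`, adds the `my` declarations the one-liners above leave out, and uses the semicolon-separated form of approach 2 (with a comma there, `$s[1]` would refer to a global `@s`, because a `my` variable only becomes visible in the following statement).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @source = (
    "something,something2,third",
    "something,something3 ,third",
    "something,something4",
    "something,something 5",    # note the space in the middle of the word
);
my @expected_result = ("something2", "something3", "something4", "something 5");

# 1. Pure regex: in list context the match returns ($1, $2); [0] keeps $1.
my @regex = map { (/[^,]+,([^,]*[^, ]) *(,|$)/)[0] } @source;

# 2. Split on commas, then trim trailing spaces from the second field.
my @split_trim = map { my @s = split /,/; $s[1] =~ s/ *$//; $s[1] } @source;

# 3. Nested maps: the inner map splits, the outer map trims.
#    The outer $_ aliases fresh copies, so @source itself is untouched.
my @nested = map { s/ *$//; $_ } map { (split /,/)[1] } @source;

my %results = (regex => \@regex, split_trim => \@split_trim, nested => \@nested);
my $want    = join "|", @expected_result;
for my $name (sort keys %results) {
    my $got = join "|", @{ $results{$name} };
    printf "%-10s %s\n", $name, $got eq $want ? "ok" : "MISMATCH: $got";
}
```

All three should report `ok` on this data; the harness makes it easy to add edge cases (for example a line with a trailing space at end-of-string) and see where the candidates start to disagree.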
Of these possibilities, I think #2 is the clearest, but I'd tweak it slightly to fold the following spaces into the split:
@result = map { my @s = split(/ *(?:,|$)/); $s[1] } @source;
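(For that matter, I'd probably actually write it as /[ ]*(?:,|$)/, with a no-op character class, just because it makes it more obvious what the * is quantifying.)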
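Edited to add: Oops, I had a silly mistake earlier that meant this didn't remove trailing spaces in something like "foo, bar ". Now that I've fixed that mistake, the result isn't quite as simple, and I'm no longer sure I'd recommend the above!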
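A quick way to see what the split-based version does, including the "foo, bar " case from the edit, is simply to run it. This sketch assumes the sample data from the question and uses the `[ ]` spelling suggested above. Note that the space *after* the comma in "foo, bar " is kept: the pattern only swallows spaces that sit directly before a comma or the end of the string.

```perl
use strict;
use warnings;

my @source = (
    "something,something2,third",
    "something,something3 ,third",
    "something,something4",
    "something,something 5",
);

# Spaces are consumed only when they precede a comma or end-of-string,
# so the inner space in "something 5" survives.
my @result = map { my @s = split /[ ]*(?:,|$)/; $s[1] } @source;

# The edit's case: the trailing space at end-of-string is dropped, but the
# leading space after the comma is not, so this yields " bar".
my $second = (split /[ ]*(?:,|$)/, "foo, bar ")[1];

print join("|", @result), "\n";
print "[$second]\n";
```

If leading spaces should go too, the pattern would need an optional `[ ]*` after the comma as well, which is a further departure from the one-liner's simplicity.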
Use named capture groups, and give names to subpatterns with (DEFINE), to greatly improve readability.
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # for named capture buffer and (?&...)
my $second_trimmed_field_pattern = qr/
    (?&FIRST_FIELD) (?&SEP) (?<f2> (?&SECOND_FIELD))

    (?(DEFINE)
        # The separator is a comma preceded by optional whitespace.
        # NOTE: the format uses simple comma separators, NOT full CSV, so
        # we don't have to worry about processing escapes or quoted
        # fields.
        (?<SEP> \s* ,)

        # A field stops matching as soon as it sees a separator
        # or end-of-string, so it matches in similar fashion to
        # a pattern with a non-greedy quantifier.
        (?<FIELD> (?: (?! (?&SEP) | $) .)+ )

        # The first field is anchored at start-of-string.
        (?<FIRST_FIELD> ^ (?&FIELD))

        # The second field looks like any other field. The name
        # captures our intent for its use in the main pattern.
        (?<SECOND_FIELD> (?&FIELD))
    )
/x;
In action:
my @source = (
"something,something2,third"
,"something,something3 ,third"
,"something,something4"
,"something,something 5" # Note the space in the middle of the word
);
for (@source) {
    if (/$second_trimmed_field_pattern/) {
        print "[$+{f2}]\n";
        #print "[$1]\n";  # or do it the old-fashioned way
    }
    else {
        chomp;
        print "no match for [$_]\n";
    }
}
Output:
[something2]
[something3]
[something4]
[something 5]
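The question asks for the results in an array rather than printed one per line. A minimal sketch, assuming perl 5.10 or later, that feeds the same pattern (condensed, with the explanatory comments dropped so the block runs on its own) through map and takes the field from %+; lines that fail to match contribute nothing:

```perl
use strict;
use warnings;
use 5.010;    # named captures, (?&NAME) and (DEFINE) need 5.10+

# Condensed copy of the named-subpattern version above.
my $second_trimmed_field_pattern = qr/
    (?&FIRST_FIELD) (?&SEP) (?<f2> (?&SECOND_FIELD))
    (?(DEFINE)
        (?<SEP>          \s* ,)
        (?<FIELD>        (?: (?! (?&SEP) | $) .)+ )
        (?<FIRST_FIELD>  ^ (?&FIELD))
        (?<SECOND_FIELD> (?&FIELD))
    )
/x;

my @source = (
    "something,something2,third",
    "something,something3 ,third",
    "something,something4",
    "something,something 5",
);

# $+{f2} is read right after a successful match; () skips non-matches.
my @result = map { /$second_trimmed_field_pattern/ ? $+{f2} : () } @source;

say join "|", @result;
```

Because `(?<f2>` is the first capturing group in the pattern, `$1` would work here just as well as `$+{f2}`; the named form simply keeps the intent visible.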
You can express it with older perls as well. Below, I confine the pieces to the lexical scope of a sub to show that they work together as a unit.
sub make_second_trimmed_field_pattern {
    my $sep = qr/
        # The separator is a comma preceded by optional whitespace.
        # NOTE: the format uses simple comma separators, NOT full CSV, so
        # we don't have to worry about processing escapes or quoted
        # fields.
        \s* ,
    /x;

    my $field = qr/
        # A field stops matching as soon as it sees a separator
        # or end-of-string, so it matches in similar fashion to
        # a pattern with a non-greedy quantifier.
        (?:
            # the next character to be matched is not the
            # beginning of a separator sequence or
            # end-of-string
            (?! $sep | $ )

            # ... so consume it
            .
        )+  # ... as many times as possible
    /x;

    qr/ ^ $field $sep ($field) /x;
}
Use it as:
my @source = ...; # same as above
my $second_trimmed_field_pattern = make_second_trimmed_field_pattern;
for (@source) {
    if (/$second_trimmed_field_pattern/) {
        print "[$1]\n";
    }
    else {
        chomp;
        print "no match for [$_]\n";
    }
}
Output:
$ perl5.8.8 prog
[something2]
[something3]
[something4]
[something 5]
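With the older-perl version there is one more shortcut: a match in list context returns its capture groups, and a failed match returns an empty list, so map can build the array directly without mentioning $1. A sketch with a condensed copy of the factory sub (comments moved outside the patterns); the "no separator here" line is a hypothetical addition to show a non-matching line dropping out.

```perl
use strict;
use warnings;

# Condensed copy of make_second_trimmed_field_pattern: only qr// and
# negative lookahead, no 5.10-only features.
sub make_second_trimmed_field_pattern {
    my $sep   = qr/ \s* , /x;                     # optional whitespace, then a comma
    my $field = qr/ (?: (?! $sep | $ ) . )+ /x;   # stop at a separator or end-of-string
    return qr/ ^ $field $sep ($field) /x;
}

my @source = (
    "something,something2,third",
    "something,something3 ,third",
    "something,something4",
    "something,something 5",
    "no separator here",    # hypothetical line with no comma: yields nothing
);

my $pattern = make_second_trimmed_field_pattern();

# List-context match: ($1) on success, () on failure.
my @result = map { /$pattern/ } @source;

print join("|", @result), "\n";
```

The trade-off is that silently dropping non-matching lines makes the output array shorter than the input; the loop version above reports those lines instead.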