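If I have many matches, for example in multi-line mode, I want to replace each of them with part of the match plus a counter number that increments.
I was wondering whether any regex flavor has such a variable. I couldn't find one, but I seem to remember that something like that exists...
I'm not talking about scripting languages in which you can use callbacks for the replacement. This is about being able to do it in tools like RegexBuddy, Sublime Text, gskinner.com/RegExr, and so on, in much the same way that you can refer to captured substrings with \1 or $1.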
tch*_*ist 58
OK, I'm going to go from the simple to the sublime. Enjoy!
Given this:
#!/usr/bin/perl
$_ = <<"End_of_G&S";
This particularly rapid,
unintelligible patter
isn't generally heard,
and if it is it doesn't matter!
End_of_G&S
my $count = 0;
then this:
s{
\b ( [\w']+ ) \b
}{
sprintf "(%s)[%d]", $1, ++$count;
}gsex;
produces this:
(This)[1] (particularly)[2] (rapid)[3],
(unintelligible)[4] (patter)[5]
(isn't)[6] (generally)[7] (heard)[8],
(and)[9] (if)[10] (it)[11] (is)[12] (it)[13] (doesn't)[14] (matter)[15]!
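The counter does not have to be a file-level variable. As a minimal sketch (the helper name number_matches is hypothetical and not part of the code above), the same s///e idea can be wrapped in a sub so that every call starts counting from 1 again:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every match of $re in "(match)[n]",
# where n is a counter that starts at 1 on each call.
sub number_matches {
    my ($text, $re) = @_;
    my $count = 0;
    # /e evaluates the replacement as Perl code once per match,
    # so ++$count runs exactly once for each substitution.
    $text =~ s{($re)}{ sprintf "(%s)[%d]", $1, ++$count }ge;
    return $text;
}

print number_matches("isn't generally heard", qr/\b[\w']+\b/), "\n";
# prints: (isn't)[1] (generally)[2] (heard)[3]
```

Because $text is a copy of the argument, the caller's string is left untouched.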
Whereas this:
s/\b([\w']+)\b/#@{[++$count]}=$1/g;
produces this:
#1=This #2=particularly #3=rapid,
#4=unintelligible #5=patter
#6=isn't #7=generally #8=heard,
#9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
This puts the increment within the match itself:
s/ \b ( [\w']+ ) \b (?{ $count++ }) /#$count=$1/gx;
yields this:
#1=This #2=particularly #3=rapid,
#4=unintelligible #5=patter
#6=isn't #7=generally #8=heard,
#9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
This:
s{ \b ( [\w'] + ) \b }
{ join " " => ($1) x ++$count }gsex;
generates this delightful answer:
This particularly particularly rapid rapid rapid,
unintelligible unintelligible unintelligible unintelligible patter patter patter patter patter
isn't isn't isn't isn't isn't isn't generally generally generally generally generally generally generally heard heard heard heard heard heard heard heard,
and and and and and and and and and if if if if if if if if if if it it it it it it it it it it it is is is is is is is is is is is is it it it it it it it it it it it it it doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't matter matter matter matter matter matter matter matter matter matter matter matter matter matter matter!
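There are more robust approaches to word boundaries that work for plural possessives (the previous approaches don't), but I suspect your mystery lay in getting the ++$count to fire, not in the subtleties of \b behavior.
I really wish people understood that \b isn't what they think it is. They always think it means there is whitespace or the edge of the string there. They never think of it as a \w\W or \W\w transition.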
# same as using a \b before:
(?(?=\w) (?<!\w) | (?<!\W) )
# same as using a \b after:
(?(?<=\w) (?!\w) | (?!\W) )
As you can see, it is conditional on what it is touching. That is what the (?(COND)THEN|ELSE) construct is for.
This becomes an issue with things like:
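To make the transition idea concrete, here is a small sketch that lists the offsets at which plain \b and each conditional form match in a sample string. The helper match_positions and the sample text are my own illustration, not part of the patterns above.

```perl
use strict;
use warnings;

# Return every offset in $text at which the zero-width pattern $re matches.
sub match_positions {
    my ($re, $text) = @_;
    my @offsets;
    push @offsets, pos($text) while $text =~ /$re/g;
    return @offsets;
}

my $plain  = qr/\b/;
my $before = qr/(?(?=\w) (?<!\w) | (?<!\W) )/x;   # the "\b before" form
my $after  = qr/(?(?<=\w) (?!\w) | (?!\W) )/x;    # the "\b after" form

# Assumption: the sample starts and ends with a word character. At a string
# edge that adjoins a non-word character, the lookaround forms can also
# match where \b does not.
my $sample = "isn't it";
print join(",", match_positions($_, $sample)), "\n" for $plain, $before, $after;
# each of the three lines prints: 0,3,4,5,6,8
```

Offsets 3 and 4 sit on either side of the apostrophe, which is neither whitespace nor a string edge: \b fires there only because \w changes to \W and back.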
$_ = qq('Tis Paul's parents' summer-house, isn't it?\n);
my $count = 0;
s{
(?(?=[\-\w']) (?<![\-\w']) | (?<![^\-\w']) )
( [\-\w'] + )
(?(?<=[\-\w']) (?![\-\w']) | (?![^\-\w']) )
}{
sprintf "(%s)[%d]", $1, ++$count
}gsex;
print;
which correctly prints:
('Tis)[1] (Paul's)[2] (parents')[3] (summer-house)[4], (isn't)[5] (it)[6]?
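For contrast, here is a sketch of what the earlier plain \b pattern does with this same string (the variable names are only for illustration). The leading apostrophe of 'Tis and the trailing one of parents' fall outside the match, and summer-house is split into two words:

```perl
use strict;
use warnings;

my $text  = qq('Tis Paul's parents' summer-house, isn't it?);
my $count = 0;

# Same substitution as the first example: plain \b on both sides.
(my $plain = $text) =~ s{ \b ( [\w']+ ) \b }{ sprintf "(%s)[%d]", $1, ++$count }gex;

print "$plain\n";
# prints: '(Tis)[1] (Paul's)[2] (parents)[3]' (summer)[4]-(house)[5], (isn't)[6] (it)[7]?
```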
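1960s-style ASCII is about 50 years out of date. Just as it is nearly always wrong whenever you see someone write [a-z], it turns out that things like dashes and quotation marks shouldn't appear as literals in patterns either. While we're at it, you probably don't want to use \w, because it also includes digits and underscores, not just alphabetics.
Imagine this string: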
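A quick way to see the difference is to ask which classes a few individual characters fall into. This is only a sketch: the classify helper is hypothetical, and the /u modifier it uses to force Unicode rules needs perl 5.14 or later.

```perl
use v5.14;
use warnings;

# Hypothetical helper: report which of the classes discussed here
# a single character belongs to.
sub classify {
    my ($char) = @_;
    my @classes;
    push @classes, 'word'       if $char =~ /\A\w\z/u;
    push @classes, 'alphabetic' if $char =~ /\A\p{Alphabetic}\z/u;
    push @classes, 'quote'      if $char =~ /\A\p{Quotation_Mark}\z/u;
    push @classes, 'dash'       if $char =~ /\A\p{Dash}\z/u;
    return join ' ', @classes;
}

# Code points are written as escapes so this file stays pure ASCII.
for my $char ('7', '_', "\x{E9}", "'", "\x{2019}", '-', "\x{2010}") {
    printf "U+%04X: %s\n", ord($char), classify($char);
}
```

The digit and the underscore come out as word characters without being alphabetic, while the ASCII and the typographic apostrophe both land in Quotation_Mark and both hyphens in Dash, so the properties cover the fancy and the plain characters alike.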
$_ = qq(\x{2019}Tis Ren\x{E9}e\x{2019}s great\x{2010}grandparents\x{2019} summer\x{2010}house, isn\x{2019}t it?\n);
which you could write as a literal under use utf8:
use utf8;
$_ = qq(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?\n);
This time I'll go at the pattern a bit differently, separating my definition of terms from their execution in an attempt to make it more readable and therefore more maintainable:
#!/usr/bin/perl -l
use 5.10.0;
use utf8;
use open qw< :std :utf8 >;
use strict;
use warnings qw< FATAL all >;
use autodie;
$_ = q(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?);
my $count = 0;
s{ (?<WORD> (?&full_word) )
# the rest is just definition
(?(DEFINE)
(?<word_char> [\p{Alphabetic}\p{Quotation_Mark}] )
(?<full_word>
# next line won't compile cause
# fears variable-width lookbehind
#### (?<! (?&word_char) ) )
# so must inline it
(?<! [\p{Alphabetic}\p{Quotation_Mark}] )
(?&word_char)
(?:
\p{Dash}
| (?&word_char)
) *
(?! (?&word_char) )
)
) # end DEFINE declaration block
}{
sprintf "(%s)[%d]", $+{WORD}, ++$count;
}gsex;
print;
When run, that code produces this:
(’Tis)[1] (Renée’s)[2] (great‐grandparents’)[3] (summer‐house)[4], (isn’t)[5] (it)[6]?
OK, so that may be FMTEYEWTK about fancy regexes, but aren't you glad you asked? ☺