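Why is this regular expression so lazy? It should return a quoted height/width attribute, whatever comes in between (optional), and then another height/width attribute (optional). It only gets the first attribute and then quits, even though it could match more.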
((?:height|width)=["']\d*["'])([\s\w:;'"=])*?((?:height|width)=["']\d*["'])?
The easiest way to see what is going on is to break it out into extended format. In extended format, your regular expression...
((?:height|width)=["']\d*["'])([\s\w:;'"=])*?((?:height|width)=["']\d*["'])?
...then becomes (with comments, which are legal in extended format):
( # a group that captures...
(?:height|width) # height or width (non-capturing)
= # the equals sign
["'] # a double or single quote
\d* # zero or more digits 0-9
["'] # a double or single quote
) # required
( # a group that captures one character at a time:
[\s\w:;'"=] # whitespace, word characters, colon, semicolon, quotes, or equals
)*? # zero or more times, lazily (as few times as it can)
( # a group that captures...
(?:height|width) # height or width (non-capturing)
= # the equals sign
["'] # a double or single quote
\d* # zero or more digits 0-9
["'] # a double or single quote
)? # optionally
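The breakdown above can be checked directly. The following is a minimal sketch in Python (the question does not say which engine is in use, so Python's re module and the sample string are assumptions made for illustration). It compiles the original pattern in verbose mode and shows that the lazy middle group repeats zero times and the optional last group is then allowed to match nothing.

```python
import re

# The original pattern from the question, written in verbose (extended) mode.
LAZY = re.compile(r"""
    ((?:height|width)=["']\d*["'])      # group 1: first attribute (required)
    ([\s\w:;'"=])*?                     # group 2: one middle character, repeated lazily
    ((?:height|width)=["']\d*["'])?     # group 3: second attribute (optional)
""", re.VERBOSE)

# Hypothetical sample input, not taken from the question.
sample = 'height="100" style="a:b;" width="200"'

m = LAZY.search(sample)
print(m.group(0))   # height="100"
print(m.groups())   # ('height="100"', None, None)
```

Because everything after the first group is allowed to match nothing, the engine is satisfied as soon as group 1 has matched and never looks further.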
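So, depending on the regex engine you are using, your regex may produce anywhere from 1 group up to N groups. Your last group will be the one you want, if it is there at all. Remove the lazy modifier (the ?) from the second group and make the second group non-capturing, like this: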
( # a group that captures...
(?:height|width) # height or width (non-capturing)
= # the equals sign
["'] # a double or single quote
\d* # zero or more digits 0-9
["'] # a double or single quote
) # required
(?: # a non-capturing group of one character:
[\s\w:;'"=] # whitespace, word characters, colon, semicolon, quotes, or equals
)* # zero or more times, as many as it can, UNTIL...
( # a group that captures...
(?:height|width) # height or width (non-capturing)
= # the equals sign
["'] # a double or single quote
\d* # zero or more digits 0-9
["'] # a double or single quote
)? # optional
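Now the first and last attributes will be in groups 1 and 2 respectively, and whatever sits in between is ignored. If there is a last one, it is meant to be captured. One caveat: every character of a height/width attribute is also in the middle character class, so in a typical backtracking engine the greedy middle consumes the trailing attribute itself and leaves group 2 empty. If you see that, make the middle lazy and put it, together with a required last group, inside a single optional non-capturing group: (first)(?:[\s\w:;'"=]*?(last))?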
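As a quick check of how the groups come out, here is a hedged sketch in Python (the engine, the helper names ATTR and MIDDLE, and the sample string are all assumptions for illustration). It compares the greedy rewrite with a variation that is not from the answer: a lazy middle plus a required last attribute, both wrapped in one optional group.

```python
import re

# Building blocks (hypothetical names, introduced only for this sketch).
ATTR = r"""(?:height|width)=["']\d*["']"""
MIDDLE = r"""[\s\w:;'"=]"""

# The greedy rewrite: non-capturing middle, optional last group.
greedy = re.compile(rf"({ATTR})(?:{MIDDLE})*({ATTR})?")

# Variation (not from the answer): lazy middle and required last
# attribute, wrapped together in one optional non-capturing group.
wrapped = re.compile(rf"({ATTR})(?:{MIDDLE}*?({ATTR}))?")

sample = 'height="100" style="a:b;" width="200"'

print(greedy.search(sample).groups())   # ('height="100"', None)
print(wrapped.search(sample).groups())  # ('height="100"', 'width="200"')

# With only one attribute present, the optional wrapper simply drops out.
print(wrapped.search('height="100" alt="x"').groups())  # ('height="100"', None)
```

In the greedy version the middle runs to the end of the string, the optional group then matches nothing, and the engine has no reason to backtrack. In the wrapped version the optional group can only succeed if the last attribute is actually found, which forces the lazy middle to grow until it reaches one.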
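Note: it may not be capturing the last part because the middle group does not list every character it needs to match. If there is a comma, a #, or any other kind of punctuation character, the middle group's character class does not cover it. You might consider replacing the middle with the following: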
["'] # a double quote or quote
) # required
.* # Anything, zero or more times, UNTIL...
( # a group that...
(?:height|width) # height or width (non-capturing)
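and see whether that DOES match. If it does, you probably need to extend the middle group's character class further. When comparing, look at the overall match rather than only the last group, since a greedy .* will also run past the final attribute.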
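One way to find out which characters the class is missing is to search the input with the complement of that class. The sketch below is in Python; the helper name uncovered_chars and the sample string are hypothetical, chosen only for illustration.

```python
import re

# Complement of the middle group's character class: anything it cannot match.
NOT_COVERED = re.compile(r"""[^\s\w:;'"=]""")

def uncovered_chars(text):
    """Return the distinct characters in text that the middle class cannot match, sorted."""
    return sorted(set(NOT_COVERED.findall(text)))

# Hypothetical input containing a '#' and a ',' between the two attributes.
sample = 'height="100" style="color:#fff, margin:0px" width="200"'

print(uncovered_chars(sample))  # ['#', ',']
```

Any character reported here stops the middle group, so it is a candidate for adding to the class.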
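If you do not care how many times the middle group matched and just want to capture it, use a non-capturing group for each repetition and then wrap the whole repeated middle in one capturing group: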
["'] # a double or single quote
) # required
( # a group that captures...
(?: # a non-capturing group of one character:
[\s\w:;'"=] # whitespace, word characters, colon, semicolon, quotes, or equals
)* # zero or more times, as many as it can
) # UNTIL...
( # a group that captures...
(?:height|width) # height or width (non-capturing)
Now you will get a fixed number of captures: the first part is always in group 1, the middle part is always in group 2, and the last part (if it is there) is in group 3.
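The fixed three-group layout can be sketched as follows in Python. This is a variation rather than the exact pattern above: it assumes a lazy middle and wraps the middle and last groups in one optional non-capturing group so that the last attribute is not swallowed by the middle. The names ATTR, MIDDLE, and THREE and the sample strings are hypothetical.

```python
import re

# Building blocks (hypothetical names, introduced only for this sketch).
ATTR = r"""(?:height|width)=["']\d*["']"""
MIDDLE = r"""[\s\w:;'"=]"""

# group 1: first attribute, group 2: middle text, group 3: last attribute.
# Assumption: lazy middle, with middle + last inside one optional group.
THREE = re.compile(rf"({ATTR})(?:((?:{MIDDLE})*?)({ATTR}))?")

print(THREE.groups)  # 3

for s in ('height="100" style="a:b;" width="200"', 'width="50" alt="x"'):
    print(THREE.search(s).groups())
```

The pattern always reports three groups. In this variation, groups 2 and 3 are both None when no second attribute follows, which differs slightly from the layout described above, where the middle is always captured.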