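I am trying to use Python's re module to capture specific digits in a string, such as the '03' in 'video [720P] [DHR] _sp03.mp4'.
What confuses me is this:
When I use '.*\D+(\d+).*mp4', it successfully captures both digits, 03, but when I use '.*\D*(\d+).*mp4', it captures only the last digit, 3.
I know Python uses greedy matching by default, which means it tries to match as much text as possible. With that in mind, I would expect * and + after \D to behave the same way. So where am I wrong? What causes this difference? Can anyone help explain?
BTW: I am using an online regex tester in Python mode: https://regex101.com/#python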
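For anyone who wants to reproduce this outside the online tester, here is a minimal sketch using re.search. The variable names (name, plus, star) are only illustrative.

```python
import re

name = "video [720P] [DHR] _sp03.mp4"

# Same two patterns as in the question; only \D+ vs \D* differs.
plus = re.search(r".*\D+(\d+).*mp4", name)
star = re.search(r".*\D*(\d+).*mp4", name)

print(plus.group(1))  # 03
print(star.group(1))  # 3
```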
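The difference is caused not by the \D+ but by the first .*
The .* in the regex is greedy and tries to match as many characters as possible
So when you write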
.*\D*(\d+).*mp4
the .* will match as much as it can while still letting the rest of the pattern match. If we try to break it down, it looks like this
video [720P] [DHR] _sp03.mp4
|
.*
video [720P] [DHR] _sp03.mp4
 |
 .*
.....
video [720P] [DHR] _sp03.mp4
                      |
                      .*   The 0 is also matched by the .*
video [720P] [DHR] _sp03.mp4
                       |
                       \D*   Since the quantifier is zero or more, it matches nothing here and stays in front of the 3
video [720P] [DHR] _sp03.mp4
                       |
                       (\d+)
video [720P] [DHR] _sp03.mp4
                        |
                        .*
video [720P] [DHR] _sp03.mp4
                         |
                         mp4
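The breakdown above can be checked by wrapping each piece of the pattern in a named group and printing what it matched. This is only a sketch: the group names (lead, nondigits, digits, rest) are made up for illustration, and adding the groups does not change how the pattern matches.

```python
import re

name = "video [720P] [DHR] _sp03.mp4"

# The \D* pattern, with every piece captured so we can see who consumed what.
m = re.search(
    r"(?P<lead>.*)(?P<nondigits>\D*)(?P<digits>\d+)(?P<rest>.*)mp4",
    name,
)

for part in ("lead", "nondigits", "digits", "rest"):
    print(part, repr(m.group(part)), m.span(part))
# lead 'video [720P] [DHR] _sp0' (0, 23)
# nondigits '' (23, 23)
# digits '3' (23, 24)
# rest '.' (24, 25)
```

The greedy lead group swallows the 0, and \D* is happy to match the empty string, so only the 3 is left for \d+.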
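Now when we use \D+, the match changes slightly, because the regex engine is forced to match at least one non-digit (\D+) before the digits ((\d+)). That non-digit is the p, the last non-digit before the digits.
That is,
the .* still matches as much as it can, but it has to stop just before the p so that \D+ can match at least one non-digit, which is the p, and \d+ then matches the 03 part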
video [720P] [DHR] _sp03.mp4
|
.*
video [720P] [DHR] _sp03.mp4
 |
 .*
.....
video [720P] [DHR] _sp03.mp4
                     |
                     \D+   The last non-digit before the digits. Forced to match at least once.
video [720P] [DHR] _sp03.mp4
                      |
                      (\d+)
video [720P] [DHR] _sp03.mp4
                       |
                       (\d+)
video [720P] [DHR] _sp03.mp4
                        |
                        .*
video [720P] [DHR] _sp03.mp4
                         |
                         mp4
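The \D+ case can be inspected the same way. The sketch below uses a small hypothetical helper, breakdown, which is not part of the re module; it just returns the text matched by each named group. The group names are again only illustrative.

```python
import re

def breakdown(pattern, text):
    """Return a dict of named group -> matched text, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.groupdict() if m else None

name = "video [720P] [DHR] _sp03.mp4"

# The \D+ pattern, with every piece captured in a named group.
parts = breakdown(
    r"(?P<lead>.*)(?P<nondigits>\D+)(?P<digits>\d+)(?P<rest>.*)mp4",
    name,
)
print(parts)
# {'lead': 'video [720P] [DHR] _s', 'nondigits': 'p', 'digits': '03', 'rest': '.'}
```

Because \D+ needs at least one character, the greedy lead group has to give back both digits and the p, which leaves 03 for \d+.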