Why does "Year 2010" =~ /([0-4]*)/ result in an empty string in $1?

ale*_*kuk 10 regex perl

If I run

"Year 2010" =~ /([0-4]*)/;
print $1;

I get an empty string. But

"Year 2010" =~ /([0-4]+)/;
print $1;

prints "2010". Why?

Jon*_*eet 19

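With the first form you get an empty match right at the start of the string "Year 2010", because * is satisfied immediately by matching zero digits. The + form cannot succeed until it has seen at least one digit, so it keeps scanning forward.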

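Presumably, if you could step through all the matches of the first form, you would eventually find 2010... but probably only after another empty match before the 'e', then one before the 'a', and so on.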
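One way to check that guess is to ask Perl for every match instead of only the first. The following is a minimal sketch, assuming the /g modifier in list context (not used in the original question) and a hypothetical variable name @matches; each capture is wrapped in brackets so the empty ones are visible.

```perl
use strict;
use warnings;

# In list context, /g returns the capture from every successive match.
my @matches = "Year 2010" =~ /([0-4]*)/g;

# Wrap each capture in brackets so zero-length matches show up.
print join(" ", map { "[$_]" } @matches), "\n";
# prints: [] [] [] [] [] [2010] []
```

The five leading empty matches sit before 'Y', 'e', 'a', 'r', and the space; the trailing one is the empty match at the end of the string after "2010" has been consumed.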


Mar*_*ers 6

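The first regular expression successfully matches zero digits at the start of the string, which results in capturing the empty string.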

The second regular expression fails to match at the start of the string, but it does match when it reaches 2010.
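The difference in where each pattern matches can be made visible with Perl's match-offset arrays. This is a small sketch, assuming the built-in @- and @+ arrays (start and end offsets of the last successful match); the variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "Year 2010";

# $-[0] is the start offset and $+[0] the end offset of the whole match.
my ($star_start, $star_end, $star_cap);
if ($str =~ /([0-4]*)/) {
    ($star_start, $star_end, $star_cap) = ($-[0], $+[0], $1);
}

my ($plus_start, $plus_end, $plus_cap);
if ($str =~ /([0-4]+)/) {
    ($plus_start, $plus_end, $plus_cap) = ($-[0], $+[0], $1);
}

printf "star: start=%d end=%d captured='%s'\n", $star_start, $star_end, $star_cap;
printf "plus: start=%d end=%d captured='%s'\n", $plus_start, $plus_end, $plus_cap;
# prints:
# star: start=0 end=0 captured=''
# plus: start=5 end=9 captured='2010'
```

The regex engine reports the leftmost match it can find, so a zero-length success at offset 0 wins over a longer match further along the string.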


eum*_*iro 5

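The first matches the zero-length string at the beginning (before the Y) and returns it. The second searches for one or more digits and keeps going until it finds 2010.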


Nik*_*ain 5

You can also use YAPE::Regex::Explain to explain the regular expressions:

use YAPE::Regex::Explain;

print YAPE::Regex::Explain->new('([0-4]*)')->explain();
print YAPE::Regex::Explain->new('([0-4]+)')->explain();

Output:

The regular expression:
(?-imsx:([0-4]*))
matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-4]*                   any character of: '0' to '4' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

The regular expression:
(?-imsx:([0-4]+))
matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-4]+                   any character of: '0' to '4' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------