Python re.compile strings with variables and numbers

Question

Mur*_*ay3 0 python regex nlp

Hi, I want to get a match with the following pattern:

test = re.compile(r' [0-12](am|pm) [1-1000] days from (yesterday|today|tomorrow)')

against this string:

print test.match(" 3pm 2 days from today")

It returns None. What am I doing wrong? I am just getting into regex, and from reading the docs I thought this should work! Any help appreciated.

--------------------------------------------------------------------------------------

I have asked a new question about system design in NLP, using a process similar to the one above, HERE

Answer 1

rid*_*ner 5

Here is my hat in the ring. Studying this regex carefully will teach a few lessons:

import re
reobj = re.compile(
    r"""# Loosely match a date/time reference
    ^                    # Anchor to start of string.
    \s*                  # Optional leading whitespace.
    (?P<time>            # $time: military or AM/PM time.
      (?:                # Group for military hours options.
        [2][0-3]         # Hour is either 20, 21, 22, 23,
      | [01]?[0-9]       # or 0-9, 00-09 or 10-19
      )                  # End group of military hours options.
      (?:                # Group for optional minutes.
        :                # Hours and minutes separated by ":"
        [0-5][0-9]       # 00-59 minutes
      )?                 # Military minutes are optional.
    |                    # or time is given in AM/PM format.
      (?:1[0-2]|0?[1-9]) # 1-12 or 01-12 AM/PM options (hour)
      (?::[0-5][0-9])?   # Optional minutes for AM/PM time.
      \s*                # Optional whitespace before AM/PM.
      [ap]m              # Required AM or PM (case insensitive)
    )                    # End group of time options.
    \s+                  # Required whitespace.
    (?P<offset> \d+ )    # $offset: count of time increments.
    \s+                  # Required whitespace.
    (?P<units>           # $units: units of time increment.
      (?:sec(?:ond)?|min(?:ute)?|hour|day|week|month|year|decade|century)
      s?                 # Time units may have optional plural "s".
    )                    # End $units: units of time increment.
    \s+                  # Required whitespace.
    (?P<dir>from|before|after|since) # $dir: Time offset direction.
    \s+                  # Required whitespace.
    (?P<base>yesterday|today|tomorrow|(?:right )?now)
    \s*                  # Optional whitespace before end.
    $                    # Anchor to end of string.""", 
    re.IGNORECASE | re.VERBOSE)
match = reobj.match(' 3 pm 2 days from today')
if match:
    print('Time:       %s' % (match.group('time')))
    print('Offset:     %s' % (match.group('offset')))
    print('Units:      %s' % (match.group('units')))
    print('Direction:  %s' % (match.group('dir')))
    print('Base time:  %s' % (match.group('base')))
else:
    print("No match.")


Output:

r"""
Time:       3 pm
Offset:     2
Units:      days
Direction:  from
Base time:  today
"""


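This regex illustrates a few lessons to be learned:

Regular expressions are very powerful (and useful)!
This regex does validate the numbers, but as you can see, doing so is cumbersome and difficult (and thus not recommended; I am showing it here to demonstrate why not to do it this way). It is much easier to simply capture the numbers with the regex and then validate the ranges in procedural code.
Named capture groups ease the pain of extracting multiple data substrings from a larger text.
Always write regexes in free-spacing, verbose mode, with proper indentation of groups and lots of descriptive comments. This helps both while writing the regex and later during maintenance.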
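To make the second lesson concrete, here is a minimal sketch of the "capture first, validate afterwards" approach. It also shows why the pattern in the question fails: `[0-12]` is a character class that matches exactly one character out of `0`, `1` and `2`, not a number from 0 to 12, so it can never match the `3` in `3pm`. The names `PATTERN` and `parse_reference` are made up for this illustration, and the accepted ranges (1-12 for the hour, 1-1000 for the day count) are my reading of what the question intended, not something the original code enforces.

```python
import re

# A character class matches ONE character, so "[0-12]" cannot match "3" or "12".
print(re.fullmatch(r"[0-12]", "3"))    # None
print(re.fullmatch(r"[0-12]", "12"))   # None

# Hypothetical simplified pattern: capture the digits, leave range checks to Python.
PATTERN = re.compile(
    r"""
    ^\s*                     # Optional leading whitespace.
    (?P<hour>\d{1,2})        # Hour digits; range is checked below.
    \s*                      # Optional whitespace before am/pm.
    (?P<ampm>[ap]m)          # Required am or pm.
    \s+
    (?P<offset>\d+)          # Day count digits; range is checked below.
    \s+ days? \s+ from \s+   # Literal words (spaces are ignored in verbose mode).
    (?P<base>yesterday|today|tomorrow)
    \s*$                     # Optional trailing whitespace, then end of string.
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_reference(text):
    """Return (hour, ampm, offset, base), or None if text does not fit."""
    match = PATTERN.match(text)
    if match is None:
        return None
    hour = int(match.group("hour"))
    offset = int(match.group("offset"))
    # Assumed ranges: 1-12 for the hour, 1-1000 for the number of days.
    if not 1 <= hour <= 12 or not 1 <= offset <= 1000:
        return None
    return hour, match.group("ampm").lower(), offset, match.group("base").lower()


print(parse_reference(" 3pm 2 days from today"))     # (3, 'pm', 2, 'today')
print(parse_reference(" 13pm 2 days from today"))    # None
print(parse_reference(" 3pm 1001 days from today"))  # None
```

The range check is a single readable line of Python, whereas encoding 1-1000 as a regex would need several alternatives that are easy to get wrong.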

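Modern regular expressions comprise a rich and powerful language. Once you learn the syntax and get into the habit of writing verbose, properly indented, well-commented code, even complex regexes are easy to write, easy to read and easy to maintain. It is unfortunate that they have acquired a reputation for being difficult, unwieldy and error-prone (and thus not recommended for complex tasks).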

Happy regexing!
