Using \b in a regular expression

Question

--SOLVED-- I solved my problem by enabling multiline mode. The characters ^ and $ now work perfectly for identifying the start and end of each string.
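For readers who land here with the same problem, this is a minimal sketch of what enabling multiline mode changes. The sample text and variable names are hypothetical, and the pattern is the zip-code regex from this question with `^` and `$` swapped in for `\b`; it is not necessarily the exact code used in the fix.

```python
import re

# Hypothetical sample text: one candidate value per line.
text = "12345\n123456\n98765-4321\nabc 54321"

# The question's zip-code pattern, anchored with ^ and $ instead of \b.
zip_re = r"^\d{5}(?:-\d{1,4})?$"

# Without MULTILINE, ^ and $ anchor only at the start and end of the whole string.
no_flag = re.findall(zip_re, text)

# With MULTILINE, ^ and $ also match at the start and end of every line.
with_flag = re.findall(zip_re, text, re.MULTILINE)

print(no_flag)    # []
print(with_flag)  # ['12345', '98765-4321']
```

The same flag can be passed to `re.compile(pattern, re.MULTILINE)` when the pattern is compiled once and reused.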

- Edit -

My code:

import re
import test_regex


def regex_content(text_content, regex_dictionary):

    # text_content = text_content.lower()
    regex_matches = []

    # Search sanitized text (markup removed) for DLP theme keywords
    for key, value in regex_dictionary.items():

        # Get configuration settings
        min_matches = value.get('min_matches', 1)
        risk = value.get('risk', 1)
        enabled = value.get('enabled', False)
        regex_str = value.get('regex', '')

        # Fast compute True/False hit for each DLP theme word
        if enabled:
            print("Searching for key : %s" % key)
            my_regex = re.compile(regex_str)
            hits = my_regex.findall(text_content)

            if len(hits) > 0:
                regex_matches.append((key, risk, len(hits), hits))

    # Return array of results (key, risk, number of hits, regex matches)
    return regex_matches


def main():
    # print defaults.test_regex.dlp_regex

    text_content = ""

    for line in open('testData.txt'):
        text_content += line

    for match in regex_content(text_content, test_regex.dlp_regex):
        print("\nFound %s : %s" % (match[0], match[3]))

    print("\n")


if __name__ == '__main__':
    main()

It uses the regular expression found here:

'Large number of US Zip Codes' : { 'regex' : "\b\d{5}(?:-\d{1,4})?\b"},

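When I add the 'r' prefix to my regex, I find the zip codes I'm looking for, but also every other 5-digit number in the document I'm searching. As I understand it, that's because it is ignoring the \b characters. Without the r prefix, though, it finds no zip codes at all. The pattern works fine in RegExr, but not in my code. I've had no luck getting \b to work, nor ^ and $ for identifying the start and end of the string I'm searching for. What am I misunderstanding about these special characters?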
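One thing that may explain the difference between the two spellings: in an ordinary Python string literal, `\b` is the escape for a backspace character, so the regex engine never sees a word boundary unless the pattern is a raw string. The sketch below illustrates this with a hypothetical sample sentence. The non-raw pattern is written with `\\d` purely to avoid invalid-escape warnings on newer Python versions; it produces the same characters as the pattern in the question.

```python
import re

# Non-raw literal: Python converts "\b" to a backspace (\x08) before the
# regex engine sees the pattern. "\\d" yields the same characters as "\d".
plain = "\b\\d{5}(?:-\\d{1,4})?\b"

# Raw literal: the backslash is kept, so the regex engine receives \b
# and treats it as a word boundary.
raw = r"\b\d{5}(?:-\d{1,4})?\b"

# Hypothetical sample text, not taken from the question's test data.
text = "Ship to 90210 or 12345-6789, order 1234567."

plain_hits = re.findall(plain, text)
raw_hits = re.findall(raw, text)

print(plain[0] == "\x08")  # True: the pattern starts with a backspace
print(plain_hits)          # []
print(raw_hits)            # ['90210', '12345-6789']
```

Note that with the raw pattern `\b` is being honored here: the 7-digit order number is skipped. Any standalone 5-digit number will still match, because the pattern alone cannot tell a zip code from another 5-digit value.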

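- Original post -

I'm writing a regular expression to identify zip codes (and only zip codes), so to avoid false positives I'm trying to include boundaries in my regex, using the following two: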