re.findall() doesn't match what it found

Jay*_*sby 3 python regex string

I have a text file that contains lines like these:

<pattern number=1 theme=pseudo>
<pattern number=2 theme=foo>
<pattern number=3 theme=bar>

I'm using this function to pick a random line:

def find_random_pattern(theme):
    found_lines = []
    pattern = open('poempatterns.txt','r')
    for line in pattern:
        found = re.findall("theme="+theme,line)
        for match in found:
            found_lines.append(line)

    selectedline = random.choice(found_lines)
    return selectedline

Suppose it returns <pattern number=1 theme=pseudo>

When I check the result with this condition, it returns False:

if find_random_pattern("pseudo") == "<pattern number=1 theme=pseudo>":
    return True
else:
    return False

Why don't these two strings match?

mol*_*are 5

You're expecting re.findall to return the whole line, but it returns only the part that matched:

>>> line = "<pattern number=1 theme=pseudo>"
>>> import re
>>> re.findall("theme=pseudo", line)
['theme=pseudo']

I'd suggest using a pattern that matches the whole line, for example:

>>> re.findall(".*theme=pseudo.*", line)
['<pattern number=1 theme=pseudo>']

Your final code would look like this:

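import random
import re

def find_random_pattern(theme):
    found_lines = []
    # The with block closes the file once the loop is done
    with open('poempatterns.txt', 'r') as pattern:
        for line in pattern:
            # Call strip() to drop the trailing \n (and any stray spaces) that each line read from a file carries
            found = re.findall(".*theme=%s.*" % theme, line.strip())
            for match in found:
                # Append the match (the stripped line), not the raw line, which still ends in \n
                found_lines.append(match)

    selectedline = random.choice(found_lines)
    return selectedline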
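The comparison in the question fails for a reason that is easy to see in isolation: iterating over a file yields each line with its trailing newline still attached, so the raw line never equals the bare text. Here is a minimal sketch of that, assuming `io.StringIO` as a stand-in for `poempatterns.txt` (the names `fake_file`, `first_line` and `expected` are only for illustration):

```python
import io

# Stand-in for poempatterns.txt; io.StringIO iterates line by line like a real file.
fake_file = io.StringIO(
    "<pattern number=1 theme=pseudo>\n"
    "<pattern number=2 theme=foo>\n"
)

first_line = next(fake_file)
expected = "<pattern number=1 theme=pseudo>"

raw_equal = first_line == expected               # False: first_line ends with "\n"
stripped_equal = first_line.strip() == expected  # True once the newline is removed

print(repr(first_line))
print(raw_equal, stripped_equal)
```

Printing with `repr()` is a handy way to spot invisible characters like this when two strings that look identical refuse to compare equal.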

A more concise solution would be:

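import random
import re

def find_random_pattern(theme):
    with open('poempatterns.txt', 'r') as f:
        # Strip the trailing newline and surrounding whitespace from every line
        stripped_lines = [line.strip() for line in f]
    # Keep only the lines that contain the requested theme; a list (not a lazy
    # filter object) is needed because random.choice requires a sequence
    found_lines = [l for l in stripped_lines if re.match(".*theme=%s.*" % theme, l)]
    return random.choice(found_lines)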
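One caveat with interpolating `theme` straight into the pattern: a theme such as `foo` would also match a line whose theme is `foobar`, and any regex metacharacters in the theme would be interpreted rather than matched literally. The following is a rough sketch of one way to guard against both, not the only way. The function name `pick_random_pattern`, the `sample` data, and the choice to return `None` when nothing matches are all assumptions made for illustration; it takes any iterable of lines so it can be tried without the file.

```python
import random
import re


def pick_random_pattern(lines, theme):
    """Return a random stripped line whose theme attribute equals `theme`, or None."""
    # re.escape keeps characters such as "." or "+" in the theme literal;
    # (?![\w-]) stops "foo" from also matching "theme=foobar".
    regex = re.compile(r"theme=%s(?![\w-])" % re.escape(theme))
    candidates = [line.strip() for line in lines if regex.search(line)]
    # random.choice raises IndexError on an empty list, so guard against it
    return random.choice(candidates) if candidates else None


sample = [
    "<pattern number=1 theme=pseudo>\n",
    "<pattern number=2 theme=foo>\n",
    "<pattern number=3 theme=foobar>\n",
]
print(pick_random_pattern(sample, "foo"))
```

With the real file, the same function could be called as `pick_random_pattern(open_file, "pseudo")` inside a `with open(...)` block, since a file object is itself an iterable of lines.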