Using a regular expression to match an attribute value parsed with BeautifulSoup

Question

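Hello from the land of the clumsy,

I'm trying to parse a forum. More specifically, the names of its threads.

The forum engine (vBulletin) serves each thread as a link like this: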

<a href="http://www.example.com/showthread.php?t=555555" id="thread_title_555555">NAME OF THE TITLE</a>

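Using Python and BeautifulSoup, I've had success in other areas. However, I can't work out how to match the "id" attribute with a regular expression. I need the parser code that finds every "a" element whose id ends in a six-digit number and gets the text from it.

Something like this: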

for element in soup.findAll("a"):
    if re.match("thread_title_", element.get("id", "")) is not None:
        print(element.text)

Or, in pseudo-Python:

for element in soup.findAll("a", {"id": "thread_title_".*}):
    print(element.text)

I've tried dozens of variations, to no avail. What can I do?

Thanks in advance

Answer 1

You can match the id against a compiled regular expression directly in the call to findAll():

import re

for element in soup.findAll("a", id=re.compile("^thread_title_")):
    print(element.text)