BeautifulSoup cannot find an href in a file using a regular expression

Question

I have an HTML file like the following:

<form action="/2811457/follow?gsid=3_5bce9b871484d3af90c89f37" method="post">
<div>
<a href="/2811457/follow?page=2&amp;gsid=3_5bce9b871484d3af90c89f37">next_page</a>
&nbsp;<input name="mp" type="hidden" value="3" />
<input type="text" name="page" size="2" style='-wap-input-format: "*N"' />
<input type="submit" value="jump" />&nbsp;1/3
</div>
</form>

How can I extract the href "/2811457/follow?page=2&gsid=3_5bce9b871484d3af90c89f37" from the next_page link?

This is only part of the HTML, shown to make the question clear. When I use BeautifulSoup,

print soup.find('a',href=re.compile('follow?page'))

it returns None. Why? I am new to BeautifulSoup and have read the documentation, but I am still confused.

For now I am using an ugly workaround:

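    # Collect every <a> tag that has an href attribute
    urls = soup.findAll('a', href=True)
    for url in urls:
        # Plain substring test on the attribute value, no regex involved
        if 'follow?page' in url['href']:
            print(url['href'])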

I need a cleaner, more elegant way.

Answer 1

Mar*_*ers 16

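You need to escape the question mark. In a regular expression, w? means "zero or one w", so the pattern follow?page matches "followpage" or "follopage" but never a literal question mark. Try this: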

print soup.find('a', href = re.compile(r'.*follow\?page.*'))

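The difference between the escaped and unescaped patterns can be checked with the re module alone, without BeautifulSoup. The following is a minimal sketch: the variable names are illustrative, and the href string is assumed to be the attribute value after the parser has decoded &amp; to &.

```python
import re

# The href value as a parser would expose it (assumption: &amp; decoded to &).
href = "/2811457/follow?page=2&gsid=3_5bce9b871484d3af90c89f37"

# Unescaped: "w?" means "zero or one w", so this pattern only matches
# the literal text "followpage" or "follopage".
unescaped = re.compile("follow?page")

# Escaped by hand: "\?" matches a literal question mark.
escaped = re.compile(r"follow\?page")

# Escaped automatically: re.escape() quotes the regex metacharacters.
auto_escaped = re.compile(re.escape("follow?page"))

no_match = unescaped.search(href)
manual_match = escaped.search(href)
auto_match = auto_escaped.search(href)

print(no_match)               # None
print(manual_match.group(0))  # follow?page
print(auto_match.group(0))    # follow?page
```

Either compiled pattern that matches here can be passed as the href argument to soup.find. As far as I know, BeautifulSoup applies compiled patterns with search() rather than match(), so the leading and trailing .* in the answer's pattern are probably harmless but unnecessary.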
