How to grab HTML using regular expressions

pha*_*s15 -2 python regex

#<link rel='canonical' href='http://www.samplewebsite.com/image/5434553/' />

#I am trying to grab the text in href

image = str(Soup)

image_re = re.compile('\<link rel=\'cononical\' href=')

image_pat = re.findall(image_re, image)

print image_pat

#>> []

#Thanks!

jef*_*upp 5

Edit: This uses the BeautifulSoup package, which I believe I saw in an earlier version of this question.

Edit: More simply:

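from bs4 import BeautifulSoup  # BeautifulSoup 4

soup = BeautifulSoup(document, 'html.parser')
# Find every <link> tag whose rel attribute is 'canonical'
links = soup.find_all('link', rel='canonical')
for link in links:
    print(link['href'])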

This replaces the longer version below, which you could also use:

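soup = BeautifulSoup(document, 'html.parser')
links = soup('link')  # calling the soup object is shorthand for find_all
for link in links:
    # `in` on a tag tests its children, not its attributes, so use .get();
    # BeautifulSoup 4 returns rel as a list of values
    if 'canonical' in (link.get('rel') or []):
        print(link['href'])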
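
If you do want to stay with a regex, as the question asks, here is a minimal sketch using only the standard `re` module. The pattern in the question misspells `canonical` as `cononical` and has no capture group, so it could never return the URL. The version below assumes `rel` appears before `href` inside the same tag and that the values are quoted. The names `CANONICAL_RE` and `find_canonical_hrefs` are made up for this example. It is more fragile than a real parser, so treat it as an illustration and not a general solution.

```python
import re

# Sample input taken from the question.
html = "<link rel='canonical' href='http://www.samplewebsite.com/image/5434553/' />"

# Assumption: rel comes before href within one <link> tag, and both
# attribute values are wrapped in single or double quotes.
CANONICAL_RE = re.compile(
    r"""<link\b[^>]*?\brel\s*=\s*["']canonical["'][^>]*?\bhref\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_canonical_hrefs(text):
    """Return the href of every canonical <link> tag found in text."""
    # With a single capture group, findall returns just the captured strings.
    return CANONICAL_RE.findall(text)


print(find_canonical_hrefs(html))
```

The capture group `([^"']+)` is what makes `findall` return the URL itself instead of the matched prefix.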