#<link rel='canonical' href='http://www.samplewebsite.com/image/5434553/' />
#I am trying to grab the text in href
image = str(Soup)
image_re = re.compile('\<link rel=\'cononical\' href=')
image_pat = re.findall(image_re, image)
print(image_pat)
#>> []
#Thanks!
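For reference, the pattern in the question cannot match because it spells the attribute value `cononical`. Even with the spelling fixed, it has no capture group, so `re.findall` would return the matched prefix rather than the URL. Below is a minimal sketch of a corrected pattern, assuming the markup looks like the single tag shown in the question with `rel` before `href`. The names `html` and `canonical_re` are illustrative, not from the original post.

```python
import re

# Sample markup copied from the question; in real use this would be the page source.
html = "<link rel='canonical' href='http://www.samplewebsite.com/image/5434553/' />"

# Accept either quote style and capture the href value in group 1.
# Assumption: rel comes before href, as in the question's sample tag.
canonical_re = re.compile(r"""<link\s+rel=['"]canonical['"]\s+href=['"]([^'"]+)['"]""")

hrefs = canonical_re.findall(html)
print(hrefs)
```

A pattern like this breaks as soon as the attribute order or spacing changes, which is why an HTML parser is usually the more robust choice.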
Edit: This uses the BeautifulSoup package, which I believe I saw in a previous version of this question.
Edit: Even simpler:
soup = BeautifulSoup(document)
# Match only <link> tags whose rel attribute is "canonical".
links = soup.findAll('link', rel='canonical')
for link in links:
    print(link['href'])
Instead of findAll, you can use:
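soup = BeautifulSoup(document)
# Calling the soup object directly is shorthand for findAll.
links = soup("link")
for link in links:
    # Use link.get() to read the attribute: `"rel" in link` tests child nodes, not attributes.
    # The membership test works whether rel is returned as a string or as a list of values.
    if "canonical" in (link.get("rel") or ""):
        print(link["href"])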
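If BeautifulSoup is not installed, the same lookup can be sketched with Python 3's standard-library `html.parser` instead. This is a different technique from the answer above, not part of it. The `CanonicalFinder` class and the `document` string are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser  # Python 3 standard library


class CanonicalFinder(HTMLParser):
    """Collects href values of <link rel="canonical"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names are lowercased.
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


# Sample markup copied from the question.
document = "<link rel='canonical' href='http://www.samplewebsite.com/image/5434553/' />"
finder = CanonicalFinder()
finder.feed(document)
print(finder.hrefs)
```

Self-closing tags such as `<link ... />` are also routed through `handle_starttag` by default, so the sample tag from the question is picked up.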