Michal | tags: html, python, beautifulsoup
I'm using the BeautifulSoup module to select all hrefs from HTML like this:
from BeautifulSoup import BeautifulSoup

def extract_links(html):
    soup = BeautifulSoup(html)
    anchors = soup.findAll('a')
    print anchors
    links = []
    for a in anchors:
        links.append(a['href'])
    return links
But sometimes it fails with this error message:
Traceback (most recent call last):
  File "C:\py\main.py", line 33, in <module>
    urls = extract_links(page)
  File "C:\py\main.py", line 11, in extract_links
    links.append(a['href'])
  File "C:\py\BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'href'
Not all anchor tags have an href attribute, so check whether the anchor has an href before trying to access it:
for a in anchors:
    if a.has_key('href'):
        links.append(a['href'])
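If BeautifulSoup is not available, the same guard can be sketched with the standard library's html.parser.HTMLParser. This swaps in a different parser than the one the question uses, and the class name LinkExtractor and the sample page are hypothetical; the sketch only illustrates skipping anchors that have no href instead of raising KeyError.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect href values from <a> tags, skipping anchors without one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes lowercased tag names and attrs as a list of (name, value) pairs.
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Same guard as the answer: only read href when the anchor actually has one.
        if "href" in attr_map:
            self.links.append(attr_map["href"])


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page: the middle anchor has no href.
page = '<a href="/one">1</a> <a name="top">anchor</a> <a href="/two">2</a>'
print(extract_links(page))  # ['/one', '/two']
```

The named anchor in the middle is skipped rather than causing the KeyError shown in the traceback.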
After reading some of the comments here, I think this is the most Pythonic way to handle this case.
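For comparison, the usual alternatives can be sketched on plain dicts standing in for a tag's attributes (the sample data and function names are hypothetical). One caveat for real BeautifulSoup tags: the in operator on a Tag checks the tag's contents, not its attributes, so the look-before-you-leap check there stays has_key (BeautifulSoup 3) or has_attr (bs4). Tag.get works like dict.get in both versions.

```python
# Plain dicts stand in for tag attribute maps (hypothetical data).
anchors = [{"href": "/one"}, {"name": "top"}, {"href": ""}]


def links_lbyl(anchors):
    # "Look before you leap": test for the key first, as the answer does.
    return [a["href"] for a in anchors if "href" in a]


def links_eafp(anchors):
    # "Easier to ask forgiveness": attempt the lookup and skip on KeyError.
    links = []
    for a in anchors:
        try:
            links.append(a["href"])
        except KeyError:
            continue
    return links


def links_get(anchors):
    # .get() returns None for a missing key; the truthiness filter also drops empty hrefs.
    return [href for href in (a.get("href") for a in anchors) if href]


print(links_lbyl(anchors))  # ['/one', '']
print(links_eafp(anchors))  # ['/one', '']
print(links_get(anchors))   # ['/one']
```

The first two keep an empty href="" while the .get() variant drops it, so the choice matters if empty links should be reported.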