Vor*_*Vor 2 python pyqt beautifulsoup
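I am writing a small script using PyQt4 and BeautifulSoup. Basically, you give it a URL and the script is supposed to download all the images from that web page.
When I give it http://yahoo.com, it downloads all the images except one. The output: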
...
Download Complete
Download Complete
File name is wrong
Traceback (most recent call last):
  File "./picture_downloader.py", line 41, in loadComplete
    self.download_image()
  File "./picture_downloader.py", line 58, in download_image
    print 'File name is wrong ',image['src']
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 879, in __getitem__
    return self.attrs[key]
KeyError: 'src'
Download Complete
File name is wrong h
Download Complete
Finally, here is the relevant part of the code:
# SLOT for loadFinished
def loadComplete(self):
    self.download_image()

def download_image(self):
    html = unicode(self.frame.toHtml()).encode('utf-8')
    soup = bs(html)
    for image in soup.findAll('img'):
        try:
            file_name = image['src'].split('/')[-1]
            cur_path = os.path.abspath(os.curdir)
            if not os.path.exists(os.path.join(cur_path, 'images/')):
                os.makedirs(os.path.join(cur_path, 'images/'))
            f_path = os.path.join(cur_path, 'images/%s' % file_name)
            urlretrieve(image['src'], f_path)
            print "Download Complete"
        except:
            print 'File name is wrong ',image['src']
    print "No more pictures on the page"
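This means the image element has no "src" attribute, and you hit the same error twice: once at file_name = image['src'].split('/')[-1] inside the try block, and once more inside the except block at 'File name is wrong ',image['src']. The second KeyError is raised outside any try, so it is the one that appears in the traceback.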
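The double failure can be reproduced without BeautifulSoup. The sketch below is only an illustration, written for Python 3: a plain dict stands in for the tag's attributes, and `buggy_lookup` and `safe_lookup` are hypothetical helper names, not part of the original script.

```python
def buggy_lookup(attrs):
    """Mirror the loop body from the question."""
    try:
        return attrs['src'].split('/')[-1]
    except KeyError:
        # Looking up the same missing key again raises a second KeyError,
        # and this time nothing is left to catch it.
        print('File name is wrong', attrs['src'])


def safe_lookup(attrs):
    """Same logic, but the handler never touches the missing key directly."""
    try:
        return attrs['src'].split('/')[-1]
    except KeyError:
        # .get() returns None instead of raising when the key is absent.
        print('File name is wrong', attrs.get('src'))
        return None


without_src = {'alt': 'spacer'}  # stands in for an <img> with no src

print(safe_lookup(without_src))

try:
    buggy_lookup(without_src)
except KeyError as exc:
    print('escaped from the except block:', repr(exc))
```

Catching `KeyError` rather than using a bare `except:` also keeps unrelated failures, such as a network error from the download, from being reported as a wrong file name.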
The simplest way to avoid the problem is to replace soup.findAll('img') with soup.findAll('img',{"src":True}), so that it only finds elements that have a src attribute.
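As a minimal sketch of that filter, assuming Python 3 with the bs4 package installed and the standard library's 'html.parser' backend; the HTML snippet and its URLs are made up for illustration:

```python
from bs4 import BeautifulSoup

# Made-up markup: two <img> tags carry src, two do not.
html = """
<img src="http://example.com/a.png">
<img alt="no source here">
<img data-src="http://example.com/lazy.png">
<img src="/static/b.gif">
"""

soup = BeautifulSoup(html, 'html.parser')

# {'src': True} matches any <img> that has a src attribute, whatever its value.
# find_all is the bs4 spelling; findAll is kept as an alias for older code.
images = soup.find_all('img', {'src': True})
sources = [img['src'] for img in images]

print(len(soup.find_all('img')), 'img tags in total')
print(len(sources), 'of them have a src attribute')
```

Because the tags without src never enter the loop, image['src'] can no longer raise KeyError inside it.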
If the URL can be in either of two attributes, try the following:
for image in soup.findAll('img'):
    v = image.get('src', image.get('dfr-src'))  # gets "src", else "dfr-src"
                                                # if both are missing - None
    if v is None:
        continue  # continue loop with the next image
    # do your stuff
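The fallback lookup above can be tried out on its own. In this sketch, written for Python 3, plain dicts stand in for the tags (a bs4 tag's .get() behaves like dict.get() here), `pick_source` is a hypothetical helper name, and the URLs are invented:

```python
def pick_source(attrs, fallback='dfr-src'):
    """Return the 'src' value, else the fallback attribute, else None."""
    return attrs.get('src', attrs.get(fallback))


# Stand-ins for the attributes of four <img> tags.
tags = [
    {'src': 'http://example.com/pics/a.png'},
    {'dfr-src': 'http://example.com/pics/b.png'},
    {'alt': 'no URL at all'},
    {'src': 'http://example.com/c.gif', 'dfr-src': 'http://example.com/d.gif'},
]

file_names = []
for attrs in tags:
    v = pick_source(attrs)
    if v is None:
        continue  # nothing to download for this tag
    # Same file-name rule as in the question: last segment of the URL.
    file_names.append(v.split('/')[-1])

print(file_names)
```

When both attributes are present, 'src' wins, because the fallback is only used as the default of the outer .get().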