Posts by FAR*_*RAF

Web Scraping: FileNotFoundError when saving as PDF

I copied some Python code to download data from a website. This is the specific site: https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1

Here is the code I copied:

import requests
from bs4 import BeautifulSoup

def _getUrls_(res):
    hrefs = []
    soup = BeautifulSoup(res.text, 'lxml')
    main_content = soup.find('div',{'id' : 'content-core'})
    table = main_content.find("table")
    for a in table.findAll('a', href=True):
        hrefs.append(a['href'])
    return(hrefs)

bidurl = 'https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1'
r = requests.get(bidurl)
hrefs = _getUrls_(r)

def _getPdfs_(hrefs, basedir):
    for i in range(len(hrefs)):
        print(hrefs[i])
        respdf = requests.get(hrefs[i])
        pdffile = basedir + "/pdf_dot/" + hrefs[i].split("/")[-1] + ".pdf"
        try:
            with open(pdffile, 'wb') as p:
                p.write(respdf.content)
                p.close()
        except FileNotFoundError:
            print("No PDF produced")

basedir= "/Users/ABC/Desktop"
_getPdfs_(hrefs, …

python web-scraping python-3.x
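The traceback is not shown, so the cause is an assumption here: one common reason for FileNotFoundError in this loop is that the pdf_dot folder under basedir does not exist. open() in "wb" mode creates the file itself but not missing parent directories, and the try/except hides the real error by printing only "No PDF produced". Below is a minimal stdlib-only sketch; the helper names resolve_links, pdf_path_for and save_pdf are hypothetical, and the urljoin step only matters if the table's hrefs turn out to be relative links.

```python
from pathlib import Path
from urllib.parse import urljoin, urlparse


def resolve_links(page_url, hrefs):
    """Hypothetical helper: turn relative hrefs (e.g. '/docs/x') into absolute URLs."""
    return [urljoin(page_url, href) for href in hrefs]


def pdf_path_for(url, basedir):
    """Hypothetical helper: build <basedir>/pdf_dot/<last path segment>.pdf."""
    # Use only the URL path so query strings do not leak into the filename,
    # and strip a trailing slash so the last segment is not empty.
    name = urlparse(url).path.rstrip("/").split("/")[-1] or "index"
    if not name.lower().endswith(".pdf"):
        name += ".pdf"
    return Path(basedir) / "pdf_dot" / name


def save_pdf(content, url, basedir):
    """Hypothetical helper: write downloaded bytes, creating pdf_dot/ if missing."""
    path = pdf_path_for(url, basedir)
    # open(..., "wb") does not create parent folders; a missing pdf_dot
    # directory is one way to get FileNotFoundError, so create it first.
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        f.write(content)  # the with-block closes the file; no explicit close()
    return path
```

In the question's loop, the open(...) block could then be replaced with save_pdf(respdf.content, hrefs[i], basedir), after first passing the scraped links through resolve_links(bidurl, hrefs). Letting other exceptions surface instead of catching them would also show the real error message.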

