FileNotFoundError when saving PDFs during web scraping

Question

FAR*_*RAF 1 python web-scraping python-3.x

I copied some Python code to download data from a website. This is the specific site: https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1

Here is the code I copied:

import requests
from bs4 import BeautifulSoup

def _getUrls_(res):
    hrefs = []
    soup = BeautifulSoup(res.text, 'lxml')
    main_content = soup.find('div',{'id' : 'content-core'})
    table = main_content.find("table")
    for a in table.findAll('a', href=True):
        hrefs.append(a['href'])
    return(hrefs)

bidurl = 'https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1'
r = requests.get(bidurl)
hrefs = _getUrls_(r)

def _getPdfs_(hrefs, basedir):
    for i in range(len(hrefs)):
        print(hrefs[i])
        respdf = requests.get(hrefs[i])
        pdffile = basedir + "/pdf_dot/" + hrefs[i].split("/")[-1] + ".pdf"
        try:
            with open(pdffile, 'wb') as p:
                p.write(respdf.content)
                p.close()
        except FileNotFoundError:
            print("No PDF produced")

basedir= "/Users/ABC/Desktop"
_getPdfs_(hrefs, basedir)

The code runs without crashing, but it downloads nothing at all, even though no FileNotFoundError is visibly raised.

I tried the following two URLs:

https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017/aqc-088a-035-20360
https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017/aqc-r100-258-21125

However, both URLs print >>> No PDF produced.

The problem is that the code works and downloads the files for other people, but not for me.

Answer 1

and*_*a-f 5

Your code works; I just tested it. You need to make sure that basedir exists. Add this to your code:

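import os
pdfdir = basedir + "/pdf_dot"  # the folder _getPdfs_ actually writes to
if not os.path.exists(pdfdir):
    os.makedirs(pdfdir)  # also creates basedir if it is missing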
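
To see why the except branch fires: open(..., 'wb') creates a missing file, but it does not create missing parent folders, so the write fails with FileNotFoundError whenever the target folder is absent. Below is a minimal sketch of that behaviour and of one way around it, assuming the missing folder is indeed the cause on your machine. save_bytes is a hypothetical helper name (not part of your code), and a temporary directory stands in for /Users/ABC/Desktop.

```python
import os
import tempfile


def save_bytes(content, basedir, name):
    """Write content to basedir/pdf_dot/name, creating the folders first.

    save_bytes is a hypothetical helper, not part of the original code.
    """
    target_dir = os.path.join(basedir, "pdf_dot")
    # exist_ok=True makes this a no-op when the folder is already there
    os.makedirs(target_dir, exist_ok=True)
    pdffile = os.path.join(target_dir, name)
    with open(pdffile, "wb") as p:  # the with block closes the file itself
        p.write(content)
    return pdffile


# A throwaway directory stands in for the real basedir (assumption).
demo_base = tempfile.mkdtemp()

# Without the pdf_dot folder, open() raises FileNotFoundError.
missing = os.path.join(demo_base, "pdf_dot", "x.pdf")
try:
    with open(missing, "wb") as p:
        p.write(b"data")
    failed = False
except FileNotFoundError:
    failed = True

# With the folder created first, the same write succeeds.
saved = save_bytes(b"%PDF-1.4 demo bytes", demo_base, "x.pdf")
print(failed, os.path.exists(saved))
```

In the original loop, the equivalent change would be to create the pdf_dot folder once before iterating over hrefs. Printing the caught exception instead of a fixed message would also show the offending path directly.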
