FAR*_*RAF 1 python web-scraping python-3.x
I copied some Python code to download data from a website. This is the specific site: https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1
Here is the code I copied:
import requests
from bs4 import BeautifulSoup

def _getUrls_(res):
    hrefs = []
    soup = BeautifulSoup(res.text, 'lxml')
    main_content = soup.find('div',{'id' : 'content-core'})
    table = main_content.find("table")
    for a in table.findAll('a', href=True):
        hrefs.append(a['href'])
    return(hrefs)

bidurl = 'https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1'
r = requests.get(bidurl)
hrefs = _getUrls_(r)

def _getPdfs_(hrefs, basedir):
    for i in range(len(hrefs)):
        print(hrefs[i])
        respdf = requests.get(hrefs[i])
        pdffile = basedir + "/pdf_dot/" + hrefs[i].split("/")[-1] + ".pdf"
        try:
            with open(pdffile, 'wb') as p:
                p.write(respdf.content)
            p.close()
        except FileNotFoundError:
            print("No PDF produced")

basedir= "/Users/ABC/Desktop"
_getPdfs_(hrefs, basedir)
The code runs without crashing, but it does not download anything at all, and no FileNotFoundError traceback is shown.
I tried the following two URLs:
https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017/aqc-088a-035-20360
https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017/aqc-r100-258-21125
But both URLs print >>> No PDF produced.
The problem is that the code works and downloads the files for other people, but not for me.
Your code works; I just tested it. You need to make sure that basedir exists. Add this to your code:
import os

if not os.path.exists(basedir):
    os.makedirs(basedir)
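As a minimal sketch of why "No PDF produced" appears: open() in 'wb' mode raises FileNotFoundError when the parent folder of the target file is missing, and the question's try/except turns that into the printed message. Note that the question's code writes into a pdf_dot subfolder of basedir, so this sketch creates that subfolder too. The helper name save_pdf_bytes, the temporary directory, and the placeholder bytes are assumptions for illustration only; they are not part of the original code and no network request is made.

```python
import os
import tempfile


def save_pdf_bytes(basedir, name, content):
    """Write content to <basedir>/pdf_dot/<name>.pdf, creating folders first.

    Hypothetical helper for illustration; not part of the original code.
    """
    outdir = os.path.join(basedir, "pdf_dot")
    # exist_ok=True makes this a no-op when the folder already exists.
    os.makedirs(outdir, exist_ok=True)
    pdffile = os.path.join(outdir, name + ".pdf")
    with open(pdffile, "wb") as p:
        p.write(content)
    return pdffile


# Throwaway directory standing in for basedir.
demo_base = tempfile.mkdtemp()

# Step 1: writing before pdf_dot exists fails, as in the question.
missing = os.path.join(demo_base, "pdf_dot", "example.pdf")
try:
    open(missing, "wb").close()
    failed_without_dir = False
except FileNotFoundError:
    failed_without_dir = True

# Step 2: creating the folder first lets the same write succeed.
saved_path = save_pdf_bytes(demo_base, "example", b"%PDF-1.4 placeholder")
print(failed_without_dir, os.path.exists(saved_path))
```

In the question's code, the equivalent change would be to create basedir + "/pdf_dot" before the download loop starts, assuming the links themselves return PDF content.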