I have a scraper bot that works well, but over time its scraping speed drops. I added concurrent requests, set DOWNLOAD_DELAY to 0 and 'AUTOTHROTTLE_ENABLED': False, but the result is the same: it starts fast and then gradually slows down. I suspect this has something to do with the cache, but I don't know whether I need to clear it, or why this happens at all. The code is below; I'd appreciate any comments.
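(For context, by "the cache" I mean Scrapy's built-in HTTP cache middleware. My spider below does not set any of these keys, so this is only a sketch of what I believe I would have to toggle; the HTTPCACHE_* names come from the Scrapy settings documentation, not from my code.)

# sketch only: settings that control Scrapy's built-in HttpCacheMiddleware;
# none of these appear in my spider's custom_settings below
HTTPCACHE_SKETCH = {
    'HTTPCACHE_ENABLED': False,            # turn the on-disk response cache off entirely
    # or keep it enabled but let entries expire:
    # 'HTTPCACHE_ENABLED': True,
    # 'HTTPCACHE_EXPIRATION_SECS': 3600,   # cached responses older than this are refetched
    # 'HTTPCACHE_DIR': 'httpcache',        # cache folder, relative to the project's .scrapy directory
}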
import scrapy
from scrapy.crawler import CrawlerProcess
import pandas as pd
import scrapy_xlsx

itemList = []

class plateScraper(scrapy.Spider):
    name = 'scrapePlate'
    allowed_domains = ['dvlaregistrations.dvla.gov.uk']

    FEED_EXPORTERS = {'xlsx': 'scrapy_xlsx.XlsxItemExporter'}
    custom_settings = {
        'FEED_EXPORTERS': FEED_EXPORTERS,
        'FEED_FORMAT': 'xlsx',
        'FEED_URI': 'output_r00.xlsx',
        'LOG_LEVEL': 'INFO',
        'DOWNLOAD_DELAY': 0,
        'CONCURRENT_ITEMS': 300,
        'CONCURRENT_REQUESTS': 30,
        'AUTOTHROTTLE_ENABLED': False,
    }

    def start_requests(self):
        df = pd.read_excel('data.xlsx')
        columnA_values = df['PLATE']
        for row in columnA_values:
            global plate_num_xlsx
            plate_num_xlsx = row
            base_url = f"https://dvlaregistrations.dvla.gov.uk/search/results.html?search={plate_num_xlsx}&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto="
            url = base_url
            yield scrapy.Request(url, callback=self.parse, cb_kwargs={'plate_num_xlsx': plate_num_xlsx})

    def parse(self, response, plate_num_xlsx=None):
        plate = response.xpath('//div[@class="resultsstrip"]/a/text()').extract_first()
        price = response.xpath('//div[@class="resultsstrip"]/p/text()').extract_first()
        try:
            a = plate.replace(" ", "").strip()
            if plate_num_xlsx == plate.replace(" ", …

On the Excel output side, I run into one of two errors: either writer.book = book fails with AttributeError: can't set attribute 'book', or I get a BadZipFile error.
In the version of the code that does not give the BadZipFile error, I first put the line that writes the Excel file, dataOutput = pd.DataFrame(dictDataOutput, index=[0]).
But even so, I cannot get rid of writer.book = book AttributeError: can't set attribute 'book'. As one of the answers suggests, I would need to roll openpyxl back to an earlier version, or use a CSV file instead of Excel. I don't think that is a real solution; there should be a proper fix that I just haven't been able to reach.
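(For reference, the suggested CSV fallback would look roughly like the sketch below; dictDataOutput stands for the per-result dict my spider builds, shown here with placeholder values. I would still prefer to keep the .xlsx output.)

import os
import pandas as pd

# sketch of the suggested CSV fallback: append each result instead of reopening an .xlsx file
dictDataOutput = {'plate': 'ABC 123', 'price': '£250'}   # placeholder; built per result in parse()
dataOutput = pd.DataFrame(dictDataOutput, index=[0])
dataOutput.to_csv('output.csv', mode='a', index=False,
                  header=not os.path.exists('output.csv'))   # write the header only on the first append

Here is the xlsx code that fails: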
import pandas as pd
from openpyxl import load_workbook

dataOutput = pd.DataFrame(dictDataOutput, index=[0])
dataOutput.to_excel('output.xlsx')   # or 'output.xlsm'
book = load_workbook('output.xlsx')  # or 'output.xlsm'
writer = pd.ExcelWriter('output.xlsx')  # or 'output.xlsm'; also tried engine='openpyxl', mode='a', if_sheet_exists='overlay'
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
for sheetname in writer.sheets:
    dataOutput.to_excel(writer, sheet_name=sheetname,
                        startrow=writer.sheets[sheetname].max_row,
                        index=False, header=False)
writer.save()
I looked for an answer here, and in the detailed solution for the AttributeError here.
I also tried another approach:
with pd.ExcelWriter('output.xlsx', mode='a', if_sheet_exists='overlay') as writer:
    dataOutput.to_excel(writer, sheet_name='Sheet1')
    writer.save()
But this time it raised another error:
FutureWarning: save is not part …
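For completeness, this is the shape I think the append step needs to take with current pandas. It is only a sketch, assuming pandas >= 1.4 with the openpyxl engine and that output.xlsx already exists with a sheet named 'Sheet1'; I have not verified it end to end.

import pandas as pd

# placeholder data; in my spider dictDataOutput is built for each scraped result
dictDataOutput = {'plate': 'ABC 123', 'price': '£250'}
dataOutput = pd.DataFrame(dictDataOutput, index=[0])

# open the existing workbook in append mode; writer.book can still be *read* here,
# it is only assignment to writer.book that newer pandas forbids
with pd.ExcelWriter('output.xlsx', engine='openpyxl', mode='a',
                    if_sheet_exists='overlay') as writer:
    startrow = writer.sheets['Sheet1'].max_row   # first free row of the existing sheet
    dataOutput.to_excel(writer, sheet_name='Sheet1',
                        startrow=startrow, index=False, header=False)
# no writer.save() / writer.close() here: the context manager saves on exit

Is this the right direction?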