Posts by xlm*_*ter

Python Scrapy slows down while parsing

I have a scraper bot that works well, but over time its scraping speed drops. I added CONCURRENT_REQUESTS, 'DOWNLOAD_DELAY': 0 and 'AUTOTHROTTLE_ENABLED': False, but the result is the same: it starts fast and then slows down. I suppose this is related to caching, but I don't know whether I have to clear a cache, or why this happens. The code is below; I'd welcome comments.

import scrapy
from scrapy.crawler import CrawlerProcess
import pandas as pd
import scrapy_xlsx

itemList=[]
class plateScraper(scrapy.Spider):
    name = 'scrapePlate'
    allowed_domains = ['dvlaregistrations.dvla.gov.uk']
    FEED_EXPORTERS = {'xlsx': 'scrapy_xlsx.XlsxItemExporter'}
    custom_settings = {
        'FEED_EXPORTERS': FEED_EXPORTERS,
        'FEED_FORMAT': 'xlsx',
        'FEED_URI': 'output_r00.xlsx',
        'LOG_LEVEL': 'INFO',
        'DOWNLOAD_DELAY': 0,
        'CONCURRENT_ITEMS': 300,
        'CONCURRENT_REQUESTS': 30,
        'AUTOTHROTTLE_ENABLED': False,
    }

    def start_requests(self):
        df=pd.read_excel('data.xlsx')
        columnA_values=df['PLATE']
        for row in columnA_values:
            global  plate_num_xlsx
            plate_num_xlsx=row
            base_url =f"https://dvlaregistrations.dvla.gov.uk/search/results.html?search={plate_num_xlsx}&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto="
            url=base_url
            yield scrapy.Request(url,callback=self.parse, cb_kwargs={'plate_num_xlsx': plate_num_xlsx})

    def parse(self, response, plate_num_xlsx=None):
        plate = response.xpath('//div[@class="resultsstrip"]/a/text()').extract_first()
        price = response.xpath('//div[@class="resultsstrip"]/p/text()').extract_first()

        try:
            a = plate.replace(" ", "").strip()
            if plate_num_xlsx == plate.replace(" ", …
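As an aside, the global in start_requests is unnecessary (cb_kwargs already carries the plate into parse), and the long query string is easier to audit when built with urllib.parse.urlencode. A minimal sketch, assuming only the search parameter varies per plate (build_search_url is a hypothetical helper, not part of the spider):

```python
from urllib.parse import urlencode

# Parameter names copied from the base_url in the question; only `search`
# varies per plate. urlencode also takes care of escaping spaces etc.
BASE = "https://dvlaregistrations.dvla.gov.uk/search/results.html"

def build_search_url(plate: str) -> str:
    params = {
        "search": plate,
        "action": "index",
        "pricefrom": "0",
        "priceto": "",
        "prefixmatches": "",
        "currentmatches": "",
        "limitprefix": "",
        "limitcurrent": "",
        "limitauction": "",
        "searched": "true",
        "openoption": "",
        "language": "en",
        "prefix2": "Search",
        "super": "",
        "super_pricefrom": "",
        "super_priceto": "",
    }
    return f"{BASE}?{urlencode(params)}"

print(build_search_url("AB12CDE"))
```

Each request would then be yielded as scrapy.Request(build_search_url(row), callback=self.parse, cb_kwargs={'plate_num_xlsx': row}), with no module-level state.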

python caching scrapy slowdown

5
votes
1
answer
974
views

pandas appending to an Excel (xlsx) file gives AttributeError

I'm running into one of two errors: writer.book = book raises AttributeError: can't set attribute 'book', or I get BadZipFile.

To get code that doesn't raise the BadZipFile error, I first placed the line that writes the Excel file, dataOutput=pd.DataFrame(dictDataOutput,index=[0]). Even so, I can't get rid of writer.book = book AttributeError: can't set attribute 'book'. As one of the answers suggests, I would need to downgrade openpyxl to an earlier version, or use a CSV file instead of Excel. I don't think that's the solution; there should be a solution I just can't get to.

import pandas as pd
from openpyxl import load_workbook

dataOutput = pd.DataFrame(dictDataOutput, index=[0])
dataOutput.to_excel('output.xlsx')  # or 'output.xlsm'
book = load_workbook('output.xlsx')  # or 'output.xlsm'
writer = pd.ExcelWriter('output.xlsx')  # or 'output.xlsm'  # , engine='openpyxl', mode='a', if_sheet_exists='overlay')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}

for sheetname in writer.sheets:
    dataOutput.to_excel(writer, sheet_name=sheetname, startrow=writer.sheets[sheetname].max_row, index=False, header=False)

writer.save()
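For what it's worth, on recent pandas (roughly 1.4 and later) the writer.book = book assignment fails because book became a read-only property, and the append no longer needs it: append mode loads the existing workbook internally. A hedged sketch of the same append-below-existing-rows pattern using only the public ExcelWriter API (file and sheet names are examples; openpyxl is assumed installed):

```python
import pandas as pd

# Example data; the real code builds dataOutput from dictDataOutput.
dataOutput = pd.DataFrame({"plate": ["AB12 CDE"], "price": [250]})
dataOutput.to_excel("output.xlsx", index=False)  # create the file once

# Later runs: mode="a" loads the existing workbook, so there is no need to
# touch writer.book or writer.sheets by hand.
with pd.ExcelWriter(
    "output.xlsx", engine="openpyxl", mode="a", if_sheet_exists="overlay"
) as writer:
    startrow = writer.sheets["Sheet1"].max_row  # first empty row
    dataOutput.to_excel(
        writer, sheet_name="Sheet1", startrow=startrow, index=False, header=False
    )
```

The with block saves and closes the writer on exit, so no explicit save call is needed.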

I looked for an answer in "enter link description here" and in the detailed solution for the attribute error in "enter link description here".

--- I tried another approach:

with pd.ExcelWriter('output.xlsx', mode='a',if_sheet_exists='overlay') as writer:
    dataOutput.to_excel(writer, sheet_name='Sheet1')
    writer.save()

But this time it raised another error:

FutureWarning: save is not part …
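That warning is saying that under a with block the writer is saved and closed automatically on exit, so the explicit writer.save() is deprecated and unnecessary. A minimal sketch (file and sheet names are examples; openpyxl is assumed installed):

```python
import pandas as pd

dataOutput = pd.DataFrame({"value": [1]})
dataOutput.to_excel("output.xlsx", index=False)  # mode="a" needs an existing file

with pd.ExcelWriter(
    "output.xlsx", engine="openpyxl", mode="a", if_sheet_exists="overlay"
) as writer:
    dataOutput.to_excel(writer, sheet_name="Sheet1", index=False)
# no writer.save() here: the context manager writes and closes the file on exit
```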

python attributeerror pandas openpyxl

3
votes
1
answer
5421
views

Tag statistics

python ×2

attributeerror ×1

caching ×1

openpyxl ×1

pandas ×1

scrapy ×1

slowdown ×1