小编Rag*_*Ram的帖子

错误 403:scrapy 中未处理或不允许 HTTP 状态代码

这是代码,我写来抓取 justdial 网站。

import scrapy
from scrapy.http.request import Request

class JustdialSpider(scrapy.Spider):
    name = 'justdial'
    # handle_httpstatus_list = [400]
    # headers={'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1"}
    # handle_httpstatus_list = [403, 404]
    allowed_domains = ['justdial.com']
    start_urls = ['https://www.justdial.com/Delhi-NCR/Chemists/page-1']
    # def  start_requests(self):
    #     # hdef start_requests(self):
    #     headers= {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
    #     for url in self.start_urls:
    #         self.log("I just visited :---------------------------------- "+url)
    #         yield Request(url, headers=headers)
    def parse(self,response):
        self.log("I just visited the …
Run Code Online (Sandbox Code Playgroud)

python http scrapy

4
推荐指数
1
解决办法
8387
查看次数

标签 统计

http ×1

python ×1

scrapy ×1