Making a Scrapy request with the POST method

And*_*owa 2 python request scrapy

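I am trying to scrape the product list from "http://eastasiaeg.com/en/laptop-in-egypt" using Scrapy.

Some of the products are loaded dynamically, so I tried to build the Scrapy request myself. However, something is wrong with it. Please help.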

# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class IntelEGEastasiaegComSpider(scrapy.Spider):
    name = "intel_eg_eastasiaeg_com_py"

    start_urls = [
            'http://eastasiaeg.com/en/laptop-in-egypt'
        ]

    def start_requests(self):
        request_body = {"categoryId":"3","manufacturerId":"0","vendorId":"0","priceRangeFilterModel7Spikes":{"CategoryId":"3","ManufacturerId":"0","VendorId":"0","SelectedPriceRange":{},"MinPrice":"2400","MaxPrice":"44625"},"specificationFiltersModel7Spikes":{"CategoryId":"3","ManufacturerId":"0","VendorId":"0","SpecificationFilterGroups":[{"Id":"27","FilterItems":[{"Id":"103","FilterItemState":"Unchecked"},{"Id":"104","FilterItemState":"Unchecked"},{"Id":"105","FilterItemState":"Unchecked"},{"Id":"110","FilterItemState":"Unchecked"}]},{"Id":"11","FilterItems":[{"Id":"302","FilterItemState":"Unchecked"},{"Id":"75","FilterItemState":"Unchecked"}]},{"Id":"6","FilterItems":[{"Id":"21","FilterItemState":"Unchecked"},{"Id":"24","FilterItemState":"Unchecked"},{"Id":"25","FilterItemState":"Unchecked"},{"Id":"26","FilterItemState":"Unchecked"}]},{"Id":"5","FilterItems":[{"Id":"1069","FilterItemState":"Unchecked"},{"Id":"1078","FilterItemState":"Unchecked"},{"Id":"1118","FilterItemState":"Unchecked"},{"Id":"1862","FilterItemState":"Unchecked"}]},{"Id":"2","FilterItems":[{"Id":"8","FilterItemState":"Unchecked"},{"Id":"10","FilterItemState":"Unchecked"},{"Id":"1451","FilterItemState":"Unchecked"},{"Id":"1119","FilterItemState":"Unchecked"}]},{"Id":"8","FilterItems":[{"Id":"61","FilterItemState":"Unchecked"},{"Id":"62","FilterItemState":"Unchecked"},{"Id":"63","FilterItemState":"Unchecked"}]},{"Id":"333","FilterItems":[{"Id":"2460","FilterItemState":"Unchecked"}]}]},"attributeFiltersModel7Spikes":"null","manufacturerFiltersModel7Spikes":{"CategoryId":"3","ManufacturerFilterItems":[{"Id":"2","FilterItemState":"Unchecked"},{"Id":"1","FilterItemState":"Unchecked"},{"Id":"3","FilterItemState":"Unchecked"},{"Id":"6","FilterItemState":"Unchecked"}]},"vendorFiltersModel7Spikes":"null","pageNumber":"2","orderby":"10","viewmode":"grid","pagesize":"null","queryString":"","shouldNotStartFromFirstPage":"true","onSaleFilterModel":"null","keyword":"","searchCategoryId":"0","searchManufacturerId":"0","priceFrom":"","priceTo":"","includeSubcategories":"False","searchInProductDescriptions":"False","advancedSearch":"False","isOnSearchPage":"False"}
        for body in request_body:
            request_body = body
            yield scrapy.Request('http://eastasiaeg.com/en/getFilteredProducts',
                                 method="POST",
                                 body=request_body,
                                 callback=self.parse,
                                 headers={'Content-type': 'application/json; charset=UTF-8'}, )

    def parse(self, response):
        print response.body

Gra*_*rus 6

You should use scrapy.FormRequest when you want to make a POST request with form data in it.

def start_requests(self):
    form_data = {}  # your formdata
    yield scrapy.FormRequest(url, formdata=form_data)
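As a rough illustration of what "form data" means on the wire, here is a standard-library sketch (not Scrapy itself) that compares a form-encoded body with a JSON body. The `payload` dict is a hypothetical, trimmed-down version of the fields in the question, and whether the site accepts form-encoded data at all is an assumption you would need to check against the request your browser sends.

```python
import json
from urllib.parse import urlencode

# Hypothetical, trimmed-down payload for illustration only.
payload = {"categoryId": "3", "pageNumber": "2", "orderby": "10"}

# Body format normally used with
# Content-Type: application/x-www-form-urlencoded
form_body = urlencode(payload)

# Body format normally used with Content-Type: application/json
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

The two bodies are not interchangeable: a server that expects one format will usually not parse the other, so the body and the Content-Type header should match what the endpoint expects.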

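Your approach can work too, but the for loop doesn't make much sense here.
for body in request_body: iterates over the keys of your request_body dictionary, so you are basically making 24 requests, each with just a single key as its body.
So try it with scrapy.Request like this: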

import json  # needed for json.dumps below

def start_requests(self):
    form_data = {}  # your formdata
    # Request only takes string as body so you need to
    # convert python dict to string
    request_body = json.dumps(form_data)
    yield scrapy.Request('http://eastasiaeg.com/en/getFilteredProducts',
                         method="POST",
                         body=request_body,
                         headers={'Content-Type': 'application/json; charset=UTF-8'}, )
    # Usually Content-Type matters here a lot.
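To see why the loop in the question sends one key per request, here is a small plain-Python sketch with no Scrapy involved. The `request_body` dict is a hypothetical, shortened stand-in for the real payload.

```python
import json

# Hypothetical, shortened stand-in for the question's request_body dict.
request_body = {"categoryId": "3", "pageNumber": "2", "viewmode": "grid"}

# Iterating over a dict yields its keys, one per loop pass,
# so each "body" here is just a key name such as "categoryId".
bodies_from_loop = [body for body in request_body]

# Serializing the whole dict once gives a single JSON string,
# which is the kind of value a request body needs.
single_body = json.dumps(request_body)

print(bodies_from_loop)
print(single_body)
```

With the full payload from the question, the loop produces 24 such key-only bodies instead of one complete JSON document.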

P.S. Scrapy requests default to the self.parse callback, so you don't need to specify it.