Uch*_* AJ 2 python xpath scrapy web-scraping
这是网站中“查看更多”按钮的检查。我可以爬行网站中显示的数据,但我希望它可以爬行隐藏在“查看更多”按钮后面的项目。我怎么做?
<div id="view-more" class="p20px pt10px">
<div id="view-more-loader" class="tac"></div>
<a href="javascript:void(0);" onclick="add_more_product_classified();$('#load_more_a_id').hide();" class="xxxxlarge ffrc lightbginfo gbiwb bdr darkbdrinfo p10px20px db w180px m0a tac" id="load_more_a_id" style="display: block;"><b class="icon-refresh xsmall mr5px"></b>View More Products..</a>
</div>
Run Code Online (Sandbox Code Playgroud)
我的scrapy代码:
import scrapy
class DummymartSpider(scrapy.Spider):
name = 'dummymart'
allowed_domains = ['dummymart.net']
start_urls =['https://www.dummymart.com/catalog/car-dvd-player_cid100001018.html']
def parse(self, response):
Product = response.xpath('//div[@class="attr"]/h2/a/@title').extract()
Company = response.xpath('//div[@class="supplier"]/p/a/@title').extract()
Country = response.xpath('//*[@class="location a-color-secondary"]/span/text()').extract()
Category = response.xpath('//*[@class="attr category hide--mobile"]/span/a/text()').extract()
for item in zip(Product,Company,Country,Category):
scraped_info = {
'Product':item[0],
'Company': item[1],
'Country':item[2],
'Category':item[3]
}
yield scraped_info
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
3990 次 |
最近记录: |