Here is the spider:
import scrapy
from danmurphys.items import DanmurphysItem


class MySpider(scrapy.Spider):
    name = 'danmurphys'
    allowed_domains = ['danmurphys.com.au']
    start_urls = ['https://www.danmurphys.com.au/dm/navigation/navigation_results_gallery.jsp?params=fh_location%3D%2F%2Fcatalog01%2Fen_AU%2Fcategories%3C%7Bcatalog01_2534374302084767_2534374302027742%7D%26fh_view_size%3D120%26fh_sort%3D-sales_value_30_days%26fh_modification%3D&resetnav=false&storeExclusivePage=false']

    def parse(self, response):
        urls = response.xpath('//h2/a/@href').extract()
        for url in urls:
            yield scrapy.Request(url, callback=self.parse_page)

    def parse_page(self, response):
        item = DanmurphysItem()
        item['brand'] = response.xpath('//span[@itemprop="brand"]/text()').extract_first().strip()
        item['name'] = response.xpath('//span[@itemprop="name"]/text()').extract_first().strip()
        item['url'] = response.url
        return item
Here are the items:
import scrapy


class DanmurphysItem(scrapy.Item):
    brand = scrapy.Field()
    name = scrapy.Field()
    url = scrapy.Field()
When I run the spider with this command, the output CSV contains a blank line after every row:
scrapy crawl danmurphys -o output.csv
To fix this in Scrapy 1.3, you can patch it by adding `newline=''` as a parameter to the `io.TextIOWrapper` call in the `__init__` method of the `CsvItemExporter` class in `scrapy.exporters`.
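The reason this works can be shown with the standard library alone, no Scrapy required: `csv.writer` already terminates each row with `'\r\n'`, so if the text stream underneath performs its own newline translation (the default on Windows), the `'\n'` gets expanded to `'\r\n'` again and every row is followed by a blank line. A minimal sketch of the behaviour `CsvItemExporter` relies on:

```python
import csv
import io

# csv.writer uses the 'excel' dialect by default, which terminates
# every row with '\r\n'.
buf = io.BytesIO()

# newline='' disables newline translation in the wrapper, so the
# '\r\n' written by csv.writer passes through unchanged. Without it,
# Windows would translate the embedded '\n' to '\r\n', producing
# '\r\r\n' -- which readers display as a blank line after each row.
stream = io.TextIOWrapper(buf, encoding='utf-8', newline='')

writer = csv.writer(stream)
writer.writerow(['brand', 'name', 'url'])
stream.flush()

print(buf.getvalue())  # b'brand,name,url\r\n'
```

This is why the patch passes `newline=''` to the `io.TextIOWrapper` that wraps the output file: it leaves the row terminators exactly as `csv.writer` wrote them on every platform.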