I'm trying to use Scrapy to scrape a site that has multiple pages of information.
My code is:
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from tcgplayer1.items import Tcgplayer1Item


class MySpider(BaseSpider):
    name = "tcg"
    allowed_domains = ["http://www.tcgplayer.com/"]
    start_urls = ["http://store.tcgplayer.com/magic/journey-into-nyx?PageNumber=1"]

    def parse(self, response):
        hxs = Selector(response)
        titles = hxs.xpath("//div[@class='magicCard']")
        for title in titles:
            item = Tcgplayer1Item()
            item["cardname"] = title.xpath(".//li[@class='cardName']/a/text()").extract()[0]

            vendor = title.xpath(".//tr[@class='vendor ']")
            item["price"] = vendor.xpath("normalize-space(.//td[@class='price']/text())").extract()
            item["quantity"] = vendor.xpath("normalize-space(.//td[@class='quantity']/text())").extract()
            item["shipping"] = vendor.xpath("normalize-space(.//span[@class='shippingAmount']/text())").extract()
            item["condition"] = vendor.xpath("normalize-space(.//td[@class='condition']/a/text())").extract()
            item["vendors"] = vendor.xpath("normalize-space(.//td[@class='seller']/a/text())").extract()
            yield item
I want to scrape every page until the spider reaches the last one. Some sets have more pages than others, so there is no fixed page count to stop at.
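One common way to handle an unknown number of pages is to keep requesting the next page and stop when a page comes back empty, letting Scrapy schedule the follow-up requests. Below is a minimal sketch along those lines, not a verified solution: it assumes a PageNumber past the last page returns a page with no div.magicCard elements, and it trims allowed_domains to the bare domain (Scrapy expects "tcgplayer.com", not a full URL) so the extra requests are not filtered as off-site. Field extraction stays exactly as in the original parse().

from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spider import BaseSpider

from tcgplayer1.items import Tcgplayer1Item

# PageNumber is assumed to be the last query parameter, as in the start URL.
BASE_URL = "http://store.tcgplayer.com/magic/journey-into-nyx?PageNumber=%d"


class MySpider(BaseSpider):
    name = "tcg"
    allowed_domains = ["tcgplayer.com"]  # bare domain, not a full URL
    start_urls = [BASE_URL % 1]

    def parse(self, response):
        hxs = Selector(response)
        titles = hxs.xpath("//div[@class='magicCard']")

        # A page with no cards means we have gone past the last page: stop.
        if not titles:
            return

        for title in titles:
            item = Tcgplayer1Item()
            item["cardname"] = title.xpath(".//li[@class='cardName']/a/text()").extract()[0]
            # ... extract price/quantity/shipping/condition/vendors as before ...
            yield item

        # Schedule the next page and reuse parse() as the callback.
        page = int(response.url.split("PageNumber=")[-1])
        yield Request(BASE_URL % (page + 1), callback=self.parse)

If the pager exposes an explicit "next" link, following its href instead of incrementing the counter is usually more robust.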
I'm building this array with 40k entries:
array = [(value1, value2, value3),(value1, value2, value3),(value1, value2, value3) .... ]
Is it possible to insert it into MySQL from Python like this:
cursor.execute('''INSERT IGNORE into %s VALUES *array here*''' % (table_name, array))
I can't get the array variable passed to MySQL correctly. Any help is appreciated.
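Two things tend to go wrong with that statement: a table name cannot be bound as a query parameter, and interpolating the whole list with % does not produce valid SQL. The usual DB-API pattern is to interpolate the (trusted) table name into the SQL text yourself and let executemany() bind one %s placeholder per column for every tuple. A sketch assuming the MySQLdb driver, a three-column target table, and placeholder connection settings and table name:

import MySQLdb

# The list of 3-tuples built by the scraper (placeholder data here).
array = [
    ("value1", "value2", "value3"),
    ("value4", "value5", "value6"),
]

conn = MySQLdb.connect(host="localhost", user="user",
                       passwd="secret", db="mydb")
cursor = conn.cursor()

table_name = "my_table"  # hypothetical; must be a trusted string, not user input

# The table name is interpolated into the SQL text (identifiers cannot be
# parameterized); the row values are bound via %s placeholders, one per column.
# %%s survives the string formatting as the literal placeholder %s.
sql = "INSERT IGNORE INTO %s VALUES (%%s, %%s, %%s)" % table_name
cursor.executemany(sql, array)

conn.commit()
cursor.close()
conn.close()

executemany() iterates over the list and binds each tuple for you; with 40k rows it is also worth committing once at the end (or in chunks) rather than per row.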