So far this works for me. My question is: how can I go on to crawl this list of URLs? From searching I know I can return a Request from parse, but that seems to handle only a single URL.
Here is my parse method:
def parse(self, response):
    # Get the list of URLs, for example:
    list = ["http://a.com", "http://b.com", "http://c.com"]
    return scrapy.Request(list[0])
    # It works, but how can I continue b.com and c.com?
Can I do it like this?
def parse(self, response):
    # Get the list of URLs, for example:
    list = ["http://a.com", "http://b.com", "http://c.com"]
    for link in list:
        scrapy.Request(link)
    # This is wrong, though I need something like this
Full version:
import scrapy

class MySpider(scrapy.Spider):
    name = "mySpider"
    allowed_domains = ["x.com"]
    start_urls = ["http://x.com"]

    def parse(self, response):
        # Get the list of URLs, for example:
        list = ["http://a.com", "http://b.com", "http://c.com"]
        for link in list:
            scrapy.Request(link)
        # This is wrong, though I need something like this
I think what you are looking for is the yield statement:
def parse(self, response):
    # Get the list of URLs, for example:
    list = ["http://a.com", "http://b.com", "http://c.com"]
    for link in list:
        request = scrapy.Request(link)
        yield request
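If you also want to process the pages those requests return, you can attach a callback to each yielded Request. Below is a minimal sketch under that assumption; the parse_detail method, the urls list, and the CSS selector are placeholders for illustration, not part of the original question:

import scrapy

class MySpider(scrapy.Spider):
    name = "mySpider"
    start_urls = ["http://x.com"]

    def parse(self, response):
        # Example list of URLs to follow (placeholder values):
        urls = ["http://a.com", "http://b.com", "http://c.com"]
        for link in urls:
            # Yield one Request per URL; Scrapy schedules them all
            # and calls parse_detail with each response.
            yield scrapy.Request(link, callback=self.parse_detail)

    def parse_detail(self, response):
        # Hypothetical follow-up parsing; extract whatever you need here.
        yield {"url": response.url, "title": response.css("title::text").get()}

Note that if the spider keeps allowed_domains = ["x.com"] as in the question, requests to a.com, b.com, and c.com will be filtered out by Scrapy's offsite middleware unless those domains are added to the list as well.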