Scrapy - Crawl multiple URLs using the result of the first URL

wde*_*tac 0 python scrapy scrapy-spider

  1. I use Scrapy to scrape data from a first URL.
  2. The first URL returns a response that contains a list of URLs.

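So far this works for me. My question is how to go on and crawl the URLs in that list. After searching, I know I can return a request from parse, but that seems to handle only one URL.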

Here is my parse method:

def parse(self, response):
    # Get the list of URLs, for example:
    list = ["http://a.com", "http://b.com", "http://c.com"]
    return scrapy.Request(list[0])
    # It works, but how can I continue b.com and c.com?

Can I do something like this?

def parse(self, response):
    # Get the list of URLs, for example:
    list = ["http://a.com", "http://b.com", "http://c.com"]

    for link in list:
        scrapy.Request(link)
        # This is wrong, though I need something like this

Full version:

import scrapy

class MySpider(scrapy.Spider):
    name = "mySpider"
    allowed_domains = ["x.com"]
    start_urls = ["http://x.com"]

    def parse(self, response):
        # Get the list of URLs, for example:
        list = ["http://a.com", "http://b.com", "http://c.com"]

        for link in list:
            scrapy.Request(link)
            # This is wrong, though I need something like this

Fra*_*tin 5

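I think what you are looking for is the yield statement. A method that contains yield is a generator, so Scrapy receives every request produced by the loop instead of only the first one: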

def parse(self, response):
    # Get the list of URLs, for example:
    urls = ["http://a.com", "http://b.com", "http://c.com"]

    for link in urls:
        # Yield each request so Scrapy schedules all of them, not just the first.
        yield scrapy.Request(link)