Scrapy: crawl multiple URLs using the result of the first URL

Question

I use Scrapy to scrape data from a first URL.
That first URL returns a response containing a list of URLs.

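That part works fine so far. My question is how to go on and crawl every URL in that list. After searching, I learned that I can return a request from parse, but that seems to handle only one URL.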

Here is my parse method:

def parse(self, response):
    # Get the list of URLs, for example:
    list = ["http://a.com", "http://b.com", "http://c.com"]
    return scrapy.Request(list[0])
    # It works, but how can I continue b.com and c.com?

Can I do something like this?

def parse(self, response):
    # Get the list of URLs, for example:
    list = ["http://a.com", "http://b.com", "http://c.com"]

    for link in list:
        scrapy.Request(link)
        # This is wrong, though I need something like this

Full version:

import scrapy

class MySpider(scrapy.Spider):
    name = "mySpider"
    allowed_domains = ["x.com"]
    start_urls = ["http://x.com"]

    def parse(self, response):
        # Get the list of URLs, for example:
        list = ["http://a.com", "http://b.com", "http://c.com"]

        for link in list:
            scrapy.Request(link)
            # This is wrong, though I need something like this

Answer 1

Fra*_*tin 5

I think what you are looking for is the yield statement:

def parse(self, response):
    # Get the list of URLs, for example:
    urls = ["http://a.com", "http://b.com", "http://c.com"]

    # Yield one request per URL instead of returning only the first.
    for link in urls:
        yield scrapy.Request(link)

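To show why yield helps here, the following is a minimal sketch in plain Python, without Scrapy itself. FakeRequest, parse_with_return and parse_with_yield are hypothetical names made up for this illustration; FakeRequest only stands in for scrapy.Request so that the snippet runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request, for illustration only."""
    url: str


URLS = ["http://a.com", "http://b.com", "http://c.com"]


def parse_with_return(urls: List[str]) -> FakeRequest:
    # `return` ends the function immediately, so only the first URL is used.
    return FakeRequest(urls[0])


def parse_with_yield(urls: List[str]) -> Iterator[FakeRequest]:
    # `yield` makes this function a generator: the caller receives one
    # request per URL until the loop is exhausted.
    for url in urls:
        yield FakeRequest(url)


single = parse_with_return(URLS)
many = list(parse_with_yield(URLS))

print(single.url)
print([request.url for request in many])
```

In a real spider, Scrapy iterates over whatever parse yields and schedules each Request it finds, so the loop in the answer queues all three URLs. In the full version from the question, allowed_domains = ["x.com"] would probably also need adjusting, since Scrapy's offsite filtering normally drops requests to domains that are not listed there.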
