Posts by Pau*_*ack

Using Scrapy ItemLoader in a loop

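I want to use Scrapy on the Dmoz site used in the Scrapy tutorial. Instead of just reading book item/field pairs from the Books URL (http://www.dmoz.org/Computers/Programming/Languages/Python/Books/), I want to create an ItemLoader that reads the values I need (name, title, description).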

Here is my items.py file:

from scrapy.item import Item, Field
from scrapy.loader import ItemLoader
from itemloaders.processors import Identity


class DmozItem(Item):
    title = Field(
        output_processor=Identity()
        )
    link = Field(
        output_processor=Identity()
        )
    desc = Field(
        output_processor=Identity()
        )


class MainItemLoader(ItemLoader):
    default_item_class = DmozItem
    default_output_processor = Identity()

My spider file:

import scrapy
from scrapy.spiders import Spider
from scrapy.loader import ItemLoader
from tutorial.items import MainItemLoader, DmozItem 
from scrapy.selector import Selector


class DmozSpider(Spider):
    name = 'dmoz'
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"
    ]

    def parse(self, response):
        for sel …

python scrapy web-scraping
