Why does my Scrapy code return an empty array?

Question

Ine*_*elp 1 python xpath scrapy web-scraping scrapy-spider

I am building a web scraper for wunderground.com, but my code returns "[]" for both inches_rain and humidity. Does anyone know why this happens?

# -*- coding: utf-8 -*-
import scrapy
from scrapy.selector import Selector
import time

from wunderground_scraper.items import WundergroundScraperItem


class WundergroundComSpider(scrapy.Spider):
    name = "wunderground"
    allowed_domains = ["www.wunderground.com"]
    start_urls = (
        'http://www.wunderground.com/q/zmw:10001.5.99999',
    )

    def parse(self, response):
        info_set = Selector(response).xpath('//div[@id="current"]')
        list = []
        for i in info_set:
            item = WundergroundScraperItem()
            item['description'] = i.xpath('div/div/div/div/span/text()').extract()
            item['description'] = item['description'][0]
            item['humidity'] = i.xpath('div/table/tbody/tr/td/span/span/text()').extract()
            item['inches_rain'] = i.xpath('div/table/tbody/tr/td/span/span/text()').extract()
            list.append(item)
        return list

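I am aware that the humidity and inches_rain items are set to the same XPath, but that should be fine, because once the information is in the array I assign each item a specific value from that array.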

Answer 1

ale*_*cxe 5

Let me suggest a more reliable and more readable XPath. As an example, this one locates the "Humidity" value based on the "Humidity" column label:

"".join(i.xpath('.//td[dfn="Humidity"]/following-sibling::td//text()').extract()).strip()

This now outputs 45%.

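FYI, your XPath has at least one problem: the tbody tag. Remove it from the XPath expression. Browsers typically insert tbody into the DOM when they render a table, so it appears in the developer tools even when the HTML that Scrapy downloads does not contain it.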
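
To illustrate the tbody point, here is a minimal sketch that does not use Scrapy at all. It parses a small, made-up HTML fragment with Python's standard xml.etree.ElementTree, which supports only a subset of XPath. The markup in RAW_HTML and the helper value_for_label are assumptions for illustration, not the actual wunderground.com page. Because ElementTree has no following-sibling axis, the helper approximates the label-based XPath from the answer with a plain loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the downloaded page source.
# It contains no <tbody>, which is common for HTML as served by the site
# (browsers add <tbody> to the DOM only when they render the table).
RAW_HTML = """
<div id="current">
  <div>
    <table>
      <tr><td><dfn>Pressure</dfn></td><td><span><span>30.10</span> in</span></td></tr>
      <tr><td><dfn>Humidity</dfn></td><td><span><span>45</span>%</span></td></tr>
    </table>
  </div>
</div>
"""

root = ET.fromstring(RAW_HTML.strip())  # root is the <div id="current"> element

# Path shaped like the one in the question (with tbody): nothing matches.
with_tbody = [e.text for e in root.findall("div/table/tbody/tr/td/span/span")]

# Same path without tbody: the inner value spans are found.
without_tbody = [e.text for e in root.findall("div/table/tr/td/span/span")]


def value_for_label(table_root, label):
    """Return the text of the cell that follows the cell labelled `label`.

    Stand-in for './/td[dfn="<label>"]/following-sibling::td//text()',
    which ElementTree's limited XPath support cannot express directly.
    """
    for row in table_root.iter("tr"):
        cells = row.findall("td")
        if len(cells) >= 2 and cells[0].findtext("dfn") == label:
            # itertext() walks all nested text nodes, like //text() in XPath.
            return "".join(cells[1].itertext()).strip()
    return None


print(with_tbody)                         # []
print(without_tbody)                      # ['30.10', '45']
print(value_for_label(root, "Humidity"))  # 45%
```

In a Scrapy spider the same idea stays a single XPath expression, as in the answer above. The sketch only shows that a path containing tbody matches nothing when the downloaded source has no such tag, and that anchoring on the label avoids depending on the exact nesting.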
