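Posts by Eli*_*elo

Pandas: split a CSV into multiple CSVs (or DataFrames) by column

I'm afraid I have a problem, and any help or hints would be appreciated.

Problem: I have a CSV file in which one column can take several values, like this: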

Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3

I have loaded the data into a DataFrame, and I need to split it into several DataFrames based on the value of the "The_evil_column" column:

df1
Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1

df2
Fruit;Color;The_evil_column
Orange;Green;something2
Apple;Red;something2

df3
Fruit;Color;The_evil_column
Apple;Red;something3

After reading several posts I am even more confused, so I would appreciate some pointers.
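One common way to get the split described above is `DataFrame.groupby`, which yields one sub-DataFrame per distinct value of the column. The following is only a minimal sketch: the sample data is inlined through `io.StringIO` instead of being read from a file, and the `frames` dict and the output file naming are assumptions made for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question; a real script would pass a file path.
CSV_TEXT = """Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";")

# groupby yields (value, sub-DataFrame) pairs; keep them in a dict keyed by value.
frames = {
    key: group.reset_index(drop=True)
    for key, group in df.groupby("The_evil_column")
}

for key, frame in frames.items():
    print("{}: {} rows".format(key, len(frame)))
    # Hypothetical naming scheme, one CSV per distinct value:
    # frame.to_csv("{}.csv".format(key), sep=";", index=False)
```

Keeping the pieces in a dict rather than in separate `df1`, `df2`, `df3` variables means the code does not need to know in advance how many distinct values the column holds.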

python csv python-2.7 pandas pandas-groupby

Eli*_*elo

2017-12-28

4
votes

1
answer

4402
views

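Scrapy: detect whether an XPath does not exist

I have been trying to build my first crawler. I have what I need (the shipping info and prices for the 1st and 2nd shop), but only by using 2 crawlers instead of 1, because I have hit a big blocker here.

When there is more than one shop, the output is: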

In [1]: response.xpath('//li[@class="container list-display-box__list__container"]/div/div/div/div/div[@class="shipping"]/p//text()').extract()
Out[1]: 
[u'ENV\xcdO 3,95\u20ac ',
 u'ENV\xcdO GRATIS',
 u'ENV\xcdO GRATIS',
 u'ENV\xcdO 4,95\u20ac ']

To get only the second result, I am using:

In [2]: response.xpath('//li[@class="container list-display-box__list__container"]/div/div/div/div/div[@class="shipping"]/p//text()')[1].extract()
Out[2]: u'ENV\xcdO GRATIS'

But when there is no second result (only 1 shop), I get:

IndexError: list index out of range

The crawler then skips the whole page, even if the other items have data...
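The `IndexError` comes from indexing a list that has fewer than two entries, so one way around it is to guard the lookup. The sketch below is plain Python and does not depend on Scrapy; `nth_or_default` is a hypothetical helper name, and the two lists are stand-ins for what `.extract()` returns in the sessions above.

```python
def nth_or_default(values, index, default=None):
    """Return values[index], or default when the list is too short."""
    try:
        return values[index]
    except IndexError:
        return default


# Stand-ins for the lists returned by .extract() on the shipping XPath.
two_shops = [u'ENV\xcdO 3,95\u20ac ', u'ENV\xcdO GRATIS']
one_shop = [u'ENV\xcdO 3,95\u20ac ']

second_when_present = nth_or_default(two_shops, 1)  # the second shop's text
second_when_missing = nth_or_default(one_shop, 1)   # None instead of IndexError
```

With this, a page with a single shop yields `None` for the second shop's field instead of raising, so the remaining fields of the item can still be filled in.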

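After several attempts I settled for a quick workaround to get results: 2 crawlers, one for the first shop and one for the second. Now I would like to do it cleanly with a single crawler.

Any help, hints, or suggestions would be appreciated. This is my first attempt at a recursive crawler with Scrapy, and I rather like it.

Here is the code: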

# -*- coding: utf-8 -*-
import scrapy
from Guapalia.items import GuapaliaItem
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
class GuapaliaSpider(CrawlSpider):
    name = "guapalia"
    allowed_domains = …

xpath web-crawler scrapy web-scraping python-2.7

Eli*_*elo

2017-09-21

1
vote

1
answer

2252
views