Posts by nev*_*ter

Scrapy Python CSV output has blank lines between each row

I'm getting unwanted blank lines between each row of Scrapy output in the generated CSV output file.

I have moved from Python 2 to Python 3, and I am using Windows 10, so I am adapting my Scrapy projects for Python 3.

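My current (and, for now, only) problem is that when I write the Scrapy output to a CSV file, I get a blank line between each row. This has already been raised in several posts here (it has something to do with Windows), but I can't get the solutions to work.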

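As it happens, I have also added some code to my pipelines.py file to make sure the CSV output comes out in a given column order rather than a random one. This means I can run the spider with a plain scrapy crawl charleschurch instead of scrapy crawl charleschurch -o charleschurch2017xxxx.csv.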

Does anyone know how to skip/omit this blank line in the CSV output?
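The blank lines on Windows usually come from Python 3 newline translation: the csv module already ends every row with "\r\n", and a file opened in text mode on Windows without newline="" turns that "\n" into "\r\n" again, giving "\r\r\n". The Python csv documentation recommends opening the file with newline="". The minimal sketch below reproduces both behaviours in memory on any OS by forcing the Windows-style translation explicitly; ROWS, write_rows and write_csv are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical sample rows; any list of lists behaves the same way.
ROWS = [["name", "price"], ["Tea", "2.50"], ["Cake", "3.00"]]


def write_rows(newline):
    """Write ROWS with csv.writer and return the raw bytes produced.

    newline="\r\n" makes the text layer translate every "\n" into "\r\n",
    mimicking a text-mode file on Windows opened without newline="".
    newline="" turns that translation off.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    # csv.writer already terminates each row with "\r\n" (its default).
    csv.writer(text).writerows(ROWS)
    text.flush()
    data = raw.getvalue()
    text.detach()  # release raw without closing it
    return data


def write_csv(path, rows):
    """Write rows to a real file the way the csv docs recommend."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    print(write_rows("\r\n"))  # rows end in "\r\r\n": extra blank rows when read back
    print(write_rows(""))      # rows end in "\r\n"
```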

My pipelines.py code is below (I may not need the import csv line, but I suspect I might for the final answer):

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import csv
from scrapy import signals
from scrapy.exporters import CsvItemExporter

class CSVPipeline(object):

  def __init__(self):
    self.files = {}

  @classmethod
  def from_crawler(cls, crawler):
    pipeline = cls()
    crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
    crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
    return pipeline

  def spider_opened(self, spider):
    file …
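Because the pipeline above is cut off, here is a minimal sketch, not the original code, of a pipeline that both fixes the column order and opens its output file with newline="". It swaps Scrapy's CsvItemExporter for the standard-library csv.DictWriter; OrderedCsvPipeline, FIELD_ORDER and the field names in it are hypothetical placeholders to replace with your own.

```python
import csv
import os

# Hypothetical column order: replace with the field names your items define.
FIELD_ORDER = ["title", "price", "url"]


class OrderedCsvPipeline:
    """Sketch of a Scrapy item pipeline that writes items to CSV in a fixed
    column order using the standard-library csv module.

    Scrapy calls open_spider, process_item and close_spider on pipeline
    classes listed in ITEM_PIPELINES, so nothing needs importing from Scrapy.
    """

    def __init__(self, directory="."):
        self.directory = directory  # where <spider name>.csv is written
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        path = os.path.join(self.directory, f"{spider.name}.csv")
        # newline="" stops Windows text mode from turning csv's "\r\n"
        # row endings into "\r\r\n", i.e. the blank line between rows.
        self.file = open(path, "w", newline="", encoding="utf-8")
        # DictWriter emits columns in FIELD_ORDER; missing fields become ""
        # and fields not listed are ignored.
        self.writer = csv.DictWriter(self.file, fieldnames=FIELD_ORDER,
                                     extrasaction="ignore")
        self.writer.writeheader()

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for scrapy.Item instances.
        self.writer.writerow(dict(item))
        return item

    def close_spider(self, spider):
        if self.file is not None:
            self.file.close()
```

To try it, list the class in ITEM_PIPELINES in settings.py, for example {"myproject.pipelines.OrderedCsvPipeline": 300}, where myproject is a placeholder for your project package. If you would rather keep CsvItemExporter, its fields_to_export option is the built-in way to fix the column order.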

python csv scrapy web-scraping

Votes: 5 | Answers: 1 | Views: 2196

How to download an xlsx file from a website with Pandas and save it as a DataFrame

How can I download the file COVID-19 data so that I can save one of its sheets, called Covid-19 - Weekly occurrences, as a DataFrame?

The URL works if I put it into a browser.

I have tried:

import requests
import io
import pandas as pd    

url = 'https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fbirthsdeathsandmarriages%2fdeaths%2fdatasets%2fweeklyprovisionalfiguresondeathsregisteredinenglandandwales%2f2020/referencetablescorrected.xlsx'

s=requests.get(url).content
df_deathsAges = pd.read_excel(io.StringIO(s.decode('utf-8')), 
                          nrows = 25, header = 5, sheet_name='Covid-19 - Weekly occurrences')

But I get the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 15: invalid start byte
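This error is consistent with the response being a real workbook: an .xlsx file is a ZIP package of XML parts, i.e. binary data, so calling .decode('utf-8') on it and wrapping the result in io.StringIO cannot work. A minimal sketch, assuming the download itself succeeds, keeps the bytes as bytes and hands them over through io.BytesIO. looks_like_xlsx and read_weekly_sheet are hypothetical helper names, and read_weekly_sheet additionally needs pandas plus an .xlsx engine such as openpyxl.

```python
import io
import zipfile

SHEET = "Covid-19 - Weekly occurrences"  # sheet name from the question


def looks_like_xlsx(content: bytes) -> bool:
    """Check the ZIP structure instead of decoding the bytes as text."""
    buf = io.BytesIO(content)
    if not zipfile.is_zipfile(buf):
        return False
    with zipfile.ZipFile(buf) as zf:
        # Every Office Open XML package has this part at its root.
        return "[Content_Types].xml" in zf.namelist()


def read_weekly_sheet(content: bytes):
    """Pass the raw bytes to pandas via BytesIO (no .decode() step)."""
    import pandas as pd  # imported lazily so looks_like_xlsx works without pandas
    return pd.read_excel(io.BytesIO(content), sheet_name=SHEET,
                         header=5, nrows=25)
```

With requests, this becomes df = read_weekly_sheet(requests.get(url).content), provided the server actually returns the file.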

I have also tried:

url = 'https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fbirthsdeathsandmarriages%2fdeaths%2fdatasets%2fweeklyprovisionalfiguresondeathsregisteredinenglandandwales%2f2020/referencetablescorrected.xlsx'

df_deathsAges = pd.read_excel(url,'Covid-19 - Weekly occurrences')

But I get the error:

HTTPError: HTTP Error 403: Forbidden
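A 403 that appears only outside the browser often means the server is filtering on request headers. When given a URL, pandas fetches it with urllib by default, whose User-Agent is Python-urllib/3.x. The sketch below assumes that header is the cause (not confirmed for this server) and sends a browser-like one instead; HEADERS, build_request and fetch_bytes are hypothetical names, and "Mozilla/5.0" is just an example value.

```python
import urllib.request

# Assumption: the server rejects urllib's default "Python-urllib/3.x"
# User-Agent with 403. The browser-like value is an illustrative example.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw workbook bytes (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

The returned bytes can then go to pd.read_excel through io.BytesIO. With requests, the equivalent download is requests.get(url, headers=HEADERS).content. According to the pandas documentation, pandas 1.2 and later also accept storage_options=HEADERS in pd.read_excel(url, ...) for HTTP(S) URLs, forwarding the pairs as request headers.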

What is the best way to accomplish this?

python-3.x pandas

Votes: 5 | Answers: 1 | Views: 4783

Tag statistics

csv ×1

pandas ×1

python ×1

python-3.x ×1

scrapy ×1

web-scraping ×1