How to download an xlsx file from a website with Pandas and load it as a DataFrame

nev*_*ter 5 python-3.x pandas

How can I download the file:

COVID-19 data, so that I can save one of its worksheets, named Covid-19 - Weekly occurrences, as a DataFrame?

The URL works if I paste it into a browser.

I tried:

import requests
import io
import pandas as pd    

url = 'https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fbirthsdeathsandmarriages%2fdeaths%2fdatasets%2fweeklyprovisionalfiguresondeathsregisteredinenglandandwales%2f2020/referencetablescorrected.xlsx'

s=requests.get(url).content
df_deathsAges = pd.read_excel(io.StringIO(s.decode('utf-8')), 
                          nrows = 25, header = 5, sheet_name='Covid-19 - Weekly occurrences')

But I get the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 15: invalid start byte

I also tried:

url = 'https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fbirthsdeathsandmarriages%2fdeaths%2fdatasets%2fweeklyprovisionalfiguresondeathsregisteredinenglandandwales%2f2020/referencetablescorrected.xlsx'

df_deathsAges = pd.read_excel(url,'Covid-19 - Weekly occurrences')

But I get the error:

HTTPError: HTTP Error 403: Forbidden

What is the best way to accomplish this?

fog*_*rit 5

xlsx is a binary format, so it is not valid UTF-8. Try loading it into pandas as a binary stream:

import requests
import io
import pandas as pd    

url = 'https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fbirthsdeathsandmarriages%2fdeaths%2fdatasets%2fweeklyprovisionalfiguresondeathsregisteredinenglandandwales%2f2020/referencetablescorrected.xlsx'

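# .content is the raw response body as bytes; do not decode it
s = requests.get(url).content
# Wrap the bytes in BytesIO so pandas receives a binary file-like object
df_deathsAges = pd.read_excel(io.BytesIO(s),
                              nrows=25, header=5,
                              sheet_name='Covid-19 - Weekly occurrences',
                              engine="openpyxl")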
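To see why the `io.StringIO(s.decode('utf-8'))` attempt fails while `io.BytesIO(s)` works, here is a minimal sketch that needs no network access. The `payload` value is a made-up stand-in for the first bytes of a downloaded workbook (the ZIP signature followed by a byte that cannot start a UTF-8 character); it is an assumption for illustration, not the real file content.

```python
import io

# Hypothetical stand-in for the start of an xlsx download:
# the ZIP signature b"PK\x03\x04" followed by 0xa3, which is not a valid UTF-8 start byte.
payload = b"PK\x03\x04\xa3"

# Decoding binary data as text raises the same kind of error as in the question.
try:
    io.StringIO(payload.decode("utf-8"))
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# BytesIO wraps the raw bytes unchanged, so nothing is decoded or lost.
stream = io.BytesIO(payload)
round_trip = stream.read()

print(decoded_ok)
print(round_trip == payload)
```

The point is that the bytes never need to be interpreted as text: the Excel reader expects a binary file-like object.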

Note: I tested the code. The xlsx file could not be read with the default xlrd engine, but it worked with openpyxl.
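Since a failed request (such as the 403 in the question) typically returns an error page rather than a workbook, it may help to check the downloaded bytes before passing them to `pd.read_excel`. The sketch below relies on the fact that an xlsx file is a ZIP package containing a `[Content_Types].xml` entry. The helper names `looks_like_xlsx` and `make_fake_workbook` are hypothetical, and the fake workbook is only a stand-in built in memory so the example runs without a download.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if data is a ZIP archive that contains [Content_Types].xml."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        return "[Content_Types].xml" in archive.namelist()


def make_fake_workbook() -> bytes:
    """Build a tiny ZIP with the one entry the check looks for (not a real workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
    return buffer.getvalue()


workbook_bytes = make_fake_workbook()
html_error_page = b"<html><body>403 Forbidden</body></html>"

print(looks_like_xlsx(workbook_bytes))   # ZIP with the expected entry
print(looks_like_xlsx(html_error_page))  # plain HTML, not a ZIP
```

In the answer's code this check would be applied to `s` before the `pd.read_excel` call, so that an error page produces a clear message instead of a confusing parser failure.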