Download a file from a URL using Python

Asked by pyt*_*er_ · score 4 · tags: python, url, beautifulsoup

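I want to use Python to download the file at the URL below. I tried the following code, but it doesn't seem to work. I think the problem is the file format. I'd be glad if you could suggest changes to my code, or different code I could use for this.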

Page:

https://www.gov.uk/government/statistics/transport-use-during-the-coronavirus-covid-19-pandemic

File I need to download:

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods

My code:

from urllib import request


response = request.urlopen("https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods")
csv = response.read()


csvstr = str(csv).strip("b'")

lines = csvstr.split("\\n")
f = open("historical.csv", "w")
for line in lines:
   f.write(line + "\n")
f.close()
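The suspicion about the file format is right. An .ods spreadsheet is a ZIP archive of XML files, not CSV text, so calling str() on the downloaded bytes produces Python's repr of them, with every non-text byte turned into a literal backslash escape, and splitting on "\\n" cannot recover any rows. A small offline illustration, using made-up sample bytes rather than the real file:

```python
# Made-up sample bytes standing in for the start of a downloaded .ods file;
# real .ods files begin with the ZIP signature b"PK\x03\x04".
data = b"PK\x03\x04\x14\x00mimetypeapplication/vnd.oasis.opendocument.spreadsheet"

# What the question's code does: str() on bytes gives their repr, so every
# non-text byte becomes a literal backslash escape rather than decoded text.
text = str(data).strip("b'")
print(text[:14])  # prints PK\x03\x04\x14 (literal backslashes)

# The bytes are a ZIP container, so they must be saved unchanged in 'wb' mode.
print(data.startswith(b"PK\x03\x04"))  # prints True
```

In other words, no text processing is needed at all: writing the response bytes to a file opened in binary mode is the whole fix.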

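Basically, I just want to download the file. I've heard BeautifulSoup can be used for this, but I don't have much experience with it. Any code that does the job would be much appreciated.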

Thanks
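BeautifulSoup is not needed when the file's URL is already known, as it is here; it only helps if you first want to find the attachment link on the landing page. A minimal sketch of that discovery step, using the standard library's html.parser in place of BeautifulSoup so it runs without extra installs. The sample markup and example.org URLs are made up, and whether the real GOV.UK page links the file with a plain `<a href="...ods">` is an assumption; find_ods_links and OdsLinkFinder are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OdsLinkFinder(HTMLParser):
    """Collect absolute URLs of <a href="..."> links ending in .ods (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".ods"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_ods_links(html, base_url):
    finder = OdsLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.links


# Made-up sample markup; in practice `html` would be the downloaded page text.
sample_html = """
<a href="/stats/other-page">Other page</a>
<a href="https://example.org/files/data.ods">Data (ODS)</a>
<a href="files/table2.ODS">Table 2</a>
"""
print(find_ods_links(sample_html, "https://example.org/stats/page"))
```

With BeautifulSoup installed, the equivalent filter is roughly `[a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True) if a["href"].lower().endswith(".ods")]`, followed by the same urljoin step.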

Answer by 小智 · score 5

To download the file:

In [1]: import requests

In [2]: url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods'

In [3]: response = requests.get(url)
   ...: response.raise_for_status()  # stop on HTTP errors instead of saving an error page
   ...: # .ods is a binary (zipped XML) format, so write the raw bytes in 'wb' mode
   ...: with open('COVID-19-transport-use-statistics.ods', 'wb') as out_file:
   ...:     out_file.write(response.content)
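If you would rather avoid the requests dependency, the urllib module the question already uses works as well; the only change needed is to write the response bytes to disk unchanged instead of converting them to str. A minimal sketch: shutil.copyfileobj copies in chunks, so the whole file is never held in memory, and urlopen already raises HTTPError for 4xx/5xx responses. download_file is a hypothetical helper name, and its opener parameter exists only so the function can be exercised offline with a fake response:

```python
import shutil
import urllib.request


def download_file(url, dest, opener=urllib.request.urlopen):
    """Save the resource at `url` to `dest` byte-for-byte (hypothetical helper)."""
    # 'wb' keeps the bytes untouched; .ods is a binary ZIP container.
    with opener(url) as response, open(dest, "wb") as out_file:
        # Copy in fixed-size chunks rather than reading everything at once.
        shutil.copyfileobj(response, out_file)
    return dest
```

Usage: download_file(url, 'COVID-19-transport-use-statistics.ods'), with url set to the attachment address from the question.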

You can then read the file with pandas-ods-reader. Install it by running:

pip install pandas-ods-reader

Then:

In [4]: from pandas_ods_reader import read_ods

In [5]: df = read_ods('COVID-19-transport-use-statistics.ods', 1)

In [6]: df
Out[6]: 
                   Department for Transport statistics  ...   unnamed.9
0    https://www.gov.uk/government/statistics/trans...  ...        None
1                                                 None  ...        None
2    Use of transport modes: Great Britain, since 1...  ...        None
3    Figures are percentages of an equivalent day o...  ...        None
4                                                 None  ...  Percentage
..                                                 ...  ...         ...
390                  Transport for London Tube and Bus  ...        None
391                               Buses (excl. London)  ...        None
392                                           Cycling   ...        None
393                                  Any other queries  ...        None
394                                    Media enquiries  ...        None
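The second argument of read_ods selects the sheet (per the pandas-ods-reader documentation, a 1-based index or a sheet name), and the output above shows that sheet 1 starts with title and notes rows rather than the data table. Because an .ods file is a ZIP archive whose content.xml holds one table:table element per sheet, the sheet names can be listed with the standard library alone. A sketch under that assumption about the OpenDocument layout; list_ods_sheets is a hypothetical helper:

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}


def list_ods_sheets(path_or_file):
    """Return sheet names of an .ods file in workbook order (hypothetical helper)."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # Each top-level sheet is a table:table element under office:spreadsheet.
    tables = root.findall("office:body/office:spreadsheet/table:table", NS)
    # The sheet name is the namespaced table:name attribute.
    return [t.get("{%s}name" % NS["table"]) for t in tables]
```

For example, list_ods_sheets('COVID-19-transport-use-statistics.ods') returns the names in order, so a sheet's position plus 1, or its name, is what to pass to read_ods. As an alternative to pandas-ods-reader, pandas can read .ods directly with pd.read_excel(path, engine='odf', sheet_name=...) when the odfpy package is installed; there sheet_name is 0-based, and sheet_name=None returns every sheet as a dict of DataFrames.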

If you want, you can then save it as a CSV file with df.to_csv('my_data.csv', index=False).