Gam*_*fs2 26 python python-requests
I need to download a fairly large (~200MB) file. I've figured out how to download and save the file. It would be nice to have a progress bar showing how much has been downloaded. I found ProgressBar, but I'm not sure how to combine the two.

Here is the code I tried, but it doesn't work:
bar = progressbar.ProgressBar(max_value=progressbar.UnknownLength)
with closing(download_file()) as r:
    for i in range(20):
        bar.update(i)
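For reference, the core pattern the answers below all build on (read the response in fixed-size chunks, update a progress counter per chunk) can be sketched with only the standard library. This is an illustrative sketch, not one of the answers' code: `copy_with_progress` is a hypothetical name, and `io.BytesIO` stands in for the HTTP response stream and the output file.

```python
import io
import sys

def copy_with_progress(src, dst, total, block_size=1024):
    """Copy src to dst in block_size chunks, printing percent done."""
    done = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        sys.stdout.write("\rdownloaded %d/%d bytes (%.0f%%)"
                         % (done, total, 100 * done / total))
    sys.stdout.write("\n")
    return done

payload = b"\x00" * 10_240      # pretend 10 KiB download
src = io.BytesIO(payload)       # stands in for the response stream
dst = io.BytesIO()              # stands in for the output file
n = copy_with_progress(src, dst, total=len(payload))
```

The real answers replace the `while` loop with `requests`' `iter_content()` and the percentage print with a progress-bar library's `update()` call, but the shape of the loop is the same.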
leo*_*ovp 62
I suggest you try tqdm[1]; it is very easy to use. Example code for downloading with the requests library[2]:
from tqdm import tqdm
import requests

url = "http://www.ovh.net/files/10Mb.dat"  # big file test

# Streaming, so we can iterate over the response.
r = requests.get(url, stream=True)

# Total size in bytes.
total_size = int(r.headers.get('content-length', 0))
block_size = 1024  # 1 Kibibyte
t = tqdm(total=total_size, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as f:
    for data in r.iter_content(block_size):
        t.update(len(data))
        f.write(data)
t.close()
if total_size != 0 and t.n != total_size:
    print("ERROR, something went wrong")
[1]:https://github.com/tqdm/tqdm
[2]:http://docs.python-requests.org/en/master/
Mik*_*ike 31
The tqdm package now includes a function designed for exactly this kind of situation: wrapattr. You just wrap the object's read (or write) attribute and tqdm handles the rest; no fiddling with block sizes or anything like that. Here is a simple download function that puts it all together with requests:
def download(url, filename):
    import functools
    import pathlib
    import shutil
    import requests
    from tqdm.auto import tqdm

    r = requests.get(url, stream=True, allow_redirects=True)
    if r.status_code != 200:
        r.raise_for_status()  # Will only raise for 4xx codes, so...
        raise RuntimeError(f"Request to {url} returned status code {r.status_code}")
    file_size = int(r.headers.get('Content-Length', 0))

    path = pathlib.Path(filename).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)

    desc = "(Unknown total file size)" if file_size == 0 else ""
    r.raw.read = functools.partial(r.raw.read, decode_content=True)  # Decompress if needed
    with tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
        with path.open("wb") as f:
            shutil.copyfileobj(r_raw, f)

    return path
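To see why no block-size bookkeeping is needed here, this is a rough stdlib-only picture of the idea behind wrapattr (note: `ReadProgress` is a hypothetical stand-in written for this sketch, not tqdm's actual implementation): the stream's read() is proxied by a wrapper that tallies bytes, so `shutil.copyfileobj` drives the progress display as a side effect of copying.

```python
import io
import shutil

class ReadProgress:
    """Hypothetical stand-in for tqdm.wrapattr: proxies read() and
    counts the bytes that flow through it."""
    def __init__(self, raw):
        self._raw = raw
        self.n = 0  # bytes seen so far (tqdm would render this as a bar)

    def read(self, size=-1):
        data = self._raw.read(size)
        self.n += len(data)
        return data

payload = b"abc" * 5000
src = ReadProgress(io.BytesIO(payload))  # wrap the "response" stream
dst = io.BytesIO()
shutil.copyfileobj(src, dst)             # copying updates src.n for free
print(src.n)                             # 15000
```

copyfileobj decides its own chunk size internally, which is why the wrapped version needs no explicit block_size at all.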
You can also use the Python library enlighten. It is powerful, provides colored progress bars, and works correctly on Linux and Windows.

Below is the code plus a live screencast. The code can be run on repl.it.
import math
import requests, enlighten

url = 'https://upload.wikimedia.org/wikipedia/commons/a/ae/Arthur_Streeton_-_Fire%27s_on_-_Google_Art_Project.jpg?download'
fname = 'image.jpg'

# Should be one global variable
MANAGER = enlighten.get_manager()

r = requests.get(url, stream=True)
assert r.status_code == 200, r.status_code
dlen = int(r.headers.get('Content-Length', '0')) or None

with MANAGER.counter(color='green', total=dlen and math.ceil(dlen / 2 ** 20),
                     unit='MiB', leave=False) as ctr, \
        open(fname, 'wb', buffering=2 ** 24) as f:
    for chunk in r.iter_content(chunk_size=2 ** 20):
        print(chunk[-16:].hex().upper())
        f.write(chunk)
        ctr.update()
Output (+ ascii-video)
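The counter total in the snippet above relies on a small idiom: `dlen` becomes None when the Content-Length header is missing or zero, and `dlen and math.ceil(...)` propagates that None so enlighten shows an open-ended counter; otherwise the total is the size rounded up to whole MiB. Sketched in isolation (`mib_ticks` is an illustrative name, not part of enlighten):

```python
import math

def mib_ticks(content_length):
    """Total for a MiB-granularity counter; None means size unknown."""
    dlen = int(content_length or '0') or None  # missing/zero header -> None
    return dlen and math.ceil(dlen / 2 ** 20)

print(mib_ticks('5600000'))  # 6: 5.34 MiB rounds up to 6 ticks
print(mib_ticks(None))       # None: no Content-Length header
```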
There seems to be a disconnect between the examples on the Progress Bar Usage page and what the code actually requires.

In the example below, note the use of maxval instead of max_value, and the use of .start() to initialize the bar. This has been pointed out in an issue.

The n_chunk parameter denotes how many 1024-byte blocks to stream at once while looping through the request iterator.
import requests
import time
import numpy as np
import progressbar

url = "http://wikipedia.com/"

def download_file(url, n_chunk=1):
    r = requests.get(url, stream=True)
    # Estimates the number of bar updates
    block_size = 1024
    file_size = int(r.headers.get('Content-Length', None))
    num_bars = np.ceil(file_size / (n_chunk * block_size))
    bar = progressbar.ProgressBar(maxval=num_bars).start()
    with open('test.html', 'wb') as f:
        for i, chunk in enumerate(r.iter_content(chunk_size=n_chunk * block_size)):
            f.write(chunk)
            bar.update(i + 1)
            # Add a little sleep so you can see the bar progress
            time.sleep(0.05)
    return

download_file(url)
Edit: addressed comments about code clarity.

Edit 2: fixed the logic so the bar reports 100% on completion. Credit to leovp's answer for the 1 KiB block size.
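The num_bars estimate above is just a ceiling division of the file size by the bytes consumed per iteration, so numpy is not strictly needed for it. A stdlib equivalent, assuming the same block_size and n_chunk meanings as the code above (`bar_count` is an illustrative name):

```python
import math

def bar_count(file_size, n_chunk=1, block_size=1024):
    """Number of iter_content() iterations, i.e. progress-bar updates."""
    return math.ceil(file_size / (n_chunk * block_size))

print(bar_count(10_000))             # 10: ceil(10000 / 1024)
print(bar_count(10_000, n_chunk=4))  # 3:  ceil(10000 / 4096)
```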
Views: 21951