wat*_*wer 5 python csv python-3.x python-requests
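I'm reading McKinney's data analysis book, and he has shared a 150 MB file. This topic has been discussed extensively in "Progress Bar while download file over http with Requests", but the code in the accepted answer raises an error for me. I'm a beginner, so I couldn't work out how to fix it.
I want to download the following file: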
https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/fec/P00000001-ALL.csv
Here is the code without a progress bar:
DATA_PATH = './Data'
filename = "P00000001-ALL.csv"
url_without_filename = "https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/fec"
url_with_filename = url_without_filename + "/" + filename
local_filename = DATA_PATH + '/' + filename
# Write the file on local disk
r = requests.get(url_with_filename)  # without streaming
with open(local_filename, 'w', encoding=r.encoding) as f:
    f.write(r.text)
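This works fine, but because there is no progress bar, I can't tell what is happening while it runs.
Here is the code adapted from "Progress Bar while download file over http with Requests" and "How to download large file in python with requests.py?":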
# Option 2:
# Write the file on local disk
r = requests.get(url_with_filename, stream=True)  # added stream parameter
total_size = int(r.headers.get('content-length', 0))
with open(local_filename, 'w', encoding=r.encoding) as f:
    # f.write(r.text)
    for chunk in tqdm(r.iter_content(1024), total=total_size, unit='B', unit_scale=True):
        if chunk:
            f.write(chunk)
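The second option (i.e. using streaming and the tqdm package) has two problems:
a) The file size is not calculated correctly. The actual size is 157 MB, but total_size comes out as 25 MB.
b) A bigger problem than a) is that I get the following error: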
0%| | 0.00/24.6M [00:00<?, ?B/s]
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3265, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-31-abbe9270092b>", line 6, in <module>
    f.write(data)
TypeError: write() argument must be str, not bytes
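As a beginner, I'm not sure how to solve these two problems. I spent a lot of time going through the tqdm git page, but I couldn't follow it. I'd appreciate any help.
I assume readers know that requests and tqdm need to be imported, so I haven't included the code that imports these basic packages.
Here is the code for those who are curious: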
import sys  # needed for the hand-rolled progress bar below

local_filename = DATA_PATH + '/' + filename
with open(local_filename, 'wb') as f:
    r = requests.get(url_with_filename, stream=True)  # added stream parameter
    # total_size = int(r.headers.get('content-length', 0))
    # Note: r.content reads the whole body into memory first, so the bar
    # below tracks writing to disk rather than the network transfer.
    total_size = len(r.content)
    downloaded = 0
    # chunk_size = max(1024*1024,int(total_size/1000))
    chunk_size = 1024
    # for chunk in tqdm(r.iter_content(chunk_size=chunk_size),total=total_size,unit='KB',unit_scale=True):
    for chunk in r.iter_content(chunk_size=chunk_size):
        downloaded += len(chunk)
        f.write(chunk)
        done = int(50 * downloaded / total_size)
        sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
        sys.stdout.flush()
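A note on problem a), offered as an assumption rather than something verified for this server: when a response is sent with a Content-Encoding such as gzip, the Content-Length header counts the compressed bytes, while requests' iter_content yields the decoded bytes. That would explain a header total of about 25 MB for a 157 MB CSV. A minimal sketch of how a total could be chosen for a progress bar follows; the helper name progress_total is hypothetical and not part of requests or tqdm.

```python
def progress_total(headers):
    """Return a byte total usable for a progress bar, or None if unknown.

    Assumption: if Content-Encoding is set (e.g. gzip), Content-Length
    describes the compressed body and will not match the decoded bytes
    that iter_content yields, so no total is reported in that case.
    """
    # r.headers in requests is case-insensitive; a plain dict passed
    # here must use lowercase keys.
    encoding = headers.get("content-encoding", "").lower()
    length = headers.get("content-length")
    if length is None or encoding not in ("", "identity"):
        return None
    return int(length)
```

With requests this would be called as progress_total(r.headers). A result of None can be passed to tqdm as total=None, in which case the bar shows a running byte count without a percentage.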
with open(filename, 'wb') as f:  # binary mode does not accept an encoding argument
    f.write(r.content)
This should solve your writing problem. Write r.content, not r.text: type(r.content) is <class 'bytes'>, which is what a file opened in binary mode ('wb') expects.
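For the progress bar itself, here is a minimal sketch that combines binary mode with tqdm. It assumes requests and tqdm are installed; the function names copy_chunks and download_with_progress are made up for this example. The main difference from wrapping iter_content in tqdm directly is that the bar is advanced by len(chunk), so it counts bytes rather than loop iterations.

```python
import io


def copy_chunks(chunks, fileobj, on_progress=None):
    """Write an iterable of bytes chunks to a binary file object.

    Calls on_progress(n) with the size of each chunk written and
    returns the total number of bytes written.
    """
    written = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        fileobj.write(chunk)
        written += len(chunk)
        if on_progress is not None:
            on_progress(len(chunk))
    return written


def download_with_progress(url, local_filename, chunk_size=64 * 1024):
    # Imported here so copy_chunks can be tried without these packages.
    import requests
    from tqdm import tqdm

    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Content-Length may be missing or may describe a compressed
        # body; None makes tqdm show a byte count without a percentage.
        total = int(r.headers.get("content-length", 0)) or None
        with open(local_filename, "wb") as f, \
                tqdm(total=total, unit="B", unit_scale=True) as bar:
            return copy_chunks(r.iter_content(chunk_size=chunk_size), f, bar.update)


# Offline check of the helper using an in-memory file instead of a download.
demo = io.BytesIO()
copy_chunks([b"abc", b"de"], demo)
```

For the file in the question, the call would be download_with_progress(url_with_filename, local_filename).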