I want to download a file with urllib and decompress it in memory before saving it.
This is what I have now:
response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
outfile = open(outFilePath, 'w')
outfile.write(decompressedFile.read())
This ends up writing an empty file. How can I achieve what I'm after?
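The empty output happens because compressedFile.write() leaves the StringIO position at the end of the buffer, so GzipFile starts reading at EOF and gets nothing. A minimal sketch of the rewind fix (shown with Python 3's io.BytesIO standing in for StringIO; the seek(0) is the key line):

```python
import gzip
import io

payload = gzip.compress(b"hello world\n")  # stand-in for response.read()

buf = io.BytesIO()
buf.write(payload)
# write() leaves the stream position at end-of-buffer; without this
# rewind, GzipFile starts reading at EOF and read() returns b""
buf.seek(0)

with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    data = gz.read()

assert data == b"hello world\n"
```

Passing the bytes straight to the StringIO/BytesIO constructor, as the updated answer below does, sidesteps the problem because the position then starts at zero.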
Updated answer:
#! /usr/bin/env python2
import urllib2
import StringIO
import gzip
baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
# check filename: it may change over time, due to new updates
filename = "man-pages-5.00.tar.gz"
outFilePath = filename[:-3]
response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile)
with open(outFilePath, 'wb') as outfile:  # 'wb': the decompressed content is a tar archive, i.e. binary
    outfile.write(decompressedFile.read())
I have read this SO post about the problem, to no avail.
I'm trying to decompress a .gz file that comes from a URL.
url_file_handle=StringIO( gz_data )
gzip_file_handle=gzip.open(url_file_handle,"r")
decompressed_data = gzip_file_handle.read()
gzip_file_handle.close()
...but I get TypeError: coercing to Unicode: need string or buffer, cStringIO.StringI found
What's going on here?
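For what it's worth, in the Python 2 of this question gzip.open() only accepts a filename string, so handing it the StringIO is what triggers the unicode-coercion TypeError; the file-object path is gzip.GzipFile(fileobj=...). A minimal sketch (Python 3 bytes shown; gz_data is a stand-in for the downloaded content):

```python
import gzip
import io

gz_data = gzip.compress(b"repo index data\n")  # stand-in for the fetched bytes

# GzipFile accepts an in-memory file object via fileobj=,
# which is exactly what gzip.open(url_file_handle, "r") could not do here
gzip_file_handle = gzip.GzipFile(fileobj=io.BytesIO(gz_data), mode="rb")
decompressed_data = gzip_file_handle.read()
gzip_file_handle.close()

assert decompressed_data == b"repo index data\n"
```

The traceback below points at the same spot: gzip.py line 49 is inside GzipFile's filename handling.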
Traceback (most recent call last):
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2974, in _HandleRequest
base_env_dict=env_dict)
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 411, in Dispatch
base_env_dict=base_env_dict)
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2243, in Dispatch
self._module_dict)
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2161, in ExecuteCGI
reset_modules = exec_script(handler_path, cgi_path, hook)
File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2057, in ExecuteOrImportScript
exec module_code in script_module.__dict__
File "/home/jldupont/workspace/jldupont/trunk/site/app/server/tasks/debian/repo_fetcher.py", line 36, in <module>
main()
File "/home/jldupont/workspace/jldupont/trunk/site/app/server/tasks/debian/repo_fetcher.py", line 30, in main
gziph=gzip.open(fh,'r')
File "/usr/lib/python2.5/gzip.py", line 49, …

I'm using Amazon S3 to serve static files. When the Content-Type is simply 'text/css' and I don't compress the file, it comes back fine. If I zlib.compress() the content being returned and change Content-Encoding to 'gzip', the browser fails to decode the result. In Chrome, the error is
Error 330 net::ERR_CONTENT_DECODING_FAILED
In Safari, it is
“cannot decode raw data” (NSURLErrorDomain:-1015)
Is there anything in particular about Python's zlib I need to handle to make sure the result can be returned to, and decompressed by, the browser?
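A likely culprit: zlib.compress() emits the zlib container (RFC 1950), while a Content-Encoding: gzip response must carry the gzip container (RFC 1952), which is why the browsers report a decoding failure. A sketch of the difference (Python 3 shown):

```python
import gzip
import zlib

css = b"body { color: #333; }"

zlib_data = zlib.compress(css)   # zlib wrapper -- not what the browser expects
gzip_data = gzip.compress(css)   # gzip wrapper -- matches Content-Encoding: gzip

# The two containers differ from the very first bytes:
assert zlib_data[:1] == b"\x78"      # zlib header byte
assert gzip_data[:2] == b"\x1f\x8b"  # gzip magic number

# Equivalent via zlib directly: wbits = 16 + MAX_WBITS selects the gzip wrapper
co = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
gzip_data2 = co.compress(css) + co.flush()
assert gzip.decompress(gzip_data2) == css
```

So either compress with the gzip module, or keep zlib but ask for the gzip wrapper via the wbits argument before uploading to S3.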
I have a Django form that takes a username and password. When the user posts the data, I get the following traceback:
Traceback (most recent call last):
File "/usr/lib/python2.4/site-packages/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/usr/lib/python2.4/site-packages/django/views/decorators/csrf.py", line 39, in wrapped_view
resp = view_func(*args, **kwargs)
File "/usr/lib/python2.4/site-packages/django/views/decorators/csrf.py", line 52, in wrapped_view
return view_func(*args, **kwargs)
File "/public/gdp/trunk/src/ukl/lis/process/utils/error_handler.py", line 17, in __call__
return self.function(*args, **kwargs)
File "/usr/lib/python2.4/site-packages/django/views/decorators/cache.py", line 66, in _cache_controlled
response = viewfunc(request, *args, **kw)
File "/public/gdp/trunk/src/ukl/lis/process/authentication/views.py", line 530, in process_login
form = loginForm(request.POST)
File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 101, in _get_post
self._load_post_and_files()
File "/usr/lib/python2.4/site-packages/django/http/__init__.py", line 270, in _load_post_and_files
if self.META.get('CONTENT_TYPE', '').startswith('multipart'):
AttributeError: 'NoneType' object has …

Is it possible to loop through the files/keys in an Amazon S3 bucket, read the contents with Python, and count the number of lines?
For example:
1. My bucket: "my-bucket-name"
2. File/Key : "test.txt"
I need to iterate through the file "test.txt" and count the number of lines in the raw file.
Sample code:
for bucket in conn.get_all_buckets():
    if bucket.name == "my-bucket-name":
        for file in bucket.list():
            # need to count the number of lines in each file and print to a log
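One way that loop could be finished, assuming the boto 2 API of the question (count_lines is a hypothetical helper; get_contents_as_string() pulls the whole object into memory, so this only suits small text files like test.txt):

```python
import logging

def count_lines(data):
    """Count newline characters in a bytes payload."""
    return data.count(b"\n")

def count_bucket_lines(conn, bucket_name):
    """Log and return a per-key line count for one bucket (boto 2 style)."""
    counts = {}
    for bucket in conn.get_all_buckets():
        if bucket.name == bucket_name:
            for key in bucket.list():
                # whole-object read; fine for small files, not for large ones
                counts[key.name] = count_lines(key.get_contents_as_string())
                logging.info("%s: %d lines", key.name, counts[key.name])
    return counts
```

For large objects you would instead stream the key and count newlines chunk by chunk rather than holding the whole file in memory.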
I have some data that I need to decode using zlib. After some googling, I gather Python can do this.
I'm a bit lost on how to pull this off; can anyone help me down this path?
The data is just encoded text; I know I need to import zlib in a Python file and use it to decode, but I don't know where to start.
This is what I started with:
import zlib
f = "012301482103"
data = f
zlib.decompress((data))
print data
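For reference, zlib.decompress() expects bytes that were actually produced by a zlib compressor; feeding it an arbitrary digit string like the one above raises zlib.error. A minimal round trip (Python 3 bytes shown):

```python
import zlib

original = b"some text to encode, repeated a few times " * 4

compressed = zlib.compress(original)     # bytes in the zlib container
restored = zlib.decompress(compressed)   # only accepts that container

assert restored == original
assert len(compressed) < len(original)   # repetitive input compresses well
```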