How do I download gzip files with urllib2 without corrupting them?

Dav*_*vid 3 python gzip urllib2

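I'm writing a script to download gzipped XML sitemaps. The files download, but they are corrupted: the gzip files the script writes are slightly larger than they should be, and the decompressed files are smaller than they should be because data is missing. Any idea what I'm doing wrong?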

saveAddress = "test.xml.gz"

import urllib2
import httplib
from urllib2 import Request, urlopen, URLError
try:
    request = urllib2.Request("http://example.com/sitemap-general.xml.gz")
    request.add_header('Accept-encoding', 'gzip')
    request.add_header('User-agent', 'Custom UA String')
    opener = urllib2.build_opener()
    try:
        pageText = opener.open(request).read()
        open(saveAddress, "w").write(pageText)
        print "Crawled successfully."
    except URLError, e:
        pass    
except URLError, e:
    pass

Thanks for your help, much appreciated.

小智 7

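Open the file in binary mode. In text mode ("w"), Python on Windows translates every "\n" byte to "\r\n" when writing, which would explain why the saved gzip files come out slightly larger and no longer decompress completely: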

open(saveAddress, "wb").write(pageText)
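To check that a saved file is intact, something along these lines may help. It is only a sketch: save_binary and is_valid_gzip are hypothetical helper names, not part of urllib2, and the payload is built in memory to stand in for the downloaded body so the check runs without a network connection.

```python
import gzip
import io
import os
import tempfile
import zlib


def save_binary(path, data):
    # Hypothetical helper: "wb" writes the bytes exactly as received,
    # with no newline translation on any platform.
    with open(path, "wb") as f:
        f.write(data)


def is_valid_gzip(path):
    # Hypothetical helper: decompress the whole file; a damaged or
    # truncated stream raises one of the errors caught below.
    try:
        with gzip.open(path, "rb") as f:
            f.read()
        return True
    except (IOError, OSError, EOFError, zlib.error):
        return False


# Build a small gzip payload in memory; in the real script this would be
# the bytes returned by opener.open(request).read().
original = b"<urlset>\n<url>http://example.com/</url>\n</urlset>\n"
buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(original)
gz.close()
payload = buf.getvalue()

tmp_path = os.path.join(tempfile.mkdtemp(), "test.xml.gz")
save_binary(tmp_path, payload)
```

If the size on disk equals len(payload) and is_valid_gzip(tmp_path) returns True, the bytes were written unchanged. A file that is larger on disk than the downloaded body points to newline translation from text mode.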