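I have two zip files, both of which open fine in Windows Explorer and 7-Zip.
However, when I open them with Python's zipfile module [zipfile.ZipFile("filex.zip")], one of them opens but the other raises the error "BadZipfile: File is not a zip file".
I made sure the latter is a valid zip file by opening it with 7-Zip and looking at its properties (it says 7Zip.ZIP). When I open the file in a text editor, the first two characters are "PK", which indicates that it is indeed a zip file.
I'm using Python 2.5 and really have no clue how to go about this. I've tried it on both Windows and Ubuntu, and the problem exists on both platforms.
Update: Traceback from Python 2.5.4 on Windows: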
Traceback (most recent call last):
  File "<module1>", line 5, in <module>
    zipfile.ZipFile("c:/temp/test.zip")
  File "C:\Python25\lib\zipfile.py", line 346, in __init__
    self._GetContents()
  File "C:\Python25\lib\zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "C:\Python25\lib\zipfile.py", line 378, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file
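Basically, when the _EndRecData function is called to get data from the "End of central directory" record, the comment length check fails [endrec[7] == len(comment)].
The values of the locals in the _EndRecData function are as follows: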
END_BLOCK: 4096,
comment: '\x00',
data: '\xd6\xf6\x03\x00\x88,N8?<e\xf0q\xa8\x1cwK\x87\x0c(\x82a\xee\xc61N\'1qN\x0b\x16K-\x9d\xd57w\x0f\xa31n\xf3dN\x9e\xb1s\xffu\xd1\.....', (truncated)
endrec: ['PK\x05\x06', 0, 0, 4, 4, 268, 199515, 0],
filesize: 199806L,
fpin: <open file 'c:/temp/test.zip', mode 'rb' at 0x045D4F98>,
start: 4073
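The locals above are consistent with a single stray byte after the end-of-central-directory record: the central directory starts at offset 199515 and is 268 bytes long, and the record itself is 22 bytes, so the archive should end at byte 199805, yet the file size is 199806. The following is a minimal sketch (not the zipfile module's own code) of how that mismatch could be checked on Python 3. `inspect_eocd` and `make_sample_zip` are hypothetical helper names, and the sample archive is generated in memory rather than taken from the question.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# Signature, four 16-bit fields, two 32-bit fields, 16-bit comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def inspect_eocd(data):
    """Return (declared_comment_length, bytes_after_record) for raw zip bytes."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory signature found")
    fields = struct.unpack(EOCD_FORMAT, data[pos:pos + EOCD_SIZE])
    declared = fields[7]                        # same slot as endrec[7]
    trailing = len(data) - (pos + EOCD_SIZE)    # what actually follows the record
    return declared, trailing


def make_sample_zip():
    """Build a tiny, valid archive in memory (stand-in for the real file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


# Simulate the situation in the question: one extra NUL byte after the record.
declared, trailing = inspect_eocd(make_sample_zip() + b"\x00")
print("declared comment length:", declared, "bytes after record:", trailing)
```

When the two numbers differ, the file has the kind of trailing data that makes the strict comparison in Python 2.5's `_EndRecData` fail.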
小智 12
Files named file can confuse Python - try naming it something else. If it still doesn't work, try the following code:
def fixBadZipfile(zipFile):
    # Note: b'...' literals need Python 2.6+; on Python 2.5 drop the b prefix.
    f = open(zipFile, 'r+b')
    try:
        data = f.read()
        pos = data.find(b'\x50\x4b\x05\x06')  # End of central directory signature
        if pos > 0:
            print("Truncating file at location " + str(pos + 22) + ".")
            f.seek(pos + 22)  # size of 'ZIP end of central directory record'
            f.truncate()
        else:
            # signature not found: the file is truncated or not a zip at all
            raise ValueError("File is truncated: " + zipFile)
    finally:
        f.close()
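Since the function above modifies the archive on disk, a non-destructive variant may be preferable when the original file has to stay untouched. The sketch below is an assumption-laden alternative for Python 3, not part of the answer: `load_zip_without_trailing_bytes` is a hypothetical name, it works on bytes already read into memory, and it looks for the last occurrence of the signature.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_SIZE = 22  # fixed-size part of the end-of-central-directory record


def load_zip_without_trailing_bytes(data):
    """Open raw zip bytes in memory, ignoring anything after the EOCD record."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise zipfile.BadZipFile("no end-of-central-directory record found")
    # Slice instead of truncating, so the source bytes are left unchanged.
    return zipfile.ZipFile(io.BytesIO(data[:pos + EOCD_SIZE]))


# Demo on an archive built in memory with a few junk bytes appended.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "abc")

archive = load_zip_without_trailing_bytes(buf.getvalue() + b"\x00junk")
print(archive.namelist())
```

For a file on disk, the bytes could be obtained with `open(path, "rb").read()`. Note that this slice also drops a genuine archive comment, if one exists.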
小智 9
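astronautlevel's solution works in most cases, but the compressed data and CRCs in the zip can also contain the same 4 bytes. You should do an rfind (not find), seek to pos + 20, and then write \x00\x00 at the end of the file (telling zip applications that the length of the 'comments' section is 0 bytes).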
# HACK: See http://bugs.python.org/issue10694
# The zip file generated is correct, but because of extra data after the 'central directory' section,
# some versions of Python (and some zip applications) can't read the file. By removing the extra data,
# we ensure that all applications can read the zip without issue.
# The ZIP format: http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT
# Finding the end of the central directory:
#   http://stackoverflow.com/questions/8593904/how-to-find-the-position-of-central-directory-in-a-zip-file
#   http://stackoverflow.com/questions/20276105/why-cant-python-execute-a-zip-archive-passed-via-stdin
# This second link is only loosely related, but echoes the first: "processing a ZIP archive often requires backwards seeking"
# The def line is a hypothetical wrapper; the original snippet was the tail of a larger function.
# zipFileContainer is assumed to be a seekable binary file object opened for reading and writing.
# b'...' literals need Python 2.6+; on Python 2.5 drop the b prefix.
def stripTrailingZipData(zipFileContainer):
    content = zipFileContainer.read()
    pos = content.rfind(b'\x50\x4b\x05\x06')  # reverse find: this string of bytes is the end of the zip's central directory.
    if pos > 0:
        zipFileContainer.seek(pos + 20)  # +20: see section V.I in 'ZIP format' link above.
        zipFileContainer.truncate()
        zipFileContainer.write(b'\x00\x00')  # Zip file comment length: 0 byte length; tell zip applications to stop reading.
        zipFileContainer.seek(0)
    return zipFileContainer
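A quick round-trip check of the rfind-and-rewrite approach can be sketched on an in-memory archive. This assumes Python 3; `strip_trailing_data` is a hypothetical name for a self-contained restatement of the steps described in this answer, and the junk bytes are made up for the demonstration.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"


def strip_trailing_data(zip_file):
    """Cut everything after the EOCD record and declare an empty comment."""
    zip_file.seek(0)
    content = zip_file.read()
    pos = content.rfind(EOCD_SIGNATURE)  # last occurrence, not the first
    if pos < 0:
        raise ValueError("no end-of-central-directory signature found")
    zip_file.seek(pos + 20)              # offset of the 2-byte comment-length field
    zip_file.truncate()                  # drop the old length field and all that follows
    zip_file.write(b"\x00\x00")          # comment length = 0
    zip_file.seek(0)
    return zip_file


# Build a valid archive, remember its size, then append junk after it.
damaged = io.BytesIO()
with zipfile.ZipFile(damaged, "w") as zf:
    zf.writestr("hello.txt", "hello")
clean_size = len(damaged.getvalue())
damaged.seek(0, io.SEEK_END)
damaged.write(b"\x00garbage")

repaired = strip_trailing_data(damaged)
with zipfile.ZipFile(repaired) as zf:
    names = zf.namelist()
    first_bad = zf.testzip()             # None means every member passed its CRC check

print(names, first_bad, len(repaired.getvalue()) == clean_size)
```

Raising on a missing signature, instead of silently returning the file, is a design choice of this sketch so that a genuinely truncated archive is not mistaken for a repaired one.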