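How can I check in Python whether an uploaded file is CSV or XLS? I am importing a file into a binary field in OpenERP, and the field can be retrieved as a binary object. I need to read the file and import its data into a table. Users may upload either a csv or an xls file. All I know so far is that I can use the csv package or the xlrd package.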
The hex signature of an .xls file is as follows:
Excel spreadsheet subheader (MS Office)
09 08 10 00 00 06 05 00 [512 byte offset]
You can read about various other file signatures on Wikipedia.
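I believe you can do something like this. It is untested, but you can tinker with it until it works. If you have any suggestions or changes, please leave a comment. Thanks!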
xls_sig = b'\x09\x08\x10\x00\x00\x06\x05\x00'
offset = 512
size = 8

with open('spreadsheet.xls', 'rb') as f:
    f.seek(offset)       # Seek to the offset.
    data = f.read(size)  # Read the specified number of bytes.

if data == xls_sig:
    print('Uploaded file is an xls.')
else:
    print('File is not an xls.')
I tested this and can confirm that it works for detecting .xls files.
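Since the question says the upload arrives as a binary object rather than as a file on disk, the same check can be done on the bytes directly. This is only a minimal sketch: the function names are made up for illustration, and it assumes the OpenERP binary field value is base64-encoded, which you should verify for your setup.

```python
import base64

XLS_SIG = b'\x09\x08\x10\x00\x00\x06\x05\x00'


def has_xls_signature(data, offsets=(512,)):
    """Return True if the 8-byte subheader appears at any of the given offsets."""
    return any(data[o:o + len(XLS_SIG)] == XLS_SIG for o in offsets)


def field_has_xls_signature(field_value):
    # Assumption: the binary field value is base64-encoded bytes or str.
    return has_xls_signature(base64.b64decode(field_value))
```

Slicing past the end of a short bytes object just returns a shorter slice, so a small CSV upload yields False instead of raising an error.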
I developed a program to determine whether a file is xls or xlsx:
xlsx_sig = b'\x50\x4B\x05\x06'
xls_sig = b'\x09\x08\x10\x00\x00\x06\x05\x00'

# (filename, whence, offset, size); whence 0 = from start, 2 = from end.
filenames = [
    ('spreadsheet.xls', 0, 512, 8),
    ('spreadsheet.xlsx', 2, -22, 4)]

for filename, whence, offset, size in filenames:
    with open(filename, 'rb') as f:
        f.seek(offset, whence)  # Seek to the offset.
        data = f.read(size)     # Read the specified number of bytes.

    print((data.hex(), len(data)))  # Hex dump and byte count.

    if data == xls_sig:
        msg = '"{}" is an xls.'
    elif data == xlsx_sig:
        msg = '"{}" is an xlsx.'
    else:
        msg = '"{}" is not an Excel document.'

    print(msg.format(filename))
('0908100000060500', 8)
"spreadsheet.xls" is an xls.
('504b0506', 4)
"spreadsheet.xlsx" is an xlsx.
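One caveat: 50 4B 05 06 is the ZIP end-of-central-directory marker, and it sits 22 bytes from the end only when the archive has no trailing comment, so the check above matches any such ZIP file, not just xlsx. A stricter test could open the archive and look for the workbook part. The sketch below uses only the standard zipfile module. The name `looks_like_xlsx` is made up, and it assumes the workbook lives at the conventional path `xl/workbook.xml`.

```python
import io
import zipfile


def looks_like_xlsx(data):
    """Return True if data is a ZIP archive that contains xl/workbook.xml."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    try:
        with zipfile.ZipFile(buf) as archive:
            # Assumption: xlsx writers put the workbook part at this path.
            return 'xl/workbook.xml' in archive.namelist()
    except zipfile.BadZipFile:
        return False
```

This also works on the in-memory bytes of an upload, so nothing has to be written to disk first.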
This is just an extension of PolyWhirl's post, covering some edge cases I ran into.
def isExcelDoc(file):
    # (type, signature, whence, offset, size)
    excelSigs = [
        ('xlsx', b'\x50\x4B\x05\x06', 2, -22, 4),
        ('xls', b'\x09\x08\x10\x00\x00\x06\x05\x00', 0, 512, 8),   # Saved from Excel
        ('xls', b'\x09\x08\x10\x00\x00\x06\x05\x00', 0, 1536, 8),  # Saved from LibreOffice Calc
        ('xls', b'\x09\x08\x10\x00\x00\x06\x05\x00', 0, 2048, 8),  # Saved from Excel, then saved from Calc
    ]

    for sigType, sig, whence, offset, size in excelSigs:
        with open(file, 'rb') as f:
            try:
                f.seek(offset, whence)
            except OSError:  # File is shorter than the end-relative offset.
                continue
            data = f.read(size)

        if data == sig:
            return True
    return False
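None of the answers above covers the CSV half of the question, so here is a rough sketch of a dispatcher that sorts an upload into xls, xlsx, csv or unknown. It swaps in a different technique from the subheader check: it looks at the container magic at offset 0, namely the OLE2 compound file header for xls and the ZIP local file header for xlsx. Those magics also match other Office and ZIP files, so treat the result as a hint. The function names, the UTF-8 assumption and the delimiter list are all illustrative choices, not anything from the posts above.

```python
import csv

OLE2_MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'  # Compound file header (xls, doc, ...)
ZIP_MAGIC = b'PK\x03\x04'                          # ZIP local file header (xlsx, docx, ...)


def looks_like_csv(data, sample_size=4096):
    """Heuristic: text without NUL bytes in which csv.Sniffer finds a delimiter."""
    if b'\x00' in data[:sample_size]:
        return False
    try:
        # Assumption: CSV uploads are UTF-8 encoded.
        sample = data.decode('utf-8')[:sample_size]
        csv.Sniffer().sniff(sample, delimiters=',;\t|')
    except (UnicodeDecodeError, csv.Error):
        return False
    return True


def detect_upload_type(data):
    """Return 'xls', 'xlsx', 'csv' or 'unknown' for the raw bytes of an upload."""
    if data.startswith(OLE2_MAGIC):
        return 'xls'   # OLE2 container, most likely a legacy Excel file
    if data.startswith(ZIP_MAGIC):
        return 'xlsx'  # ZIP container, possibly an xlsx
    if looks_like_csv(data):
        return 'csv'
    return 'unknown'
```

With the result in hand, you could pass the bytes to xlrd for 'xls' or decode them and feed them to the csv module for 'csv'.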