Ray*_*Ray 17
我以前必须这样做.主要思想是使用xlrd模块打开和解析xls文件,并使用openpyxl模块将内容写入xlsx文件.
这是我的代码.注意!它无法处理复杂的xls文件,如果要使用它,应该添加自己的解析逻辑.
import xlrd
from openpyxl.workbook import Workbook
from openpyxl.reader.excel import load_workbook, InvalidFileException
def open_xls_as_xlsx(filename):
# first open using xlrd
book = xlrd.open_workbook(filename)
index = 0
nrows, ncols = 0, 0
while nrows * ncols == 0:
sheet = book.sheet_by_index(index)
nrows = sheet.nrows
ncols = sheet.ncols
index += 1
# prepare a xlsx sheet
book1 = Workbook()
sheet1 = book1.get_active_sheet()
for row in xrange(0, nrows):
for col in xrange(0, ncols):
sheet1.cell(row=row, column=col).value = sheet.cell_value(row, col)
return book1
Run Code Online (Sandbox Code Playgroud)
小智 13
您需要在您的计算机上安装win32com.这是我的代码:
import win32com.client as win32
fname = "full+path+to+xls_file"
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(fname)
wb.SaveAs(fname+"x", FileFormat = 51) #FileFormat = 51 is for .xlsx extension
wb.Close() #FileFormat = 56 is for .xls extension
excel.Application.Quit()
Run Code Online (Sandbox Code Playgroud)
chf*_*hfw 12
这是我的解决方案,不考虑字体,图表和图像:
$ pip install pyexcel pyexcel-xls pyexcel-xlsx
Run Code Online (Sandbox Code Playgroud)
然后这样做::
import pyexcel as p
p.save_book_as(file_name='your-file-in.xls',
dest_file_name='your-new-file-out.xlsx')
Run Code Online (Sandbox Code Playgroud)
如果您不需要程序,可以安装一个附加软件包pyexcel-cli ::
$ pip install pyexcel-cli
$ pyexcel transcode your-file-in.xls your-new-file-out.xlsx
Run Code Online (Sandbox Code Playgroud)
上面的转码过程使用xlrd和openpyxl.
小智 9
我发现这里没有任何答案100%正确.所以我在这里发布我的代码:
import xlrd
from openpyxl.workbook import Workbook
def cvt_xls_to_xlsx(src_file_path, dst_file_path):
book_xls = xlrd.open_workbook(src_file_path)
book_xlsx = Workbook()
sheet_names = book_xls.sheet_names()
for sheet_index in range(0,len(sheet_names)):
sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
if sheet_index == 0:
sheet_xlsx = book_xlsx.active()
sheet_xlsx.title = sheet_names[sheet_index]
else:
sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])
for row in range(0, sheet_xls.nrows):
for col in range(0, sheet_xls.ncols):
sheet_xlsx.cell(row = row+1 , column = col+1).value = sheet_xls.cell_value(row, col)
book_xlsx.save(dst_file_path)
Run Code Online (Sandbox Code Playgroud)
您可以使用 pandas IO 函数:
import pandas as pd
df = pd.read_excel('file_2003.xls', header=None)
df.to_excel('file_2003.xlsx', index=False, header=False)
Run Code Online (Sandbox Code Playgroud)
小智 6
Ray的答案对我帮助很大,但对于那些搜索一个简单的方法将所有表格从xls转换为xlsx的人,我做了这个要点:
import xlrd
from openpyxl.workbook import Workbook as openpyxlWorkbook
# content is a string containing the file. For example the result of an http.request(url).
# You can also use a filepath by calling "xlrd.open_workbook(filepath)".
xlsBook = xlrd.open_workbook(file_contents=content)
workbook = openpyxlWorkbook()
for i in xrange(0, xlsBook.nsheets):
xlsSheet = xlsBook.sheet_by_index(i)
sheet = workbook.active if i == 0 else workbook.create_sheet()
sheet.title = xlsSheet.name
for row in xrange(0, xlsSheet.nrows):
for col in xrange(0, xlsSheet.ncols):
sheet.cell(row=row, column=col).value = xlsSheet.cell_value(row, col)
# The new xlsx file is in "workbook", without iterators (iter_rows).
# For iteration, use "for row in worksheet.rows:".
# For range iteration, use "for row in worksheet.range("{}:{}".format(startCell, endCell)):".
Run Code Online (Sandbox Code Playgroud)
你可以找到xlrd LIB 这里和openpyxl 这里(您必须在您的项目下载xlrd谷歌应用程序引擎的例子).
小智 5
我正在提高 @Jackypengyu方法的性能。
ragged_rows=True(http://xlrd.readthedocs.io/en/latest/api.html#xlrd.sheet.Sheet.row_slice)合并的单元格也将被转换。
以相同顺序转换相同的12个文件:
原件:
0:00:01.958159
0:00:02.115891
0:00:02.018643
0:00:02.057803
0:00:01.267079
0:00:01.308073
0:00:01.245989
0:00:01.289295
0:00:01.273805
0:00:01.276003
0:00:01.293834
0:00:01.261401
Run Code Online (Sandbox Code Playgroud)
改进的:
0:00:00.774101
0:00:00.734749
0:00:00.741434
0:00:00.744491
0:00:00.320796
0:00:00.279045
0:00:00.315829
0:00:00.280769
0:00:00.316380
0:00:00.289196
0:00:00.347819
0:00:00.284242
Run Code Online (Sandbox Code Playgroud)
def cvt_xls_to_xlsx(*args, **kw):
"""Open and convert XLS file to openpyxl.workbook.Workbook object
@param args: args for xlrd.open_workbook
@param kw: kwargs for xlrd.open_workbook
@return: openpyxl.workbook.Workbook
You need -> from openpyxl.utils.cell import get_column_letter
"""
book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
book_xlsx = Workbook()
sheet_names = book_xls.sheet_names()
for sheet_index in range(len(sheet_names)):
sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
if sheet_index == 0:
sheet_xlsx = book_xlsx.active
sheet_xlsx.title = sheet_names[sheet_index]
else:
sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])
for crange in sheet_xls.merged_cells:
rlo, rhi, clo, chi = crange
sheet_xlsx.merge_cells(
start_row=rlo + 1, end_row=rhi,
start_column=clo + 1, end_column=chi,
)
def _get_xlrd_cell_value(cell):
value = cell.value
if cell.ctype == xlrd.XL_CELL_DATE:
value = datetime.datetime(*xlrd.xldate_as_tuple(value, 0))
return value
for row in range(sheet_xls.nrows):
sheet_xlsx.append((
_get_xlrd_cell_value(cell)
for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
))
for rowx in range(sheet_xls.nrows):
if sheet_xls.rowinfo_map[rowx].hidden != 0:
print sheet_names[sheet_index], rowx
sheet_xlsx.row_dimensions[rowx+1].hidden = True
for coly in range(sheet_xls.ncols):
if sheet_xls.colinfo_map[coly].hidden != 0:
print sheet_names[sheet_index], coly
coly_letter = get_column_letter(coly+1)
sheet_xlsx.column_dimensions[coly_letter].hidden = True
return book_xlsx
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
45863 次 |
| 最近记录: |