将 pandas 数据帧导出到 xlsx:处理 python 3.9 上的 openpyxl 问题

Bag*_*gio 2 xlsx pandas openpyxl python-3.9

使用最新的软件包版本:openpyxl:3.0.6 | 熊猫:1.2.3 |蟒蛇:3.9

在将上面的软件包更新到报告的最新版本之前,下面的功能运行良好。

现在它会引发错误:“zipfile.BadZipFile:文件不是 zip 文件”。

这样的功能确实很有用,并且很高兴知道它是否可以修复以使其正常工作。

下面的函数可以直接运行,只需将“pathExport”替换为您的导出目录进行测试即可。

    def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                           truncate_sheet=False, 
                           **to_excel_kwargs):
        """
        Append a DataFrame [df] to existing Excel file [filename]
        into [sheet_name] Sheet.
        If [filename] doesn't exist, then this function will create it.
    
        Parameters:
          filename : File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
          df : dataframe to save to workbook
          sheet_name : Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
          startrow : upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
          truncate_sheet : truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
          to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                            [can be dictionary]
    
        Returns: None
    
        (c) [MaxU](/sf/users/401884381/?tab=profile)
        """
        from openpyxl import load_workbook
    
        # ignore [engine] parameter if it was passed
        if 'engine' in to_excel_kwargs:
            to_excel_kwargs.pop('engine')
    
        writer = pd.ExcelWriter(filename, engine='openpyxl')
    
        # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
        try:
            FileNotFoundError
        except NameError:
            FileNotFoundError = IOError
    
    
        try:
            # try to open an existing workbook
            writer.book = load_workbook(filename)
            
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
    
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
            
            # copy existing sheets
            writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
        except FileNotFoundError:
            # file does not exist yet, we will create it
            pass
    
        if startrow is None:
            startrow = 0
    
        # write out the new sheet
        df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    
        # save the workbook
        writer.save()

pathExport = r"F:\PYTHON\NB-Suite_python39\MNE\outputData\df.xlsx"
df1 = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue']
                    })

append_df_to_excel(pathExport, df1, sheet_name="DF1", index=False, startcol=0, startrow=0)
Run Code Online (Sandbox Code Playgroud)

bur*_*ran 5

好的,我能够复制这个问题。这是pandas有关系的。在 pandas 1.1.5 之前一切正常 在 pandas 1.2.0 他们做了一些改变

当你实例化pd.ExcelWriter

writer = pd.ExcelWriter(filename, engine='openpyxl')`
Run Code Online (Sandbox Code Playgroud)

它会创建大小为 0 字节的空文件并覆盖现有文件,然后在尝试加载它时会出现错误。它不openpyxl相关,因为最新版本可以openpyxl与 pandas 1.1.5 配合使用。

解决方案 - 指定mode='a',将上面的行更改为

writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
Run Code Online (Sandbox Code Playgroud)

或者 - 查看这个这个解决方案,它在实例化pd.ExcelWriter.

编辑:我在评论中被告知,mode='a'如果文件不存在,它将引发 FileNotFoundError 。尽管在这种情况下它不会创建文件是出乎意料的,但解决方案是将创建移动到writer现有块内部并在该部分中try创建一个writerwith 模式:wexcept

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                           truncate_sheet=False, 
                           **to_excel_kwargs):
        """
        Append a DataFrame [df] to existing Excel file [filename]
        into [sheet_name] Sheet.
        If [filename] doesn't exist, then this function will create it.
     
        Parameters:
          filename : File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
          df : dataframe to save to workbook
          sheet_name : Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
          startrow : upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
          truncate_sheet : truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
          to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                            [can be dictionary]
     
        Returns: None
     
        (c) [MaxU](/sf/users/401884381/?tab=profile)
        """
        from openpyxl import load_workbook
     
        # ignore [engine] parameter if it was passed
        if 'engine' in to_excel_kwargs:
            to_excel_kwargs.pop('engine')
     
        
     
        # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
        try:
            FileNotFoundError
        except NameError:
            FileNotFoundError = IOError
     
     
        try:
            writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
            # try to open an existing workbook
            writer.book = load_workbook(filename)
             
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
     
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
             
            # copy existing sheets
            writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
        except FileNotFoundError:
            # file does not exist yet, we will create it
            writer = pd.ExcelWriter(filename, engine='openpyxl')
     
        if startrow is None:
            startrow = 0
     
        # write out the new sheet
        df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
     
        # save the workbook
        writer.save()
Run Code Online (Sandbox Code Playgroud)