Check workbook for sheet and add if missing

Max*_*axB 2 python pandas openpyxl

I am trying to simply check if a sheet exists in an .xlsx file and if not I want to add it.

from openpyxl import load_workbook
import pandas as pd

book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx', engine = 'openpyxl')
writer.book = book

if 'testSheet' in book.sheetnames:
    pass
else:
    book.add_sheet(book['testSheet'])

Any ideas as to why this doesn't work?
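The answers below work with openpyxl directly. Because the question goes through `pd.ExcelWriter`, here is a sketch of a pandas-based version. It assumes a reasonably recent pandas with the openpyxl engine installed. Opening the writer with `mode="a"` makes pandas load the existing workbook itself and expose it as `writer.book`, so you don't need to assign `writer.book` yourself (newer pandas versions deprecate that assignment). The check then uses `sheetnames`, and the sheet is added with `create_sheet()`, which is openpyxl's method for adding a new sheet. `ensure_sheet_with_pandas` is a hypothetical name.

```python
import pandas as pd


def ensure_sheet_with_pandas(path: str, sheet_name: str) -> bool:
    """Hypothetical helper: add `sheet_name` to the existing .xlsx file at `path`
    if it is missing. Returns True if a sheet was created, False otherwise."""
    # mode="a" loads the existing workbook with openpyxl instead of overwriting
    # the file; writer.book is that openpyxl Workbook object.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        if sheet_name in writer.book.sheetnames:
            return False
        writer.book.create_sheet(sheet_name)  # openpyxl's method for a new sheet
        return True
```

For example, call `ensure_sheet_with_pandas('test.xlsx', 'testSheet')`. Leaving the `with` block, even through an early `return`, closes the writer, and closing the writer saves the workbook back to the file.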

ama*_*anb 8

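If you are working only with Excel files that have the *.xlsx extension, openpyxl has useful features that let you create, access, rename, add data to, or delete data from the worksheets in a workbook. Accessing a workbook's worksheets with openpyxl looks simple, but Python's exception handling helps catch the error you get when a worksheet does not exist. Consider the example below, which raises a KeyError because the workbook "test.xlsx" has no worksheet named "invalidSheet". Here the try/except block simply re-raises the exception when the sheet does not exist. The only purpose of this simple example is to identify the type of exception openpyxl raises.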

In [1]: import openpyxl

In [2]: book = openpyxl.load_workbook("test.xlsx")

In [3]: try:
   ...:     ws = book["invalidSheet"]  #try to access a non-existent worksheet
   ...: except:
   ...:     raise
   ...:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-4f4ac71a4f19> in <module>
      1 try:
----> 2     ws = book["invalidSheet"]
      3 except:
      4     raise
      5

~\Anaconda3\lib\site-packages\openpyxl\workbook\workbook.py in __getitem__(self, key)
    275             if sheet.title == key:
    276                 return sheet
--> 277         raise KeyError("Worksheet {0} does not exist.".format(key))
    278
    279     def __delitem__(self, key):

KeyError: 'Worksheet invalidSheet does not exist.'
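The traceback shows that looking up a missing name raises `KeyError`. Below is a minimal sketch that catches only that exception instead of using a bare `except:`. `get_sheet_or_none` is a hypothetical helper name. The workbook is built in memory, so no `test.xlsx` is needed.

```python
from typing import Optional

import openpyxl
from openpyxl.worksheet.worksheet import Worksheet


def get_sheet_or_none(wb: openpyxl.Workbook, name: str) -> Optional[Worksheet]:
    """Hypothetical helper: return the worksheet called `name`, or None if absent."""
    try:
        return wb[name]  # Workbook lookup by name raises KeyError for unknown names
    except KeyError:  # catch only the "sheet does not exist" case
        return None


wb = openpyxl.Workbook()  # a new workbook starts with one sheet titled "Sheet"
print(get_sheet_or_none(wb, "Sheet") is not None)  # True
print(get_sheet_or_none(wb, "invalidSheet"))       # None
```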

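Knowing that openpyxl raises KeyError lets us write a more specific try/except block that catches a non-existent worksheet. We will improve this example shortly, but first let's look up the worksheet names in this spreadsheet, using the sheetnames attribute of the Workbook object book that we created earlier: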

In [15]: book.sheetnames
Out[15]: ['testSheet1', 'testSheet2']

In [16]: type(book.sheetnames)
Out[16]: list
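Because `sheetnames` is an ordinary list of strings, you can use it to validate names before touching any sheet. The sketch below assumes you have a list of sheets your code expects. The `required` argument and the `missing_sheets` name are hypothetical.

```python
from typing import List

import openpyxl


def missing_sheets(wb: openpyxl.Workbook, required: List[str]) -> List[str]:
    """Return the names in `required` that are not sheets of `wb` (order preserved)."""
    existing = set(wb.sheetnames)  # sheetnames is a plain list of sheet titles
    return [name for name in required if name not in existing]


wb = openpyxl.Workbook()
wb.active.title = "testSheet1"  # rename the default sheet
wb.create_sheet("testSheet2")
print(missing_sheets(wb, ["testSheet1", "testSheet2", "testSheet"]))  # ['testSheet']
```

This membership test is case-sensitive. Excel itself does not allow two sheets whose names differ only by case, so you may want to compare lowercased names if your input can vary in case.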

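book.sheetnames returns a list of worksheet names, which we will use later to validate sheet names. Returning to the example above, the improved version below catches the KeyError for a non-existent sheet and creates the sheet if it does not exist. The new sheet will not appear in the actual Excel file until we call save(). The workbook's sheetnames, however, are updated immediately. You can verify this after running the snippet: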

In [20]: try:
    ...:     filename = "test.xlsx"
    ...:     sheet_name = "invalidSheet"
    ...:     ws = book[sheet_name]
    ...: except KeyError:
    ...:     print("The worksheet '{}' does not exist for workbook '{}'. Creating one...".format(sheet_name, filename))
    ...:     book.create_sheet(sheet_name)
    ...:     print("Worksheet '{}' created successfully for workbook '{}'.".format(sheet_name, filename))
    ...:
The worksheet 'invalidSheet' does not exist for workbook 'test.xlsx'. Creating one...
Worksheet 'invalidSheet' created successfully for workbook 'test.xlsx'.

In [21]: book.sheetnames
Out[21]: ['testSheet1', 'testSheet2', 'invalidSheet']
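You can check directly that a new sheet exists only in memory until `save()` is called. The sketch below uses a temporary file instead of `test.xlsx`. It reloads the file before and after saving.

```python
import os
import tempfile

import openpyxl

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")
    openpyxl.Workbook().save(path)  # the file on disk contains only "Sheet"

    book = openpyxl.load_workbook(path)
    book.create_sheet("invalidSheet")  # modifies the in-memory Workbook only

    in_memory = book.sheetnames
    on_disk_before_save = openpyxl.load_workbook(path).sheetnames

    book.save(path)  # write the change back to the file
    on_disk_after_save = openpyxl.load_workbook(path).sheetnames

print(in_memory)            # ['Sheet', 'invalidSheet']
print(on_disk_before_save)  # ['Sheet']
print(on_disk_after_save)   # ['Sheet', 'invalidSheet']
```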

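Now that we have added the sheet "invalidSheet", let's add some data and give the sheet a more meaningful name. openpyxl also supports pandas DataFrames. We will do this in four steps:

1. Create a DataFrame.
2. Use the dataframe_to_rows() function to append each row of the DataFrame (including the header) to the worksheet.
3. Rename the worksheet.
4. Save the workbook.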

In [23]: import pandas as pd

In [24]: df = pd.DataFrame({"Name": ["John", "Val", "Katie"], 
                           "Favorite Pet":["dog", "cat", "guinea pig"]})   #create dataframe

In [25]: df
Out[25]:
    Name Favorite Pet
0   John          dog
1    Val          cat
2  Katie   guinea pig

In [26]: from openpyxl.utils.dataframe import dataframe_to_rows #import method

In [27]: ws = book["invalidSheet"] #create a worksheet object for the existing sheet "invalidSheet"

In [29]: for r in dataframe_to_rows(df, index=False, header=True):
    ...:     ws.append(r)    #append each df row to the worksheet
    ...:                                    
In [31]: ws['A2'].value    #verify value at cell 'A2'. Remember, the first row will be the header
Out[31]: 'John'

In [32]: ws.title = "favPetSheet" #rename the worksheet

In [33]: book.sheetnames  #verify whether the sheet was added & renamed
Out[33]: ['testSheet1', 'testSheet2', 'favPetSheet']

In [35]: book.save("test.xlsx")  #save the workbook
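`dataframe_to_rows(df, index=False, header=True)` writes the header first and then one row per DataFrame row. You can confirm this by reading the sheet back with `iter_rows(values_only=True)`. The sketch below works entirely in memory and saves no file.

```python
import openpyxl
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({"Name": ["John", "Val", "Katie"],
                   "Favorite Pet": ["dog", "cat", "guinea pig"]})

wb = openpyxl.Workbook()
ws = wb.create_sheet("favPetSheet")

# index=False leaves out the DataFrame index; header=True emits the column names first
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

rows = list(ws.iter_rows(values_only=True))  # each row as a tuple of cell values
print(rows[0])     # ('Name', 'Favorite Pet')
print(rows[1])     # ('John', 'dog')
print(ws.max_row)  # 4 (1 header row + 3 data rows)
```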

Ideally, a single function would perform all of these tasks, given a workbook, the name of a worksheet in it, and a DataFrame.

In [45]: def check_sheet_add_data(filename, sheetname, df):
    ...:     """Check if sheet exists for an xlsx spreadsheet and add data from dataframe to the sheet
    ...:        :param: filename - The filename of the xlsx spreadsheet
    ...:        :param: sheetname - Name of the worksheet to search for
    ...:        :param: df - A Pandas dataframe object"""
    ...:
    ...:     wb = openpyxl.load_workbook(filename)
    ...:     try:
    ...:         ws = wb[sheetname]
    ...:         print("Sheet '{}' found in workbook '{}'".format(sheetname, filename))
    ...:     except KeyError:
    ...:         print("Worksheet '{}' not found for workbook '{}'.Adding...".format(sheetname, filename))
    ...:         wb.create_sheet(sheetname)
    ...:         ws = wb[sheetname]
    ...:         print()
    ...:         print("Current sheetnames: {}".format(wb.sheetnames))
    ...:         print()
    ...:         print("Worksheet '{}' added successfully for workbook '{}'".format(sheetname, filename))
    ...:     finally:
    ...:         print()
    ...:         print("Adding data to worksheet '{}'...".format(sheetname))
    ...:         print()
    ...:         for r in dataframe_to_rows(df, index=False, header=True):
    ...:             ws.append(r)
    ...:         wb.save(filename)
    ...:         print("Workbook '{}' saved successfully.".format(filename))
    ...:         print()
    ...:         print("***End***")

With this function in place, let's test both conditions. First, let's create some new data, say favorite albums, for our old friends John, Val and Katie.

In [39]: df2 = pd.DataFrame({"Name":["John", "Val", "Katie"], 
                         "Favorite Album": ["Thriller", "Stairway to Heaven", "Abbey Road"]})

In [40]: df2
Out[40]:
    Name      Favorite Album
0   John            Thriller
1    Val  Stairway to Heaven
2  Katie          Abbey Road

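Our workbook will be the same "test.xlsx", and the new worksheet will be called "favAlbumSheet". We test the function twice, first when the worksheet does not exist and then when it does: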

#Condition 1: Worksheet does not exist
In [44]: check_sheet_add_data(filename="test.xlsx", sheetname="favAlbumSheet", df=df2)
Worksheet 'favAlbumSheet' not found for workbook 'test.xlsx'.Adding...

Current sheetnames: ['testSheet1', 'testSheet2', 'favPetSheet', 'favAlbumSheet']

Worksheet 'favAlbumSheet' added successfully for workbook 'test.xlsx'

Adding data to worksheet 'favAlbumSheet'...

Workbook 'test.xlsx' saved successfully.

***End***

#Condition 2: Worksheet exists
In [46]: check_sheet_add_data(filename="test.xlsx", sheetname="favAlbumSheet", df=df2)
Sheet 'favAlbumSheet' found in workbook 'test.xlsx'

Adding data to worksheet 'favAlbumSheet'...

Workbook 'test.xlsx' saved successfully.

***End***
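In Condition 2 the function appends the header and rows a second time, below the data written in Condition 1. This happens because `ws.append()` writes after the rows already in the sheet, including for a sheet loaded from disk. The sketch below shows the effect. Plain lists stand in for the `dataframe_to_rows` output, and a temporary file replaces `test.xlsx`.

```python
import os
import tempfile

import openpyxl

# Plain lists stand in for what dataframe_to_rows(df2, index=False, header=True) yields
rows = [["Name", "Favorite Album"],
        ["John", "Thriller"],
        ["Val", "Stairway to Heaven"],
        ["Katie", "Abbey Road"]]

counts = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.xlsx")
    openpyxl.Workbook().save(path)

    for _ in range(2):  # mimic Condition 1 followed by Condition 2
        wb = openpyxl.load_workbook(path)
        if "favAlbumSheet" in wb.sheetnames:
            ws = wb["favAlbumSheet"]
        else:
            ws = wb.create_sheet("favAlbumSheet")
        for r in rows:
            ws.append(r)  # append() writes below the rows already in the sheet
        wb.save(path)
        counts.append(openpyxl.load_workbook(path)["favAlbumSheet"].max_row)

print(counts)  # [4, 8]
```

You may want to replace the data rather than accumulate it. One option is to delete the sheet with `wb.remove(ws)` and create it again before appending.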

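We used openpyxl's easy-to-use features to access worksheets in a valid Excel workbook and to add data from a DataFrame to a worksheet. With Python's exception handling, we could clearly determine whether a worksheet exists in a valid workbook, and add one if necessary. The function could be extended to catch other errors, such as an invalid filename (FileNotFoundError) or an invalid DataFrame object. If you only want to check that the sheet exists, without adding data every time, make df an optional parameter (df=None). Then, in the finally block, save the workbook without appending any data to the worksheet.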
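Below is a sketch of the extension suggested above, with `df=None` as an optional parameter and handling for a missing file. It is a variant, not the answer's code, and it makes three design choices of its own:

- It replaces the `KeyError`/`finally` structure with a `sheetnames` membership test.
- When the file does not exist, it starts a new workbook instead of failing. This is an assumption; you may prefer to let `FileNotFoundError` propagate.
- It returns whether the sheet had to be created.

`check_sheet` is a hypothetical name.

```python
from typing import Optional

import openpyxl
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows


def check_sheet(filename: str, sheetname: str,
                df: Optional[pd.DataFrame] = None) -> bool:
    """Hypothetical variant of check_sheet_add_data: make sure `sheetname` exists
    in `filename`, append `df` only if one is given, then save.
    Returns True if the sheet had to be created."""
    created = False
    try:
        wb = openpyxl.load_workbook(filename)
    except FileNotFoundError:
        # Assumption: create a new workbook instead of failing, and rename its
        # default sheet so the file contains just the requested sheet.
        wb = openpyxl.Workbook()
        wb.active.title = sheetname
        created = True

    if sheetname in wb.sheetnames:
        ws = wb[sheetname]
    else:
        ws = wb.create_sheet(sheetname)
        created = True

    if df is not None:  # only write data when a DataFrame was supplied
        for row in dataframe_to_rows(df, index=False, header=True):
            ws.append(row)

    wb.save(filename)
    return created
```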


Xuk*_*rao 6

You can also add the sheet using only openpyxl commands, without involving pandas at all:

import openpyxl

# Load the existing Excel file into an openpyxl Workbook object
book = openpyxl.load_workbook('test.xlsx')

# If sheet 'testSheet' does not exist yet, add it to the openpyxl Workbook object
if 'testSheet' not in book.sheetnames:
    book.create_sheet('testSheet')

# Save the openpyxl Workbook object to file
book.save('test.xlsx')
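The `not in book.sheetnames` check is what prevents duplicates. `create_sheet()` does not raise when the name is already taken. In current openpyxl releases it gives the new sheet a de-duplicated title instead, typically by appending a number. The sketch below shows a hypothetical `ensure_sheet` helper that returns the existing sheet or creates it. It passes an optional position to `create_sheet` through its `index` argument.

```python
from typing import Optional

import openpyxl
from openpyxl.worksheet.worksheet import Worksheet


def ensure_sheet(book: openpyxl.Workbook, title: str,
                 index: Optional[int] = None) -> Worksheet:
    """Hypothetical helper: return the sheet named `title`, creating it if missing.
    `index` is passed to create_sheet (None appends the sheet at the end)."""
    if title in book.sheetnames:
        return book[title]
    return book.create_sheet(title, index)


book = openpyxl.Workbook()
first = ensure_sheet(book, "testSheet")
second = ensure_sheet(book, "testSheet")  # no duplicate: the same sheet comes back
print(first is second, book.sheetnames)   # True ['Sheet', 'testSheet']

# Calling create_sheet directly with a taken name does not raise; the new
# sheet receives a different, de-duplicated title instead.
dup = book.create_sheet("testSheet")
print(dup.title != "testSheet")           # True
```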