Insert row into Excel spreadsheet using openpyxl in Python

Nic*_*ick 13 python excel xlrd xlwt openpyxl

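I'm looking for the best approach for inserting a row into a spreadsheet using openpyxl.

Effectively, I have a spreadsheet (Excel 2007) which has a header row, followed by (at most) a few thousand rows of data. I want to insert the row as the first row of actual data, so directly after the header. My understanding is that the append function is only suitable for adding content to the end of the file.

Reading the documentation for openpyxl and xlrd (and xlwt), I can't find any clear-cut way of doing this, beyond looping through the content manually and writing it into a new sheet (after inserting the required row).

Given my limited experience with Python so far, I'm trying to understand whether this is indeed the best option (the most pythonic!), and if so, could someone provide an explicit example? Specifically, can I read and write rows with openpyxl, or do I have to access cells? Additionally, can I (over)write the same file(name)?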

Dal*_*las 17

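Updated to a fully functional version based on feedback here: groups.google.com/forum/#!topic/openpyxl-users/wHGecdQg3Iw.

As others have pointed out, openpyxl does not provide this functionality, but I have extended the Worksheet class as follows to implement inserting rows. I hope this proves useful to others.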

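# Import paths below are assumed for the openpyxl 2.x releases this was written against.
import copy
import re

from openpyxl.cell import Cell
from openpyxl.utils import get_column_letter
from openpyxl.worksheet import Worksheet

def insert_rows(self, row_idx, cnt, above=False, copy_style=True, fill_formulae=True):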
    """Inserts new (empty) rows into worksheet at specified row index.

    :param row_idx: Row index specifying where to insert new rows.
    :param cnt: Number of rows to insert.
    :param above: Set True to insert rows above specified row index.
    :param copy_style: Set True if new rows should copy style of immediately above row.
    :param fill_formulae: Set True if new rows should take on formula from immediately above row, filled with references to the new rows.

    Usage:

    * insert_rows(2, 10, above=True, copy_style=False)

    """
    # Raw string so that \$ and \d reach the regex engine unchanged.
    CELL_RE  = re.compile(r"(?P<col>\$?[A-Z]+)(?P<row>\$?\d+)")

    row_idx = row_idx - 1 if above else row_idx

    def replace(m):
        row = m.group('row')
        prefix = "$" if row.find("$") != -1 else ""
        row = int(row.replace("$",""))
        row += cnt if row > row_idx else 0
        return m.group('col') + prefix + str(row)

    # First, we shift all cells down cnt rows...
    old_cells = set()
    old_fas   = set()
    new_cells = dict()
    new_fas   = dict()
    for c in self._cells.values():

        old_coor = c.coordinate

        # Shift all references to anything below row_idx
        if c.data_type == Cell.TYPE_FORMULA:
            c.value = CELL_RE.sub(
                replace,
                c.value
            )
            # Here, we need to properly update the formula references to reflect new row indices
            if old_coor in self.formula_attributes and 'ref' in self.formula_attributes[old_coor]:
                self.formula_attributes[old_coor]['ref'] = CELL_RE.sub(
                    replace,
                    self.formula_attributes[old_coor]['ref']
                )

        # Do the magic to set up our actual shift    
        if c.row > row_idx:
            old_coor = c.coordinate
            old_cells.add((c.row,c.col_idx))
            c.row += cnt
            new_cells[(c.row,c.col_idx)] = c
            if old_coor in self.formula_attributes:
                old_fas.add(old_coor)
                fa = self.formula_attributes[old_coor].copy()
                new_fas[c.coordinate] = fa

    for coor in old_cells:
        del self._cells[coor]
    self._cells.update(new_cells)

    for fa in old_fas:
        del self.formula_attributes[fa]
    self.formula_attributes.update(new_fas)

    # Next, we need to shift all the Row Dimensions below our new rows down by cnt...
    for row in range(len(self.row_dimensions)-1+cnt,row_idx+cnt,-1):
        new_rd = copy.copy(self.row_dimensions[row-cnt])
        new_rd.index = row
        self.row_dimensions[row] = new_rd
        del self.row_dimensions[row-cnt]

    # Now, create our new rows, with all the pretty cells
    row_idx += 1
    for row in range(row_idx,row_idx+cnt):
        # Create a Row Dimension for our new row
        new_rd = copy.copy(self.row_dimensions[row-1])
        new_rd.index = row
        self.row_dimensions[row] = new_rd
        for col in range(1,self.max_column+1):  # +1 so the last column is included
            col = get_column_letter(col)
            cell = self.cell('%s%d'%(col,row))
            cell.value = None
            source = self.cell('%s%d'%(col,row-1))
            if copy_style:
                cell.number_format = source.number_format
                cell.font      = source.font.copy()
                cell.alignment = source.alignment.copy()
                cell.border    = source.border.copy()
                cell.fill      = source.fill.copy()
            if fill_formulae and source.data_type == Cell.TYPE_FORMULA:
                s_coor = source.coordinate
                if s_coor in self.formula_attributes and 'ref' not in self.formula_attributes[s_coor]:
                    fa = self.formula_attributes[s_coor].copy()
                    self.formula_attributes[cell.coordinate] = fa
                # print("Copying formula from cell %s%d to %s%d"%(col,row-1,col,row))
                cell.value = re.sub(
                    r"(\$?[A-Z]{1,3}\$?)%d"%(row - 1),
                    lambda m: m.group(1) + str(row),
                    source.value
                )   
                cell.data_type = Cell.TYPE_FORMULA

    # Check for Merged Cell Ranges that need to be expanded to contain new cells
    for cr_idx, cr in enumerate(self.merged_cell_ranges):
        self.merged_cell_ranges[cr_idx] = CELL_RE.sub(
            replace,
            cr
        )

Worksheet.insert_rows = insert_rows
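
The reference-shifting step inside insert_rows can be looked at in isolation. Below is a minimal standalone sketch of that idea using only the standard library; the helper name shift_refs is hypothetical and not part of openpyxl. It keeps the same limitation as the regex above: anything that looks like a cell reference (for example a function name such as LOG10) is shifted as well.

```python
import re

# Matches A1-style references such as B7, $B7, B$7 or $B$7.
CELL_RE = re.compile(r"(?P<col>\$?[A-Z]+)(?P<row>\$?\d+)")

def shift_refs(formula, row_idx, cnt):
    """Return formula with every row reference below row_idx moved down by cnt."""
    def replace(m):
        row = m.group("row")
        prefix = "$" if row.startswith("$") else ""
        number = int(row.lstrip("$"))
        if number > row_idx:
            number += cnt
        return m.group("col") + prefix + str(number)
    return CELL_RE.sub(replace, formula)

# Three rows inserted below row 1: rows 2 and 10 move, row 1 stays put.
print(shift_refs("=SUM(A2:A10)+$B$1", 1, 3))  # prints =SUM(A5:A13)+$B$1
```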


Nic*_*ick 10

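Answering this with the code that I'm now using to achieve the desired result. Note that I am manually inserting the row at position 1, but that should be easy enough to adjust for specific needs. You could also easily tweak this to insert more than one row, and simply populate the rest of the data starting at the relevant position.

Also note that, due to downstream dependencies, we manually specify the data from 'Sheet1'. The data is copied to a new sheet inserted at the beginning of the workbook, while the original worksheet is renamed to 'Sheet1.5'.

EDIT: I've also added (later on) a change to the format_code to fix an issue where the default copy operation here removes all formatting: new_cell.style.number_format.format_code = 'mm/dd/yyyy'. I couldn't find any documentation saying this was settable; it was more a case of trial and error!

Finally, don't forget that this example saves over the original file. You can change the save path where applicable to avoid this.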

    import openpyxl

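    # `file` is the path to the workbook. Row and column indices in this
    # snippet are 0-based, as in the early openpyxl releases it was written for.
    wb = openpyxl.load_workbook(file)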
    old_sheet = wb.get_sheet_by_name('Sheet1')
    old_sheet.title = 'Sheet1.5'
    max_row = old_sheet.get_highest_row()
    max_col = old_sheet.get_highest_column()
    wb.create_sheet(0, 'Sheet1')

    new_sheet = wb.get_sheet_by_name('Sheet1')

    # Do the header.
    for col_num in range(0, max_col):
        new_sheet.cell(row=0, column=col_num).value = old_sheet.cell(row=0, column=col_num).value

    # The row to be inserted. We're manually populating each cell.
    new_sheet.cell(row=1, column=0).value = 'DUMMY'
    new_sheet.cell(row=1, column=1).value = 'DUMMY'

    # Now do the rest of it. Note the row offset.
    for row_num in range(1, max_row):
        for col_num in range (0, max_col):
            new_sheet.cell(row = (row_num + 1), column = col_num).value = old_sheet.cell(row = row_num, column = col_num).value

    wb.save(file)


ane*_*oid 7

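Adding an answer applicable to more recent releases of openpyxl, v2.5+:

There are now insert_rows() and insert_cols().

insert_rows(idx, amount=1)

Inserts a row or rows before row == idx

  • Unfortunately it does not support merged cells (they stay fixed at their row position) (3 upvotes)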
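
Applied to the original question (one new row directly under the header), a minimal sketch might look like the following. It assumes openpyxl 2.5 or later; the sheet contents and the 'DUMMY' values are made up for illustration.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["name", "qty"])   # header ends up in row 1
ws.append(["apple", 3])      # first data row, row 2
ws.append(["pear", 5])       # second data row, row 3

# Insert one empty row before the current row 2, then fill it in.
ws.insert_rows(2)
ws.cell(row=2, column=1, value="DUMMY")
ws.cell(row=2, column=2, value=0)

rows = [[cell.value for cell in row] for row in ws.iter_rows()]
print(rows)
```

The existing data rows move down by one, so the header stays in row 1 and the new values sit in row 2.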

Rej*_*ted 5

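Openpyxl worksheets have limited functionality when it comes to row-level or column-level operations. The only properties a worksheet has that relate to rows/columns are row_dimensions and column_dimensions, which store "RowDimensions" and "ColumnDimensions" objects for each row and column, respectively. These dictionaries are also used in functions like get_highest_row() and get_highest_column().

Everything else operates at the cell level, with Cell objects tracked in the dictionary _cells (and their styles tracked in the dictionary _styles). Most functions that look like they do something at the row or column level actually operate on a range of cells (such as the aforementioned append()).

The simplest thing to do would be what you suggested: create a new sheet, append your header row, append your new data rows, append your old data rows, delete the old sheet, then rename the new sheet to the old one's name. The problems that may arise with this method are the loss of row/column dimension attributes and cell styles, unless you specifically copy them too.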
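
As a rough sketch of that rebuild approach, assuming a recent openpyxl (2.5 or later) rather than the version this answer was written for: the helper name insert_by_rebuild is hypothetical, and, as noted above, it copies values only, so styles and row/column dimensions are lost.

```python
from openpyxl import Workbook

def insert_by_rebuild(wb, sheet_name, new_rows):
    """Rebuild sheet_name with new_rows placed directly after the header row."""
    old = wb[sheet_name]
    position = wb.index(old)
    old_rows = [[cell.value for cell in row] for row in old.iter_rows()]

    old.title = sheet_name + "_old"          # free up the name first
    new = wb.create_sheet(sheet_name, position)

    new.append(old_rows[0])                  # header row
    for row in new_rows:                     # rows being inserted
        new.append(row)
    for row in old_rows[1:]:                 # original data rows
        new.append(row)

    wb.remove(old)
    return new

wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["name", "qty"])
ws.append(["apple", 3])
ws.append(["pear", 5])

rebuilt = insert_by_rebuild(wb, "Sheet1", [["DUMMY", 0]])
values = [[cell.value for cell in row] for row in rebuilt.iter_rows()]
print(values)
```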

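Alternatively, you could create your own functions that insert rows or columns.

I had a large number of very simple worksheets from which I needed to delete columns. Since you asked for explicit examples, I'll provide the function I quickly threw together to do this: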

from openpyxl.cell import get_column_letter

def ws_delete_column(sheet, del_column):

    for row_num in range(1, sheet.get_highest_row()+1):
        for col_num in range(del_column, sheet.get_highest_column()+1):

            coordinate = '%s%s' % (get_column_letter(col_num),
                                   row_num)
            adj_coordinate = '%s%s' % (get_column_letter(col_num + 1),
                                       row_num)

            # Handle Styles.
            # This is important to do if you have any differing
            # 'types' of data being stored, as you may otherwise get
            # an output Worksheet that's got improperly formatted cells.
            # Or worse, an error gets thrown because you tried to copy
            # a string value into a cell that's styled as a date.

            if adj_coordinate in sheet._styles:
                sheet._styles[coordinate] = sheet._styles[adj_coordinate]
                sheet._styles.pop(adj_coordinate, None)
            else:
                sheet._styles.pop(coordinate, None)

            if adj_coordinate in sheet._cells:
                sheet._cells[coordinate] = sheet._cells[adj_coordinate]
                sheet._cells[coordinate].column = get_column_letter(col_num)
                sheet._cells[coordinate].row = row_num
                sheet._cells[coordinate].coordinate = coordinate

                sheet._cells.pop(adj_coordinate, None)
            else:
                sheet._cells.pop(coordinate, None)

        # sheet.garbage_collect()

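I pass it the worksheet I'm working with and the number of the column I want deleted, and away it goes. I know this isn't exactly what you wanted, but I hope this information helps!

EDIT: I noticed someone gave this another vote, and figured I should update it. The coordinate system in Openpyxl has gone through some changes over the past couple of years, introducing a coordinate attribute for items in _cells. This needs to be edited too, or the rows will be left blank (instead of deleted), and Excel will throw an error about problems with the file. This works for Openpyxl 2.2.3 (untested with later versions).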


Pre*_*cks 5

As of openpyxl 2.5 you can now use .insert_rows(idx, row_qty)

from openpyxl import load_workbook
wb = load_workbook('excel_template.xlsx')
ws = wb.active
ws.insert_rows(14, 10)

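This will not pick up the formatting of row idx, as it would if you did this manually in Excel. You will need to apply the correct formatting, e.g. cell colour, afterwards.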
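
One way to apply that formatting afterwards is to copy the style attributes cell by cell. The sketch below assumes openpyxl 2.5 or later; the helper name insert_rows_with_style is hypothetical, and taking the styles from the row that previously sat at idx is an assumption about what "correct formatting" means here.

```python
from copy import copy

from openpyxl import Workbook
from openpyxl.styles import PatternFill

def insert_rows_with_style(ws, idx, amount=1):
    """Insert rows before idx, then copy styles from the row that was at idx."""
    ws.insert_rows(idx, amount)
    source_row = idx + amount                # the old row idx now lives here
    for col in range(1, ws.max_column + 1):
        source = ws.cell(row=source_row, column=col)
        if not source.has_style:
            continue
        for row in range(idx, idx + amount):
            target = ws.cell(row=row, column=col)
            target.font = copy(source.font)
            target.fill = copy(source.fill)
            target.border = copy(source.border)
            target.alignment = copy(source.alignment)
            target.number_format = source.number_format

wb = Workbook()
ws = wb.active
ws.append(["header"])
ws.append(["data"])
ws["A2"].fill = PatternFill(fill_type="solid", start_color="FFFF00")

insert_rows_with_style(ws, 2, amount=2)
print(ws["A2"].fill.fill_type, ws["A3"].fill.fill_type, ws["A4"].value)
```

Style objects are copied rather than shared, which is the pattern the openpyxl documentation suggests when reusing a style on another cell.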