在 Python 中使用 OpenPyXL 包写入数据后如何保持样式格式不变?

Hou*_*run 5 python-2.7 openpyxl

我正在使用openpyxl库包来读取和写入一些数据到现有的 excel 文件test.xlsx

在写入一些数据之前,文件的内容如下所示:

在此处输入图片说明

  • 单元格 A1 包含高棉 Unicode 字符,英文字符为粗体

  • 单元格A3使用了lemons1 font-face,英文字体为斜体

我正在使用下面的脚本将数据“It is me”读取和写入此 excel 文件的单元格 B2:

from openpyxl import load_workbook
import os
FILENAME1 = os.path.dirname(__file__)+'/test.xlsx'
from flask import make_response
from openpyxl.writer.excel import save_virtual_workbook
from app import app

@app.route('/testexel', methods=['GET'])
def testexel():
    with app.app_context():
        try:
            filename = 'test'
            workbook = load_workbook(FILENAME1, keep_links=False)
            sheet = workbook['Sheet1']
            sheet['B2']='It is me'

            response = make_response(save_virtual_workbook(workbook))
            response.headers['Cache-Control'] = 'no-cache'
            response.headers["Content-Disposition"] = "attachment; filename=%s.xlsx" % filename
            response.headers["Content-type"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; charset=utf-8"

            return response

        except Exception as e:

            raise
Run Code Online (Sandbox Code Playgroud)

然后结果 excel 文件的格式被修改为这样,我从不希望它是这样的:

在此处输入图片说明

写入数据之前的格式样式与原始文件有很大不同:

  • 单元格A1所有数据都是粗体采用英文字符的样式格式

  • 单元格B3英文字符变为正常样式,字体limons1从前面的高棉字符更改为字体。

我想要完成的是将文件的现有内容保持与原来相同的格式(样式和字体),同时向其中写入额外的数据。

请告知我的脚本有什么问题,以及如何在运行上述脚本后保持现有样式和字体不变?谢谢。

and*_*nds 7

Excel 文件(扩展名为 .xlsx)实际上是 zip 存档。(你实际上可以用 7-zip 或一些类似的程序打开 excel 文件。)所以 excel 文件包含一堆 xml 文件,其中存储了数据。openpyxl 的作用是在打开 excel 文件时从这些 xml 文件中读取数据,并在保存 excel 文件时使用 xml 文件创建 zip 存档。很遗憾,openpyxl 读取一些 xml 文件,然后解析该数据,然后您可以使用 openpyxl 库中的函数来更改和添加数据,最后当您保存工作簿时,openpyxl 将创建 xml 文件,将数据写入其中,然后保存它们作为 zip 存档(这是 excel 文件)。这些 xml 文件包含存储在 excel 文件中的所有数据,(一个 xml 文件包含来自 excel 文件的公式,另一个将包含样式,其他将包含有关 excel 主题的数据等等)。

  • sharedStrings.xml

    此文件包含 excel 文件中的所有字符串以及这些字符串的格式,这是一个示例:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
     <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
         <si>
             <r>
                 <rPr>
                     <b/>
                     <sz val="22"/>
                     <color theme="1"/>
                     <rFont val="Calibri"/>
                     <family val="2"/>
                     <scheme val="minor"/>
                 </rPr>
                 <t>Hello</t>
             </r>
             <r>
                 <rPr>
                     <sz val="22"/>
                     <color theme="1"/>
                     <rFont val="Calibri"/>
                     <family val="2"/>
                     <scheme val="minor"/>
                 </rPr>
                 <t xml:space="preserve"> ?</t>
             </r>
         </si>
     </sst>
    
    Run Code Online (Sandbox Code Playgroud)
  • sheet1.xml

    此文件包含字符串的位置(哪个单元格包含哪个字符串)。(您的 excel 文件中的每个工作表将有一个文件,但假设您的文件中只有一张工作表,用于此示例。)这是一个示例:

     <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
     <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac xr xr2 xr3" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xr:uid="{00000000-0001-0000-0000-000000000000}">
         <dimension ref="A1:C3"/>
         <sheetViews>
             <sheetView tabSelected="1" zoomScaleNormal="100" workbookViewId="0">
                 <selection activeCell="A3" sqref="A3"/>
             </sheetView>
         </sheetViews>
         <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
         <cols>
             <col min="1" max="1" width="20.140625" customWidth="1"/>
             <col min="2" max="2" width="10.7109375" customWidth="1"/>
         </cols>
         <sheetData>
             <row r="1" spans="1:3" ht="60.75" customHeight="1" x14ac:dyDescent="0.45">
                 <c r="A1" s="4" t="s">
                     <v>0</v>
                 </c>
             </row>
             <row r="2" spans="1:3" ht="19.5" customHeight="1" x14ac:dyDescent="0.35">
                 <c r="A2" s="1"/>
                 <c r="B2" s="3"/>
             </row>
             <row r="3" spans="1:3" ht="62.25" customHeight="1" x14ac:dyDescent="0.5">
                 <c r="A3" s="5" t="s">
                     <v>1</v>
                 </c>
                 <c r="C3" s="2"/>
             </row>
         </sheetData>
         <pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
         <pageSetup paperSize="9" orientation="portrait" r:id="rId1"/>
     </worksheet>
    
    Run Code Online (Sandbox Code Playgroud)

如果你用 openpyxl 打开这个 excel 并保存它(不更改任何数据),这sharedStrings.xml将是这样的:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <t>Hello &#6031;</t>
    </si>
    <si>
        <t>&#6030; sike</t>
    </si>
</sst>
Run Code Online (Sandbox Code Playgroud)

如您所见,您将丢失所有单元格(字符串)原始格式,取而代之的是您的单元格会获得某种合并格式(因此,如果单元格中的某些字符是粗体而有些不是,那么当您保存文件时,要么是整个单元格将是粗体或整个单元格将是正常的)。现在人们已经要求开发人员实现这个富文本选项(link1link2),但他们感到遗憾的是,实现这样的东西会很复杂。我同意这不容易做到,但我们可以做一些更简单的事情:我们可以从sharedStrings.xml打开 excel 文件时获取数据,而不是在我们想要保存 excel 文件时使用该 xml 代码但仅适用于已存在的单元格当我们打开文件时。这可能不容易理解,所以让我们看看下面的例子:

假设您有这样的 excel 文件: 在此处输入图片说明

对于这个excel文件,sharedStrings.xml将是这样的:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
    <si>
        <r>
            <rPr>
                <b/>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>Hello</t>
        </r>
        <r>
            <rPr>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t xml:space="preserve"> ?</t>
        </r>
    </si>
</sst>
Run Code Online (Sandbox Code Playgroud)

如果您运行此 python 代码:

from openpyxl import load_workbook
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook.active
sheet['A2'] = 'It is me'
workbook.save('out.xlsx')
Run Code Online (Sandbox Code Playgroud)

文件out.xlsx将如下所示:

在此处输入图片说明

对于out.xlsx文件,sharedStrings.xml 将是这样的:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <t>Hello &#6031;</t>
    </si>
    <si>
        <t>It is me</t>
    </si>
</sst>
Run Code Online (Sandbox Code Playgroud)

所以我们要做的就是使用这个xml代码:

<si>
    <r>
        <rPr>
            <b/>
            <sz val="22"/>
            <color theme="1"/>
            <rFont val="Calibri"/>
            <family val="2"/>
            <scheme val="minor"/>
        </rPr>
        <t>Hello</t>
    </r>
    <r>
        <rPr>
            <sz val="22"/>
            <color theme="1"/>
            <rFont val="Calibri"/>
            <family val="2"/>
            <scheme val="minor"/>
        </rPr>
        <t xml:space="preserve"> ?</t>
    </r>
</si>
Run Code Online (Sandbox Code Playgroud)

对于包含Hello ?此 xml 代码的旧单元格 A1 :

<si>
    <t>It is me</t>
</si>
Run Code Online (Sandbox Code Playgroud)

对于包含It is me.

所以我们可以结合这个 xml 部分来得到这样的 xml 文件:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <r>
            <rPr>
                <b/>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>Hello</t>
        </r>
        <r>
            <rPr>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t xml:space="preserve"> ?</t>
        </r>
    </si>
    <si>
        <t>It is me</t>
    </si>
</sst>
Run Code Online (Sandbox Code Playgroud)

我写了一些函数来做到这一点。(有相当多的代码,但其中大部分只是从 openpyxl 复制的。如果您要更改 openpyxl 库,您可以在 10 或 20 行代码中完成此操作,但这绝不是一个好主意,所以我宁愿复制我的整个函数需要改变然后改变那一小部分。)

您可以将以下代码保存在单独的文件中extendedopenpyxl.py

from openpyxl import load_workbook as openpyxlload_workbook
from openpyxl.reader.excel import _validate_archive, _find_workbook_part
from openpyxl.reader.worksheet import _get_xml_iter
from openpyxl.xml.functions import fromstring, iterparse, safe_iterator, tostring, Element, xmlfile, SubElement
from openpyxl.xml.constants import ARC_CONTENT_TYPES, SHEET_MAIN_NS, SHARED_STRINGS, ARC_ROOT_RELS, ARC_APP, ARC_CORE, ARC_THEME, ARC_SHARED_STRINGS, ARC_STYLE, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.packaging.manifest import Manifest
from openpyxl.packaging.relationship import get_dependents, get_rels_path
from openpyxl.packaging.workbook import WorkbookParser
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.utils import coordinate_to_tuple
from openpyxl.cell.text import Text
from openpyxl.writer.excel import ExcelWriter as openpyxlExcelWriter
from openpyxl.writer.workbook import write_root_rels, write_workbook_rels, write_workbook
from openpyxl.writer.theme import write_theme
from openpyxl.writer.etree_worksheet import get_rows_to_write
from openpyxl.styles.stylesheet import write_stylesheet
from zipfile import ZipFile, ZIP_DEFLATED
from operator import itemgetter
from io import BytesIO
from xml.etree.ElementTree import tostring as xml_tostring
from xml.etree.ElementTree import register_namespace
from lxml.etree import fromstring as lxml_fromstring

register_namespace('', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')

def get_value_cells(workbook):
    value_cells = []
    for idx, worksheet in enumerate(workbook.worksheets, 1):
        all_rows = get_rows_to_write(worksheet)
        for row_idx, row in all_rows:
            row = sorted(row, key=itemgetter(0))
            for col, cell in row:
                if cell._value is not None:
                    if cell.data_type == 's':
                        value_cells.append((worksheet.title,(cell.row, cell.col_idx)))
    return value_cells

def check_if_lxml(element):
    if type(element).__module__ == 'xml.etree.ElementTree':
        string = xml_tostring(element)
        el = lxml_fromstring(string)
        return el
    return element

def write_string_table(workbook):
    string_table = workbook.shared_strings
    workbook_data = workbook.new_interal_value_workbook_data
    data_strings = workbook.new_interal_value_data_strings
    value_cells = get_value_cells(workbook)
    out = BytesIO()
    i = 0
    with xmlfile(out) as xf:
        with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(string_table)):            
            for i, key in enumerate(string_table):
                sheetname, coordinates = value_cells[i]
                if coordinates in workbook_data[sheetname]:
                    value = workbook_data[sheetname][coordinates]
                    xml_el = data_strings[value]
                    el = check_if_lxml(xml_el)
                else:
                    el = Element('si')
                    text = SubElement(el, 't')
                    text.text = key
                    if key.strip() != key:
                        text.set(PRESERVE_SPACE, 'preserve')
                xf.write(el)

    return  out.getvalue()


class ExcelWriter(openpyxlExcelWriter):

    def write_data(self):
        """Write the various xml files into the zip archive."""
        # cleanup all worksheets
        archive = self._archive

        archive.writestr(ARC_ROOT_RELS, write_root_rels(self.workbook))
        props = ExtendedProperties()
        archive.writestr(ARC_APP, tostring(props.to_tree()))

        archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
        if self.workbook.loaded_theme:
            archive.writestr(ARC_THEME, self.workbook.loaded_theme)
        else:
            archive.writestr(ARC_THEME, write_theme())

        self._write_worksheets()
        self._write_chartsheets()
        self._write_images()
        self._write_charts()

        string_table_out = write_string_table(self.workbook)
        self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
        self._write_external_links()

        stylesheet = write_stylesheet(self.workbook)
        archive.writestr(ARC_STYLE, tostring(stylesheet))

        archive.writestr(ARC_WORKBOOK, write_workbook(self.workbook))
        archive.writestr(ARC_WORKBOOK_RELS, write_workbook_rels(self.workbook))

        self._merge_vba()

        self.manifest._write(archive, self.workbook)

        return

    def save(self, filename):
        self.write_data()
        self._archive.close()
        return
    

def get_coordinates(cell, row_count, col_count):
    coordinate = cell.get('r')
    if coordinate:
        row, column = coordinate_to_tuple(coordinate)
    else:
        row, column = row_count, col_count
    return row, column

def parse_cell(cell):
    VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
    value = cell.find(VALUE_TAG)
    if value is not None:
        value = int(value.text)
    return value

def parse_row(row, row_count):
    CELL_TAG = '{%s}c' % SHEET_MAIN_NS
    if row.get('r'):
        row_count = int(row.get('r'))
    else:
        row_count += 1
    col_count = 0
    data = dict()
    for cell in safe_iterator(row, CELL_TAG):
        col_count += 1
        value = parse_cell(cell)
        if value is not None:
            coordinates = get_coordinates(cell, row_count, col_count)
            data[coordinates] = value
    return data


def parse_sheet(xml_source):
    dispatcher = ['{%s}mergeCells' % SHEET_MAIN_NS, '{%s}col' % SHEET_MAIN_NS, '{%s}row' % SHEET_MAIN_NS, '{%s}conditionalFormatting' % SHEET_MAIN_NS, '{%s}legacyDrawing' % SHEET_MAIN_NS, '{%s}sheetProtection' % SHEET_MAIN_NS, '{%s}extLst' % SHEET_MAIN_NS, '{%s}hyperlink' % SHEET_MAIN_NS, '{%s}tableParts' % SHEET_MAIN_NS]
    row_count = 0
    stream = _get_xml_iter(xml_source)
    it = iterparse(stream, tag=dispatcher)
    row_tag = '{%s}row' % SHEET_MAIN_NS
    data = dict()
    for _, element in it:
        tag_name = element.tag
        if tag_name == row_tag:
            row_data = parse_row(element, row_count)
            data.update(row_data)
            element.clear()
    return data


def get_workbook_parser(archive):
    src = archive.read(ARC_CONTENT_TYPES)
    root = fromstring(src)
    package = Manifest.from_tree(root)
    wb_part = _find_workbook_part(package)
    workbook_part_name = wb_part.PartName[1:]
    parser = WorkbookParser(archive, workbook_part_name)
    parser.parse()
    return parser, package

def get_data_strings(xml_source):
    STRING_TAG = '{%s}si' % SHEET_MAIN_NS
    strings = []
    src = _get_xml_iter(xml_source)
    for _, node in iterparse(src):
        if node.tag == STRING_TAG:
            strings.append(node)

    return strings

def load_workbook(filename, *args, **kwargs):
    workbook = openpyxlload_workbook(filename, *args, **kwargs)

    archive = _validate_archive(filename)
    parser, package = get_workbook_parser(archive)

    workbook_data = dict()
    for sheet, rel in parser.find_sheets():
        sheet_name = sheet.name
        worksheet_path = rel.target
        fh = archive.open(worksheet_path)
        sheet_data = parse_sheet(fh)
        workbook_data[sheet_name] = sheet_data

    data_strings = []
    ct = package.find(SHARED_STRINGS)
    if ct is not None:
        strings_path = ct.PartName[1:]
        strings_source = archive.read(strings_path)
        data_strings = get_data_strings(strings_source)

    workbook.new_interal_value_workbook_data = workbook_data
    workbook.new_interal_value_data_strings = data_strings
    return workbook

def save_workbook(workbook, filename,):
    archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
    writer = ExcelWriter(workbook, archive)
    writer.save(filename)
    return True

def save_virtual_workbook(workbook,):
    temp_buffer = BytesIO()
    archive = ZipFile(temp_buffer, 'w', ZIP_DEFLATED, allowZip64=True)
    writer = ExcelWriter(workbook, archive)
    try:
        writer.write_data()
    finally:
        archive.close()
    virtual_workbook = temp_buffer.getvalue()
    temp_buffer.close()
    return virtual_workbook
Run Code Online (Sandbox Code Playgroud)

现在,如果您运行此代码:

from extendedopenpyxl import load_workbook, save_workbook

workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook['Sheet']
sheet['A2'] = 'It is me'
save_workbook(workbook, 'out.xlsx')
Run Code Online (Sandbox Code Playgroud)

当我在上面示例中使用的 excel 文件上运行此代码时,我得到了以下结果:

在此处输入图片说明

正如您所看到的,单元格 A1 中的文本按原样格式化,((Hello以粗体显示,但?不是)。

编辑(在West 的 评论之后

如果您使用的是较新版本的openpyxl(高于2.5.14),那么上面的代码将不起作用,因为 openpyxl 完全改变了它在 excel 文件中存储值的方式。我修复了里面的部分代码extendedopenpyxl.py,以下代码应该适用于较新版本的openpyxl(我在 3.0.6 版上对其进行了测试):

from openpyxl.reader.excel import ExcelReader, _validate_archive
from openpyxl.xml.constants import SHEET_MAIN_NS, SHARED_STRINGS, ARC_SHARED_STRINGS, ARC_APP, ARC_CORE, ARC_THEME, ARC_STYLE, ARC_ROOT_RELS, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.xml.functions import iterparse, xmlfile, tostring
from openpyxl.utils import coordinate_to_tuple
import openpyxl.cell._writer
from zipfile import ZipFile, ZIP_DEFLATED
from openpyxl.writer.excel import ExcelWriter
from io import BytesIO
from xml.etree.ElementTree import register_namespace
from xml.etree.ElementTree import tostring as xml_tostring
from lxml.etree import fromstring as lxml_fromstring
from openpyxl.worksheet._writer import WorksheetWriter
from openpyxl.workbook._writer import WorkbookWriter
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.styles.stylesheet import write_stylesheet
from openpyxl.packaging.relationship import Relationship
from openpyxl.cell._writer import write_cell
from openpyxl.drawing.spreadsheet_drawing import SpreadsheetDrawing
from openpyxl import LXML
from openpyxl.packaging.manifest import DEFAULT_OVERRIDE, Override, Manifest


DEFAULT_OVERRIDE.append(Override("/" + ARC_SHARED_STRINGS, SHARED_STRINGS))


def to_integer(value):
    if type(value) == int:
        return value
    if type(value) == str:
        try:
            num = int(value)
            return num
        except ValueError:
            num = float(value)
            if num.is_integer():
                return int(num)
    raise ValueError('Value {} is not an integer.'.format(value))
    return

def parse_cell(cell):
    VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
    data_type = cell.get('t', 'n')
    value = None
    if data_type == 's':
        value = cell.findtext(VALUE_TAG, None) or None
        if value is not None:
            value = int(value)
    return value

def get_coordinates(cell, row_counter, col_counter):
    coordinate = cell.get('r')
    if coordinate:
        row, column = coordinate_to_tuple(coordinate)
   


Yvo*_*row 2

根据这个问题的答案,您可以使用openpyxl格式化Excel中的单元格。

那里给出的答案仅将目标单元格更改为粗体,但也许您可以将字体更改lemons1

from openpyxl.workbook import Workbook
from openpyxl.styles import Font
wb = Workbook()
ws = wb.active
ws['B3'] = "Hello"
ws['B3'].font =  Font(name='lemons1', size=14)
wb.save("FontDemo.xlsx")
Run Code Online (Sandbox Code Playgroud)

但是,根据文档,您只能将样式应用于整个单元格,而不能应用于单元格的一部分。因此,您需要将高棉语字符放入一个单元格中,将英文字符放入另一个单元格中。