Python docx row.cells returns a "merged" cell multiple times

Question

I am using the python-docx library and need to read data from the tables in a document.

I can read the data with the following code:

from docx import Document

# path_to_your_docx is a placeholder for the path to the .docx file
document = Document(path_to_your_docx)
tables = document.tables
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                print(paragraph.text)

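However, I get duplicate values: when cells have been merged, the content of the merged cell is returned once for every cell that was merged into it. I can't simply remove duplicate values, because several unmerged cells may legitimately hold the same value. How should I solve this?

For reference, I was directed from this GitHub issue to ask the question here.

Thank you.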

Answer 1

sca*_*nny 5

If you want to get each merged cell only once, you can add the following code:

from docx import Document


def iter_unique_cells(row):
    """Generate cells in `row`, skipping repeats of a merged cell."""
    prior_tc = None
    for cell in row.cells:
        # _tc is the underlying XML element; a merged cell repeats the same one
        this_tc = cell._tc
        if this_tc is prior_tc:
            continue
        prior_tc = this_tc
        yield cell


# path_to_your_docx is a placeholder for the path to the .docx file
document = Document(path_to_your_docx)
for table in document.tables:
    for row in table.rows:
        for cell in iter_unique_cells(row):
            for paragraph in cell.paragraphs:
                print(paragraph.text)

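The behavior you're seeing, where the same cell shows up once for each "grid" cell it occupies, is the expected behavior. It would cause problems elsewhere if the number of cells in a row were not consistent across rows, that is, if each row in a 3 x 3 table did not necessarily contain 3 cells. For example, accessing row.cells[2] in a three-column table would raise an exception if that row contained a merged cell.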
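The grid behavior described above can be illustrated with a small stand-alone model. This is only a sketch: `PhysicalCell` and `grid_cells` are hypothetical names made up for the illustration and are not part of python-docx, which works on the underlying XML elements rather than on a dataclass.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: compare by identity, like distinct XML elements
class PhysicalCell:
    """Hypothetical stand-in for one cell element stored in a row."""
    text: str
    grid_span: int = 1  # number of grid columns the cell occupies


def grid_cells(physical_cells):
    """Repeat each cell once per grid column it spans."""
    return [cell for cell in physical_cells for _ in range(cell.grid_span)]


# A row of a three-column table whose first two columns are merged.
merged = PhysicalCell("total", grid_span=2)
last = PhysicalCell("total")  # separate, unmerged cell with the same text
row = grid_cells([merged, last])

print(len(row))          # one entry per grid column
print(row[2].text)       # index 2 is valid because the row was expanded
print(row[0] is row[1])  # the merged cell is the same object both times
print(row[1] is row[2])  # same text, but a different cell
```

Because the last two entries carry the same text but are different objects, comparing identity rather than text is what lets the duplicates be skipped safely.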

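At the same time, an alternate accessor could be useful, perhaps Row.iter_unique_cells(), one that doesn't guarantee consistency across rows. That might be a feature worth requesting.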
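The helper in this answer only compares neighboring cells within one row. As far as I understand python-docx, a vertically merged cell is also reported in every row it spans, so it would still be printed once per row. A possible variation, sketched below, remembers every `_tc` element already seen in the table. `iter_unique_cells_in_table` is a hypothetical name, not a python-docx API, and like the helper above it relies on the private `_tc` attribute, which could change between releases.

```python
def iter_unique_cells_in_table(table):
    """Yield each distinct cell of `table` once, in reading order.

    Hypothetical helper: expects an object shaped like a python-docx table,
    i.e. `table.rows`, `row.cells`, and the private `cell._tc` attribute.
    """
    seen = {}  # id(tc) -> tc; storing tc keeps it alive so its id stays unique
    for row in table.rows:
        for cell in row.cells:
            tc = cell._tc
            if id(tc) in seen:
                continue  # same underlying element as an earlier grid cell
            seen[id(tc)] = tc
            yield cell
```

With a real document it would be used in place of the nested row and cell loops, for example by iterating over `iter_unique_cells_in_table(document.tables[0])` and printing each cell's paragraphs.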
