Search all docx files in a directory with python-docx (batch processing)

JTa*_*lor 2 python docx python-docx

I have a bunch of Word docx files that all contain the same embedded Excel table. I am trying to extract the same cell from each of the files.

I figured out how to hard-code it for a single file:

from docx import Document

document = Document(r"G:\GIS\DESIGN\ROW\ROW_Files\Docx\006-087-003.docx")
table = document.tables[0]
Project_cell = table.rows[2].cells[2]
paragraph = Project_cell.paragraphs[0]
Project = paragraph.text

print(Project)

But how do I batch this? I tried some variations on listdir, but they aren't working for me, and I am too green to get there on my own.

Rej*_*ted 5

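How you loop over all of the files really depends on your project deliverables. Are all of the files in a single folder? Does the folder contain anything other than .docx files?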

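To cover all of those cases, we'll assume that there are subdirectories, and other files mixed in with your .docx files. For this, we'll use os.walk() and os.path.splitext().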

import os

from docx import Document

# First, we'll create an empty list to hold the path to all of your docx files
document_list = []       

# Now, we loop through every file in the folder "G:\GIS\DESIGN\ROW\ROW_Files\Docx"
# (and all its subfolders) using os.walk().  You could alternatively use os.listdir()
# to get a list of files.  That would be recommended, and simpler, if all files are
# in the same folder.  Consider that change a small challenge for developing your skills!
for path, subdirs, files in os.walk(r"G:\GIS\DESIGN\ROW\ROW_Files\Docx"): 
    for name in files:
        # For each file we find, we need to ensure it is a .docx file before adding
        #  it to our list
        if os.path.splitext(os.path.join(path, name))[1] == ".docx":
            document_list.append(os.path.join(path, name))

# Now create a loop that goes over each file path in document_list, replacing your 
# hard-coded path with the variable.
for document_path in document_list:
    document = Document(document_path)        # Change the document being loaded each loop
    table = document.tables[0]
    project_cell = table.rows[2].cells[2]
    paragraph = project_cell.paragraphs[0]
    project = paragraph.text

    print(project)
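
In case the extension check in the loop above is unclear, here is a small sketch of what os.path.splitext() returns. The file names are made up for illustration, and has_docx_extension() is a hypothetical helper, not part of any library. It lower-cases the extension on the assumption that you also want files named like SCAN.DOCX, which the case-sensitive comparison above would skip.

```python
import os

# os.path.splitext() splits a name into (root, extension). The extension
# keeps its leading dot, and only the last dot in the name counts.
print(os.path.splitext("006-087-003.docx"))   # ('006-087-003', '.docx')
print(os.path.splitext("archive.tar.gz"))     # ('archive.tar', '.gz')
print(os.path.splitext("README"))             # ('README', '')


def has_docx_extension(name):
    """Hypothetical helper: True for .docx names, ignoring upper/lower case."""
    return os.path.splitext(name)[1].lower() == ".docx"


print(has_docx_extension("SCAN.DOCX"))   # True
print(has_docx_extension("notes.txt"))   # False
```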

For further reading, here is the documentation on os.listdir().
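
If everything does live in one folder, the os.listdir() version could look roughly like the sketch below. The function name list_docx_files is my own invention. The sketch also makes two assumptions you may not want: it accepts upper-case extensions, and it skips the "~$" lock files that Word leaves next to a document while it is open, since those end in .docx but are not real documents.

```python
import os


def list_docx_files(folder):
    """Return the full paths of the .docx files directly inside folder."""
    document_list = []
    # os.listdir() returns bare names in arbitrary order, so sort them and
    # join each one back onto the folder to get a usable path.
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1]
        # Assumptions: ".DOCX" counts as well, "~$" lock files are skipped,
        # and a subfolder that happens to be named "something.docx" is ignored.
        if (os.path.isfile(full_path)
                and extension.lower() == ".docx"
                and not name.startswith("~$")):
            document_list.append(full_path)
    return document_list
```

The result can be used in place of document_list in the second loop above, for example document_list = list_docx_files(r"G:\GIS\DESIGN\ROW\ROW_Files\Docx"). Unlike os.walk(), this does not look inside subfolders.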

Also, it would be best to put your code into a reusable function, but that's another challenge for you to take on yourself!
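
If you get stuck on that, one possible shape for such a function is sketched below. The names read_cell and read_cells, and the table_index, row and column parameters, are my own choices; the defaults simply mirror the tables[0], rows[2], cells[2] lookup from the question. It assumes that a document lacking that table, row or column shows up as an IndexError, and it reports such a document as None instead of stopping the whole batch.

```python
def read_cell(document_path, table_index=0, row=2, column=2):
    """Return the text of the first paragraph in one table cell of a .docx file."""
    # python-docx is imported here rather than at the top so that this sketch
    # can be loaded even on a machine where the package is not installed yet.
    from docx import Document

    document = Document(document_path)
    table = document.tables[table_index]
    cell = table.rows[row].cells[column]
    return cell.paragraphs[0].text


def read_cells(document_paths, **cell_position):
    """Map each path to its cell text, or to None when the cell is missing."""
    results = {}
    for document_path in document_paths:
        try:
            results[document_path] = read_cell(document_path, **cell_position)
        except IndexError:
            # Assumption: no such table, row or column in this document.
            results[document_path] = None
    return results
```

With the document_list built above, read_cells(document_list) returns a dictionary of path to project text, and read_cells(document_list, row=3, column=1) would read a different cell from every file.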