python-docx: identifying a page break in a paragraph

Question

Igo*_*kin 0 python search page-break python-docx

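I iterate over a document paragraph by paragraph, then split each paragraph's text into sentences on `. ` (a period followed by a space). Splitting the paragraph into sentences makes the text search more efficient than searching the whole paragraph text at once.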

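The code then searches every word of each sentence for errors, which come from an error-correction database. A simplified version of the code is shown below: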

from docx.enum.text import WD_BREAK

# document and write_report are defined elsewhere;
# error_dictionary is assumed to map each error to its correction
current_page_number = 1
replace_counter = 0

for paragraph in document.paragraphs:
    sentences = paragraph.text.split('. ')
    for sentence in sentences:
        words = sentence.split(' ')
        for word in words:
            for error, correction in error_dictionary.items():
                if error in word:
                    # (A) make a simple replacement
                    word = word.replace(error, correction, 1)
                    # (B) alternative replacement based on runs
                    for run in paragraph.runs:
                        if error in run.text:
                            run.text = run.text.replace(error, correction, 1)
                        # here we could fetch a page-break attribute and, knowing the
                        # current page number, find out on which page the replacement took place
                        if run.page_break == WD_BREAK:  # the check this question asks about
                            current_page_number += 1
                    replace_counter += 1
                    # write to a report which sentence and which page
                    write_report(error, correction, sentence, current_page_number)
                    # for that I need to know where the page breaks are

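The question is how to tell whether a run (or some other paragraph element) contains a page break. Does `run.page_break == WD_BREAK` work? @scanny has already shown how to add a page break, but how do I detect one?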

Ideally, it would also be possible to identify line breaks within a paragraph.

I can do this:

br_counter = 0
for run in paragraph.runs:
    # br_lst holds the <w:br> children of the run's underlying XML element
    for br in run._element.br_lst:
        br_counter += 1
        print(br.type)

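However, this code only reveals hard breaks, i.e. breaks inserted with Ctrl+Enter. Soft page breaks are not detected. (A soft page break forms when the user keeps typing until the current page runs out and the text flows onto the next page.)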

Any hints?

Answer 1

Igo*_*kin 6

For both soft and hard page breaks, I now use the following:

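for run in paragraph.runs:
    # Word records a soft (rendered) page break as <w:lastRenderedPageBreak/>
    if 'lastRenderedPageBreak' in run._element.xml:
        print('soft page break found at run:', run.text[:20])
    # a hard page break is stored as <w:br w:type="page"/>
    if 'w:br' in run._element.xml and 'type="page"' in run._element.xml:
        print('hard page break found at run:', run.text[:20])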
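
The substring checks above test for `w:br` and `type="page"` independently, so the two need not belong to the same element. A minimal sketch of a stricter variant follows. It parses the run's XML and inspects each `w:br` element on its own, which also separates plain line breaks from page breaks, as the question asked. The names `classify_breaks`, `W_NS` and `sample` are illustrative and not part of python-docx. The sketch assumes the input is the XML of a single run with the `w` namespace declared on it, such as the string that `run._element.xml` returns in the code above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def classify_breaks(run_xml):
    """Count the kinds of break found in the XML of one run (hypothetical helper)."""
    root = ET.fromstring(run_xml)
    counts = {"soft_page": 0, "hard_page": 0, "line": 0, "column": 0}

    # Soft page breaks: markers left by the application that last rendered the file
    for _ in root.iter("{%s}lastRenderedPageBreak" % W_NS):
        counts["soft_page"] += 1

    # Explicit breaks: a <w:br> without w:type is an ordinary line break
    for br in root.iter("{%s}br" % W_NS):
        br_type = br.get("{%s}type" % W_NS, "textWrapping")
        if br_type == "page":
            counts["hard_page"] += 1
        elif br_type == "column":
            counts["column"] += 1
        else:
            counts["line"] += 1
    return counts


# Hand-written sample run: one soft page break, one line break, one hard page break
sample = (
    '<w:r xmlns:w="%s">'
    "<w:lastRenderedPageBreak/><w:t>first</w:t>"
    "<w:br/><w:t>second</w:t>"
    '<w:br w:type="page"/>'
    "</w:r>" % W_NS
)
print(classify_breaks(sample))
```

With python-docx this would be called as `classify_breaks(run._element.xml)` inside the loop over `paragraph.runs`. Note that `lastRenderedPageBreak` is only a cached hint from the last time the document was rendered, so it may be missing or out of date in files that were produced or edited by other tools.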
