use*_*750 4 python python-docx
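For example, take the following paragraphs in a Word document. The paragraphs are inside a table.
I am trying to replace "get" with "wake", but only in paragraph 1. With the code given below, the word is replaced in both paragraphs, as the result below shows. The behaviour is the same for every paragraph in the Word document. Please suggest how to make it work as required.
Actual result: 1. Ok folks, please wake up. 2. Ok folks, please wake up.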
import docx

doc = docx.Document("path/docss.docx")

# `word` and `replace` are GUI input fields defined elsewhere in the program.
def Search_replace_text():
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    for run in paragraph.runs:
                        if str(word.get()) in run.text:
                            text = run.text.split(str(word.get()))  # Gets input from GUI
                            if text[1] == " ":
                                run.text = text[0] + str(replace.get())  # Gets input from GUI
                                print(run.text)
                            else:
                                run.text = text[0] + str(replace.get()) + text[1]
                        else:
                            break
    doc.save("docss.docx")
The result I want is:
Ok folks, please wake up.
Ok, please get up.
Actual result:
Ok folks, please wake up.
Ok folks, please wake up.
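The problem with replacing text in a run is that the text may be split across several runs, which means a simple find-and-replace on each run does not always work.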
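To make the problem concrete, here is a minimal sketch that uses plain strings in place of runs. The run texts are hypothetical and only illustrate a word that Word has split across two runs; the joined string plays the role of the paragraph text.

```python
# Hypothetical run texts: "Testing1" has been split across two runs.
run_texts = ["This is Test", "ing1 of the document"]
search_text = "Testing1"

# Searching run by run never sees the whole word ...
found_in_single_run = any(search_text in text for text in run_texts)

# ... although the runs joined together do contain it.
found_in_paragraph = search_text in "".join(run_texts)

print(found_in_single_run)  # False
print(found_in_paragraph)   # True
```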
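This is adapted from my answer to "Python docx Replace string in paragraph while keeping style".
The text to be replaced can be split across several runs, so you need to search by partial match, identify which runs contain part of the text, and then replace the text in those runs.
The function below replaces the string and keeps the original text style. The process is the same whether or not you need to keep the style, because it is the styling that causes text to be split into several runs, even when the text looks visually unstyled.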
import docx


def docx_find_replace_text(doc, search_text, replace_text):
    # Collect body paragraphs plus every paragraph inside table cells.
    paragraphs = list(doc.paragraphs)
    for t in doc.tables:
        for row in t.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    paragraphs.append(paragraph)
    for p in paragraphs:
        if search_text in p.text:
            inline = p.runs
            # Replace strings and retain the same style.
            # The text to be replaced can be split over several runs so
            # search through, identify which runs need to have text replaced
            # then replace the text in those identified
            started = False
            search_index = 0
            # found_runs is a list of (inline index, index of match, length of match)
            found_runs = list()
            found_all = False
            replace_done = False
            for i in range(len(inline)):

                # case 1: found in single run so short circuit the replace
                if search_text in inline[i].text and not started:
                    found_runs.append((i, inline[i].text.find(search_text), len(search_text)))
                    text = inline[i].text.replace(search_text, str(replace_text))
                    inline[i].text = text
                    replace_done = True
                    found_all = True
                    break

                if search_text[search_index] not in inline[i].text and not started:
                    # keep looking ...
                    continue

                # case 2: search for partial text, find first run
                if search_text[search_index] in inline[i].text and inline[i].text[-1] in search_text and not started:
                    # check sequence
                    start_index = inline[i].text.find(search_text[search_index])
                    check_length = len(inline[i].text)
                    for text_index in range(start_index, check_length):
                        if inline[i].text[text_index] != search_text[search_index]:
                            # no match so must be false positive
                            break
                    if search_index == 0:
                        started = True
                    chars_found = check_length - start_index
                    search_index += chars_found
                    found_runs.append((i, start_index, chars_found))
                    if search_index != len(search_text):
                        continue
                    else:
                        # found all chars in search_text
                        found_all = True
                        break

                # case 3: search for partial text, find subsequent run
                if search_text[search_index] in inline[i].text and started and not found_all:
                    # check sequence
                    chars_found = 0
                    check_length = len(inline[i].text)
                    for text_index in range(0, check_length):
                        # The length check avoids an IndexError when the run
                        # continues past the end of the search text.
                        if search_index < len(search_text) and inline[i].text[text_index] == search_text[search_index]:
                            search_index += 1
                            chars_found += 1
                        else:
                            break
                    # no match so must be end
                    found_runs.append((i, 0, chars_found))
                    if search_index == len(search_text):
                        found_all = True
                        break

            if found_all and not replace_done:
                for i, item in enumerate(found_runs):
                    index, start, length = item
                    if i == 0:
                        # first run of the match receives the replacement text
                        text = inline[index].text.replace(inline[index].text[start:start + length], str(replace_text))
                        inline[index].text = text
                    else:
                        # later runs only lose their share of the matched text
                        text = inline[index].text.replace(inline[index].text[start:start + length], '')
                        inline[index].text = text
            # print(p.text)


# sample usage as per example
doc = docx.Document('find_replace_test_document.docx')
docx_find_replace_text(doc, 'Testing1', 'Test ')
docx_find_replace_text(doc, 'Testing2', 'Test ')
docx_find_replace_text(doc, 'rest', 'TEST')
doc.save('find_replace_test_result.docx')
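For readers who want to see the cross-run idea in isolation, here is a shorter sketch that works on a plain list of run texts. It is not the function above but a different technique: it locates the first match in the joined text and then maps character offsets back to the runs. The name `replace_across_runs` and the sample strings are assumptions made up for illustration.

```python
def replace_across_runs(run_texts, search_text, replace_text):
    """Return new run texts with the first match of search_text replaced.

    The replacement is placed in the run where the match starts; later
    runs only lose the matched characters they contained.
    """
    full_text = "".join(run_texts)
    start = full_text.find(search_text)
    if not search_text or start == -1:
        return list(run_texts)
    end = start + len(search_text)

    result = []
    offset = 0  # position of the current run inside full_text
    for text in run_texts:
        run_start, run_end = offset, offset + len(text)
        offset = run_end
        if run_end <= start or run_start >= end:
            # run lies completely outside the match: keep it unchanged
            result.append(text)
        elif run_start <= start:
            # the match begins in this run: keep the text around it and
            # insert the replacement here
            kept_before = text[: start - run_start]
            kept_after = text[end - run_start:]
            result.append(kept_before + replace_text + kept_after)
        else:
            # the match continues into this run: drop the matched part
            result.append(text[end - run_start:])
    return result


print(replace_across_runs(["This is Test", "ing1 of the document"], "Testing1", "Test"))
# ['This is Test', ' of the document']
```

With python-docx, the returned list could be written back with something like `for run, new_text in zip(paragraph.runs, new_texts): run.text = new_text`, which leaves each run's formatting untouched because only the text changes.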
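Here are some screenshots showing the source document and the result after docx_find_replace_text made the following replacements: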
'Testing1' -> 'Test '
'Testing2' -> 'Test '
'rest' -> 'TEST'
Source document: [screenshot]
Result document: [screenshot]
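The question asks for the replacement to happen in paragraph 1 only, whereas docx_find_replace_text replaces in every paragraph that contains the search text. One possible way to narrow it down is to pick the paragraph by position before replacing. The sketch below is an assumption-laden illustration: `replace_in_nth_paragraph` and `fake_paragraph` are hypothetical names, the sample sentences are made up, and the stand-in objects only mimic the `.runs` and `.text` attributes of python-docx paragraphs and runs. It also assumes the word sits inside a single run.

```python
from types import SimpleNamespace


def replace_in_nth_paragraph(paragraphs, n, search_text, replace_text):
    """Replace search_text only in paragraph number n (1-based)."""
    paragraph = paragraphs[n - 1]
    for run in paragraph.runs:
        if search_text in run.text:
            run.text = run.text.replace(search_text, replace_text)


def fake_paragraph(*run_texts):
    # Stand-in for a python-docx paragraph: an object with a list of runs,
    # each of which has a writable .text attribute.
    return SimpleNamespace(runs=[SimpleNamespace(text=t) for t in run_texts])


cell_paragraphs = [
    fake_paragraph("Ok folks, please ", "get up."),
    fake_paragraph("Ok folks, please ", "get up."),
]

replace_in_nth_paragraph(cell_paragraphs, 1, "get", "wake")

texts = ["".join(run.text for run in p.runs) for p in cell_paragraphs]
print(texts)
# ['Ok folks, please wake up.', 'Ok folks, please get up.']
```

With a real document the call would look like `replace_in_nth_paragraph(cell.paragraphs, 1, "get", "wake")` for each table cell of interest.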
I hope this helps someone.