众所周知,单词的查找和替换“通配符”功能受到一些严重的限制。
以下代码在 word 文档中实现了真正的正则表达式查找和替换,在其他 Stackoverflow 和 SuperUser 问题中可以找到它的变体。
Sub RegEx_PlainText(Before As String, After As String)
Dim regexp As Object
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = True
.Global = True
'could be any Range , .Range.Text , or selection object
ActiveDocument.Range = .Replace(ActiveDocument.Range, After)
End With
End Sub
Run Code Online (Sandbox Code Playgroud)
但是,这会擦除所有格式的文档。
即使字符串的长度相同或实际上是相同的字符串,Word 也不会逐个字符地保留格式,因此ActiveDocument.Range = ActiveDocument.RangeorSelection.Text=Selection.Text将擦除所有格式(或更准确地说,将整个范围的格式设置为与范围中的第一个字符相同,并添加回车)。仔细想想,这种行为并不令人惊讶。
为了解决这个问题,下面的代码运行一个正则表达式查找,然后遍历匹配项并.replace仅在找到匹配项的范围内运行。 这样,如果匹配本身具有多种格式(例如,斜体字会丢失),则只会丢失格式
希望代码注释使这非常透明。
Sub RegEx(Before As String, After As String, _
Optional CaseSensitive As Boolean = False, _
Optional Location As Range = Nothing, _
Optional DebugMode As Boolean = False)
'can't declare activedocument.range in parameters
If Location Is Nothing Then Set Location = ActiveDocument.Range
Dim regexp As Object
Dim Foundmatches As Object
Dim Match As Object
Dim MatchRange As Range
Dim offset As Integer: offset = 0
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = Not CaseSensitive
.Global = True
'set foundmatches to collection of all regex matches
Set Foundmatches = .Execute(Location.text)
For Each Match In Foundmatches
'set matchrange to location of found string in source doc.
'offset accounts for change in length of document from already completed replacements
Set MatchRange = Location.Document _
.Range(Match.FirstIndex + offset, _
Match.FirstIndex + Match.Length + offset)
'debugging
If DebugMode Then
Debug.Print "strfound = " & Match.Value
Debug.Print "matchpoint = " & Match.FirstIndex
Debug.Print "origstrlength = " & Match.Length
Debug.Print "offset = " & offset
Debug.Print "matchrange = " & MatchRange.text
MatchRange.Select
Stop
Else
'REAL LIFE
'run the regex replace just on the range containing the regex match
MatchRange = .Replace(MatchRange, After)
'increment offset to account for change in length of document
offset = offset + MatchRange.End - MatchRange.Start - Match.Length
End If
Next
End With
End Sub
Run Code Online (Sandbox Code Playgroud)
这适用于简单的文档,但是当我在真实文档上运行它时,matchrange最终会在靠近找到匹配的某个点处,但并不完全正确。 它不是可预见的关闭,有时它在右边,有时在左边。一般文档越复杂。(链接、上下文表、格式等)结果越错误。
有谁知道为什么这不起作用,以及如何解决它? 如果我能理解为什么这不起作用,那么我也许能够确定这种方法是否可以修复,或者我是否只需要尝试不同的方法。
代码包含 DebugMode 参数,这意味着它只会遍历文档并突出显示所有匹配项,不执行任何更改。 还向控制台输出一堆东西。这应该对任何愿意和我一起解决这个问题的人都有帮助。
https://calibre-ebook.com/downloads/demos/demo.docx这是一个可能有用的示例文档(不是我制作的)。
@Some_Guy:感谢您提出这个问题,我遇到了类似的问题,您的帖子节省了我很多时间。
这是我想出的拼凑:
Sub RegEx(Before As String, After As String, _
Optional CaseSensitive As Boolean = False, _
Optional Location As Range = Nothing, _
Optional DebugMode As Boolean = False)
'can't declare activedocument.range in parameters
If Location Is Nothing Then Set Location = ActiveDocument.Range
Dim j As Long
Dim regexp As Object
Dim Foundmatches As Object
Dim Match As Object
Dim MatchRange As Range
Dim offset As Integer: offset = 0
Set regexp = CreateObject("vbscript.regexp")
With regexp
.Pattern = Before
.IgnoreCase = Not CaseSensitive
.Global = True
'set foundmatches to collection of all regex matches
Set Foundmatches = .Execute(Location.Text)
For j = Foundmatches.Count - 1 To 0 Step -1
If DebugMode = True Then
'debugging
Debug.Print Foundmatches(j), .Replace(Foundmatches(j), After)
Else
'REAL LIFE
'run a plain old find/replace on the found string and eplace strings
With ActiveDocument.Range.Find
.ClearFormatting
.Replacement.ClearFormatting
.Replacement.Font.Hidden = True
.Text = Foundmatches(j)
.Replacement.Text = regexp.Replace(Foundmatches(j), After)
.Execute Replace:=wdReplaceAll
End With
End If
Next j
End With
End Sub
Run Code Online (Sandbox Code Playgroud)
基本上我使用一个简单的查找/替换,其中的字符串与找到的每个项目(并将被替换)与正则表达式相匹配,在Word中会对其提供适当的支持)。请注意,任何替换的文本都会采用第一个替换字符的格式,因此如果第一个单词为粗体,则所有替换的文本都将为粗体。