不破坏文档格式的正则表达式 Microsoft Word

Som*_*Guy 6 regex vba ms-word

众所周知,单词的查找和替换“通配符”功能受到一些严重的限制。

以下代码在 word 文档中实现了真正的正则表达式查找和替换,在其他 Stackoverflow 和 SuperUser 问题中可以找到它的变体。

Sub RegEx_PlainText(Before As String, After As String)

    Dim regexp As Object
    Set regexp = CreateObject("vbscript.regexp")            

    With regexp
        .Pattern = Before
        .IgnoreCase = True
        .Global = True

         'could be any Range , .Range.Text , or selection object
         ActiveDocument.Range = .Replace(ActiveDocument.Range, After)

    End With
End Sub
Run Code Online (Sandbox Code Playgroud)

但是,这会擦除所有格式的文档。

即使字符串的长度相同或实际上是相同的字符串,Word 也不会逐个字符地保留格式,因此ActiveDocument.Range = ActiveDocument.RangeorSelection.Text=Selection.Text将擦除所有格式(或更准确地说,将整个范围的格式设置为与范围中的第一个字符相同,并添加回车)。仔细想想,这种行为并不令人惊讶。

为了解决这个问题,下面的代码运行一个正则表达式查找,然后遍历匹配项并.replace仅在找到匹配项的范围内运行。 这样,如果匹配本身具有多种格式(例如,斜体字会丢失),则只会丢失格式

希望代码注释使这非常透明。

Sub RegEx(Before As String, After As String, _
          Optional CaseSensitive As Boolean = False, _
          Optional Location As Range = Nothing, _
          Optional DebugMode As Boolean = False)

    'can't declare activedocument.range in parameters
    If Location Is Nothing Then Set Location = ActiveDocument.Range

    Dim regexp As Object
    Dim Foundmatches As Object
    Dim Match As Object
    Dim MatchRange As Range
    Dim offset As Integer: offset = 0
    Set regexp = CreateObject("vbscript.regexp")

   With regexp
        .Pattern = Before
        .IgnoreCase = Not CaseSensitive
        .Global = True

        'set foundmatches to collection of all regex matches
        Set Foundmatches = .Execute(Location.text)

        For Each Match In Foundmatches

            'set matchrange to location of found string in source doc.
            'offset accounts for change in length of  document from already completed replacements
            Set MatchRange = Location.Document _
                   .Range(Match.FirstIndex + offset, _
                          Match.FirstIndex + Match.Length + offset)

            'debugging
            If DebugMode Then
                    Debug.Print "strfound      = " & Match.Value
                    Debug.Print "matchpoint    = " & Match.FirstIndex
                    Debug.Print "origstrlength = " & Match.Length
                    Debug.Print "offset        = " & offset
                    Debug.Print "matchrange    = " & MatchRange.text
                    MatchRange.Select
                Stop

            Else
            'REAL LIFE
                'run the regex replace just on the range containing the regex match
                MatchRange = .Replace(MatchRange, After)

                'increment offset to account for change in length of document
                offset = offset + MatchRange.End - MatchRange.Start - Match.Length
            End If
        Next
    End With
End Sub
Run Code Online (Sandbox Code Playgroud)

这适用于简单的文档,但是当我在真实文档上运行它时,matchrange最终会在靠近找到匹配的某个点处,但并不完全正确。 它不是可预见的关闭,有时它在右边,有时在左边。一般文档越复杂。(链接、上下文表、格式等)结果越错误。

有谁知道为什么这不起作用,以及如何解决它? 如果我能理解为什么这不起作用,那么我也许能够确定这种方法是否可以修复,或者我是否只需要尝试不同的方法。

代码包含 DebugMode 参数,这意味着它只会遍历文档并突出显示所有匹配项,不执行任何更改。 还向控制台输出一堆东西。这应该对任何愿意和我一起解决这个问题的人都有帮助。

https://calibre-ebook.com/downloads/demos/demo.docx这是一个可能有用的示例文档(不是我制作的)。

cyb*_*mad 4

@Some_Guy:感谢您提出这个问题,我遇到了类似的问题,您的帖子节省了我很多时间。

这是我想出的拼凑:

Sub RegEx(Before As String, After As String, _
          Optional CaseSensitive As Boolean = False, _
          Optional Location As Range = Nothing, _
          Optional DebugMode As Boolean = False)

    'can't declare activedocument.range in parameters
    If Location Is Nothing Then Set Location = ActiveDocument.Range

    Dim j As Long
    Dim regexp As Object
    Dim Foundmatches As Object
    Dim Match As Object
    Dim MatchRange As Range
    Dim offset As Integer: offset = 0
    Set regexp = CreateObject("vbscript.regexp")

    With regexp
        .Pattern = Before
        .IgnoreCase = Not CaseSensitive
        .Global = True

        'set foundmatches to collection of all regex matches
        Set Foundmatches = .Execute(Location.Text)
        For j = Foundmatches.Count - 1 To 0 Step -1

            If DebugMode = True Then
                'debugging
                Debug.Print Foundmatches(j), .Replace(Foundmatches(j), After)
            Else
                'REAL LIFE

                'run a plain old find/replace on the found string and eplace strings
                With ActiveDocument.Range.Find
                    .ClearFormatting
                    .Replacement.ClearFormatting
                    .Replacement.Font.Hidden = True
                    .Text = Foundmatches(j)
                    .Replacement.Text = regexp.Replace(Foundmatches(j), After)
                    .Execute Replace:=wdReplaceAll
                End With
            End If
        Next j
    End With
End Sub
Run Code Online (Sandbox Code Playgroud)

基本上我使用一个简单的查找/替换,其中的字符串与找到的每个项目(并将被替换)与正则表达式相匹配,在Word中会对其提供适当的支持)。请注意,任何替换的文本都会采用第一个替换字符的格式,因此如果第一个单词为粗体,则所有替换的文本都将为粗体。