Regular expressions in Excel VBA

Mat*_*son 5 regex excel vba excel-vba

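I am using the Microsoft regular expression engine in Excel VBA. I am new to regex, but I have a pattern that works right now. I need to extend it and I am having trouble. Here is my code so far: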

Sub ImportFromDTD()

Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String
Dim sExtract As String
Dim i As Long
Dim J As Long
Dim Reg1 As RegExp   'early binding: needs a reference to Microsoft VBScript Regular Expressions 5.5
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range

Set Reg1 = New RegExp

ffile = FreeFile

sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
"Browse for file to be imported")

If sDTDFile = False Then Exit Sub '(user cancelled import file browser)


'Read the whole file and split it into lines
Open sDTDFile For Input Access Read As #ffile
  sLines = Split(Input$(LOF(ffile), #ffile), vbNewLine)
Close #ffile

Cells(1, 2) = "From DTD"
J = 2

'The pattern does not change, so set it once before the loop
With Reg1
    .Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
End With

For i = 0 To UBound(sLines)

  'Debug.Print "Line"; i; "="; sLines(i)

  If Reg1.Test(sLines(i)) Then
    Set M1 = Reg1.Execute(sLines(i))
    For Each M In M1
      sExtract = M.SubMatches(1)   'second group: the element name
      sExtract = Replace(sExtract, Chr(13), "")
      Cells(J, 2) = sExtract
      J = J + 1
      'Debug.Print sExtract
    Next M
  End If
Next i

Set Reg1 = Nothing

End Sub

Currently, I am matching data like this:

 <!ELEMENT DealNumber  (#PCDATA) >

and extracting DealNumber, but now I need to add another match on data like this:

<!ELEMENT DealParties  (DealParty+) >

and extract only DealParty, without the parentheses and the +.

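I have been using this as a reference, and it is great, but I am still a bit confused: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops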

EDIT

I have come across some new scenarios that I have to match.

 Extract Deal
 <!ELEMENT Deal  (DealNumber,DealType,DealParties) >

 Extract DealParty (the ? and the CRs are throwing me off)
 <!ELEMENT DealParty  (PartyType,CustomerID,CustomerName,CentralCustomerID?,
           LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
           PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >

 Extract Deals
 <!ELEMENT Deals  (Deal*) >

bre*_*tdj 1

You can use this regex pattern:

  .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
  1. This part

(#\w+|(\w+)\+)

matches

#a-z0-9
a-z0-9+

inside the parentheses,

i.e. it matches

(#PCDATA)
(DealParty+)

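to validate the entire string.

  2. The submatches are then used to extract DealNumber for the first valid match and DealParty for the other valid match.

Edited code below. Note that the fallback submatch is now M.SubMatches(0).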

Sub ImportFromDTD()

Dim strIn As String
Dim sExtract As String
Dim J As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match

Set Reg1 = New RegExp
J = 1

strIn = "<!ELEMENT Deal12Number  (#PCDATA) > <!ELEMENT DealParties  (DealParty+) >"

With Reg1
      .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
      .Global = True
      .MultiLine = True
      .IgnoreCase = False
End With

If Reg1.Test(strIn) Then
    Set M1 = Reg1.Execute(strIn)
    For Each M In M1
      'SubMatches(2) is the name in front of "+" (DealParty); it is empty for (#PCDATA)
      sExtract = M.SubMatches(2)
      'In that case fall back to the element name held in SubMatches(0)
      If Len(sExtract) = 0 Then sExtract = M.SubMatches(0)
      sExtract = Replace(sExtract, Chr(13), "")
      Cells(J, 2) = sExtract
      J = J + 1
    Next M
End If

Set Reg1 = Nothing

End Sub
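
For the scenarios added in the question's edit (comma-separated children, the ? and * suffixes, and a declaration wrapped over several lines), one possible approach is to run the regex over the whole file text instead of line by line, and to let a negated character class consume everything up to the closing parenthesis. The following is only a sketch under assumptions: ElementNames and DemoElementNames are hypothetical names that do not appear in the posts above, the pattern is a variation and not the one given in the answer, and it assumes the content model contains no nested parentheses. It uses late binding (CreateObject), so no library reference is needed.

```vba
Option Explicit

' Hypothetical helper: returns the element names declared in a DTD fragment,
' one entry per <!ELEMENT ...> declaration.
Function ElementNames(ByVal dtdText As String) As Collection
    Dim re As Object            ' late-bound VBScript.RegExp
    Dim m As Object
    Dim names As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' [^)]* also consumes line breaks, so a content model that wraps
    ' over several lines is still matched as a single declaration.
    ' Assumption: no nested parentheses inside the content model.
    re.Pattern = "<!ELEMENT\s+(\w+)\s*\(([^)]*)\)\s*>"

    For Each m In re.Execute(dtdText)
        ' SubMatches(0) is the element name;
        ' SubMatches(1) is the text between the parentheses.
        names.Add m.SubMatches(0)
    Next m

    Set ElementNames = names
End Function

Sub DemoElementNames()
    Dim sample As String
    Dim v As Variant

    ' Shortened versions of the declarations from the question's edit
    sample = "<!ELEMENT Deal  (DealNumber,DealType,DealParties) >" & vbCrLf & _
             "<!ELEMENT DealParty  (PartyType,CustomerID?," & vbCrLf & _
             "           LiabilityPercent,ChargeType?) >" & vbCrLf & _
             "<!ELEMENT Deals  (Deal*) >"

    For Each v In ElementNames(sample)
        Debug.Print v
    Next v
End Sub
```

Running DemoElementNames should print Deal, DealParty and Deals to the Immediate window. To use this with the import routine, the text returned by Input$ would be passed in whole, without the Split on vbNewLine, because splitting first cuts a wrapped declaration into separate lines that no longer match on their own.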