如何提高VBA中XML解析的速度

Dou*_* S. 3 xml excel performance vba

我有一个需要在VBA中解析的大型XML文件(excel 2003和2007).xml文件中可能存在超过11,000个"行"数据,每个"行"具有10到20个列之间的数据.这最终只是一个庞大的任务,只需解析并获取数据(5 - 7分钟).我尝试读取xml并将每个'row'放入字典(key =行号,值=行属性),但这需要同样长的时间.

遍历DOM需要永远.有更有效的方法吗?

Dim XMLDict
    Sub ParseXML(ByRef RootNode As IXMLDOMNode)
        Dim Counter As Long
        Dim RowList As IXMLDOMNodeList
        Dim ColumnList As IXMLDOMNodeList
        Dim RowNode As IXMLDOMNode
        Dim ColumnNode As IXMLDOMNode
        Counter = 1
        Set RowList = RootNode.SelectNodes("Row")

        For Each RowNode In RowList
            Set ColumnList = RowNode.SelectNodes("Col")
            Dim NodeValues As String
            For Each ColumnNode In ColumnList
                NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
            Next ColumnNode
            XMLDICT.Add Counter, NodeValues
            Counter = Counter + 1
        Next RowNode
    End Sub
Run Code Online (Sandbox Code Playgroud)

bar*_*owc 6

您可以尝试使用SAX而不是DOM.当您所做的只是解析文档并且文档大小非常小时,SAX应该更快.这里是MSXML中SAX2实现的参考

我通常直接使用DOM来进行Excel中的大多数XML解析,但SAX似乎在某些情况下具有优势.这里的简短比较可能有助于解释它们之间的差异.

这是一个黑客共同的例子(部分基于)仅Debug.Print用于输出:

通过工具>参考添加对"Microsoft XML,v6.0"的引用

在普通模块中添加此代码

Option Explicit

Sub main()

Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl

Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl

Set saxReader.contentHandler = saxhandler
saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"

Set saxReader = Nothing

End Sub
Run Code Online (Sandbox Code Playgroud)

添加一个类模块,调用它ContentHandlerImpl并添加以下代码

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean
Run Code Online (Sandbox Code Playgroud)

当时所用的模块顶部左侧下拉菜单选择"IVBSAXContentHandler",然后用右手下拉列表中又将新增存根每个事件(从charactersstartPrefixMapping)

将代码添加到某些存根中,如下所示

明确设置计数器和标志,以显示我们是否要在此时读取文本数据

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub
Run Code Online (Sandbox Code Playgroud)

每次启动新元素时,请检查元素的名称并采取适当的操作

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
    Case "Row"
        sNodeValues = ""
    Case "Col"
        sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
        bGetChars = True
    Case Else
        ' do nothing
End Select

End Sub
Run Code Online (Sandbox Code Playgroud)

检查我们是否对文本数据感兴趣,如果是的话,请删除任何无关的空格并删除所有换行符(根据您要解析的文档,这可能是也可能不可取)

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
    sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub
Run Code Online (Sandbox Code Playgroud)

如果我们已经到了一个Col然后停止读取文本值; 如果我们到达a的末尾Row然后打印出节点值的字符串

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
    Case "Col"
        bGetChars = False
    Case "Row"
        lCounter = lCounter + 1
        Debug.Print lCounter & " " & sNodeValues
    Case Else
        ' do nothing
End Select

End Sub
Run Code Online (Sandbox Code Playgroud)

为了使事情更清楚,这里有完整版本ContentHandlerImpl的存根方法:

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
    sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub

Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()

End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
    Case "Col"
        bGetChars = False
    Case "Row"
        lCounter = lCounter + 1
        Debug.Print lCounter & " " & sNodeValues
    Case Else
        ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
    Case "Row"
        sNodeValues = ""
    Case "Col"
        sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
        bGetChars = True
    Case Else
        ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)

End Sub
Run Code Online (Sandbox Code Playgroud)