Dou*_* S. 3 xml excel performance vba
我有一个需要在VBA中解析的大型XML文件(excel 2003和2007).xml文件中可能存在超过11,000个"行"数据,每个"行"具有10到20个列之间的数据.这最终只是一个庞大的任务,只需解析并获取数据(5 - 7分钟).我尝试读取xml并将每个'row'放入字典(key =行号,值=行属性),但这需要同样长的时间.
遍历DOM需要永远.有更有效的方法吗?
Dim XMLDict
Sub ParseXML(ByRef RootNode As IXMLDOMNode)
Dim Counter As Long
Dim RowList As IXMLDOMNodeList
Dim ColumnList As IXMLDOMNodeList
Dim RowNode As IXMLDOMNode
Dim ColumnNode As IXMLDOMNode
Counter = 1
Set RowList = RootNode.SelectNodes("Row")
For Each RowNode In RowList
Set ColumnList = RowNode.SelectNodes("Col")
Dim NodeValues As String
For Each ColumnNode In ColumnList
NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
Next ColumnNode
XMLDICT.Add Counter, NodeValues
Counter = Counter + 1
Next RowNode
End Sub
Run Code Online (Sandbox Code Playgroud)
您可以尝试使用SAX而不是DOM.当您所做的只是解析文档并且文档大小非常小时,SAX应该更快.这里是MSXML中SAX2实现的参考
我通常直接使用DOM来进行Excel中的大多数XML解析,但SAX似乎在某些情况下具有优势.这里的简短比较可能有助于解释它们之间的差异.
这是一个黑客共同的例子(部分基于此)仅Debug.Print用于输出:
通过工具>参考添加对"Microsoft XML,v6.0"的引用
在普通模块中添加此代码
Option Explicit
Sub main()
Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl
Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl
Set saxReader.contentHandler = saxhandler
saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"
Set saxReader = Nothing
End Sub
Run Code Online (Sandbox Code Playgroud)
添加一个类模块,调用它ContentHandlerImpl并添加以下代码
Option Explicit
Implements IVBSAXContentHandler
Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean
Run Code Online (Sandbox Code Playgroud)
当时所用的模块顶部左侧下拉菜单选择"IVBSAXContentHandler",然后用右手下拉列表中又将新增存根每个事件(从characters到startPrefixMapping)
将代码添加到某些存根中,如下所示
明确设置计数器和标志,以显示我们是否要在此时读取文本数据
Private Sub IVBSAXContentHandler_startDocument()
lCounter = 0
bGetChars = False
End Sub
Run Code Online (Sandbox Code Playgroud)
每次启动新元素时,请检查元素的名称并采取适当的操作
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
Select Case strLocalName
Case "Row"
sNodeValues = ""
Case "Col"
sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
bGetChars = True
Case Else
' do nothing
End Select
End Sub
Run Code Online (Sandbox Code Playgroud)
检查我们是否对文本数据感兴趣,如果是的话,请删除任何无关的空格并删除所有换行符(根据您要解析的文档,这可能是也可能不可取)
Private Sub IVBSAXContentHandler_characters(strChars As String)
If (bGetChars) Then
sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If
End Sub
Run Code Online (Sandbox Code Playgroud)
如果我们已经到了一个Col然后停止读取文本值; 如果我们到达a的末尾Row然后打印出节点值的字符串
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
Select Case strLocalName
Case "Col"
bGetChars = False
Case "Row"
lCounter = lCounter + 1
Debug.Print lCounter & " " & sNodeValues
Case Else
' do nothing
End Select
End Sub
Run Code Online (Sandbox Code Playgroud)
为了使事情更清楚,这里有完整版本ContentHandlerImpl的存根方法:
Option Explicit
Implements IVBSAXContentHandler
Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean
Private Sub IVBSAXContentHandler_characters(strChars As String)
If (bGetChars) Then
sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If
End Sub
Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
End Property
Private Sub IVBSAXContentHandler_endDocument()
End Sub
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
Select Case strLocalName
Case "Col"
bGetChars = False
Case "Row"
lCounter = lCounter + 1
Debug.Print lCounter & " " & sNodeValues
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
End Sub
Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
End Sub
Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
End Sub
Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
End Sub
Private Sub IVBSAXContentHandler_startDocument()
lCounter = 0
bGetChars = False
End Sub
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
Select Case strLocalName
Case "Row"
sNodeValues = ""
Case "Col"
sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
bGetChars = True
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
End Sub
Run Code Online (Sandbox Code Playgroud)