Sob*_*ang 4 excel vba excel-vba getelementsbyclassname
I scrape some websites for fun, using VBA as my tool. I use XMLHTTP and HTMLDocument (because they are faster than InternetExplorer.Application).
Public Sub XMLhtmlDocumentHTMLSourceScraper()
    Dim XMLHTTPReq As Object
    Dim htmlDoc As HTMLDocument
    Dim postURL As String
    postURL = "http://foodffs.tumblr.com/archive/2015/11"
    Set XMLHTTPReq = New MSXML2.XMLHTTP
    With XMLHTTPReq
        .Open "GET", postURL, False
        .Send
    End With
    Set htmlDoc = New HTMLDocument
    With htmlDoc
        .body.innerHTML = XMLHTTPReq.responseText
    End With
    i = 0
    Set varTemp = htmlDoc.getElementsByClassName("post_glass post_micro_glass")
    For Each vr In varTemp
        ''''the next line is important to solve this issue *1
        Cells(1, 1) = vr.outerHTML
        Set varTemp2 = vr.getElementsByTagName("SPAN class=post_date")
        Cells(i + 1, 3) = varTemp2.Item(0).innerText
        ''''the next line raises error 438''''
        Set varTemp2 = vr.getElementsByClassName("hover_inner")
        Cells(i + 1, 4) = varTemp2.innerText
        i = i + 1
    Next vr
End Sub
The line marked *1 writes the element's outerHTML to Cells(1, 1), which showed me the following:
<DIV class="post_glass post_micro_glass" title=""><A class=hover title="" href="http://foodffs.tumblr.com/post/134291668251/sugar-free-low-carb-coffee-ricotta-mousse-really" target=_blank>
<DIV class=hover_inner><SPAN class=post_date>...............
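As you can see, the class attributes have all lost their quotation marks. Only the class on the first element still has them, and I really don't know why this happens.
// I can parse this with getElementsByTagName("span"), but I would prefer to select by class.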
小智 5
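The getElementsByClassName method is not treated as a method of the element itself, only of the parent HTMLDocument. If you want to use it to locate elements inside a DIV element, you need to create a sub-HTMLDocument made up of the .outerHTML of that specific DIV element.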
Public Sub XMLhtmlDocumentHTMLSourceScraper()
    'requires references to Microsoft XML (MSXML2) and Microsoft HTML Object Library (MSHTML)
    Dim xmlHTTPReq As New MSXML2.XMLHTTP
    Dim htmlDOC As New HTMLDocument, divSUBDOC As New HTMLDocument
    Dim iDIV As Long, iSPN As Long
    Dim postURL As String, nr As Long

    postURL = "http://foodffs.tumblr.com/archive/2015/11"
    With xmlHTTPReq
        .Open "GET", postURL, False
        .Send
    End With

    'Set htmlDOC = New HTMLDocument
    With htmlDOC
        .body.innerHTML = xmlHTTPReq.responseText
    End With

    With htmlDOC
        For iDIV = 0 To .getElementsByClassName("post_glass post_micro_glass").Length - 1
            'next empty row in column C; all writes go to Sheet1, not the active sheet
            nr = Sheet1.Cells(Sheet1.Rows.Count, 3).End(xlUp).Offset(1, 0).Row
            With .getElementsByClassName("post_glass post_micro_glass")(iDIV)
                'method 1 - run through multiples in a collection
                For iSPN = 0 To .getElementsByTagName("span").Length - 1
                    With .getElementsByTagName("span")(iSPN)
                        Select Case LCase(.className)
                            Case "post_date"
                                Sheet1.Cells(nr, 3) = .innerText
                            Case "post_notes"
                                Sheet1.Cells(nr, 4) = .innerText
                            Case Else
                                'do nothing
                        End Select
                    End With
                Next iSPN
                'method 2 - create a sub-HTML doc to facilitate getting els by classname
                divSUBDOC.body.innerHTML = .outerHTML 'only the HTML from this DIV
                With divSUBDOC
                    If CBool(.getElementsByClassName("hover_inner").Length) Then 'there is at least 1
                        'use the first
                        Sheet1.Cells(nr, 5) = .getElementsByClassName("hover_inner")(0).innerText
                    End If
                End With
            End With
        Next iDIV
    End With
End Sub
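While the other .getElementsByXXXX methods can easily retrieve collections from inside another element, the getElementsByClassName method needs to work on what it believes to be a whole HTMLDocument, even if you have fooled it into thinking so.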
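If the sub-document trick from method 2 is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch: the name InnerTextsByClassWithin and its parameters are made up for illustration, and it assumes the same reference to the Microsoft HTML Object Library as the code above. It returns the matched innerText values as a plain VBA Collection, so nothing depends on the temporary document after the call returns.

```vba
' Hypothetical helper, not part of the original answer's code.
' Assumes a reference to the Microsoft HTML Object Library (MSHTML).
Public Function InnerTextsByClassWithin(ByVal parentEl As Object, _
                                        ByVal className As String) As Collection
    Dim subDoc As New HTMLDocument   ' throwaway document for this one element
    Dim matches As Object
    Dim idx As Long
    Dim results As New Collection

    ' Load only this element's markup into the temporary document.
    subDoc.body.innerHTML = parentEl.outerHTML

    ' The class lookup now runs against a whole HTMLDocument.
    Set matches = subDoc.getElementsByClassName(className)
    For idx = 0 To matches.Length - 1
        results.Add matches.Item(idx).innerText
    Next idx

    Set InnerTextsByClassWithin = results
End Function
```

Inside the For iDIV loop, a call such as InnerTextsByClassWithin(.getElementsByClassName("post_glass post_micro_glass")(iDIV), "hover_inner") would then return a Collection whose Count can be checked before reading the first item (VBA Collections are 1-based).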