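I've been playing with scraping data from web pages using VBS/VBA.
If it were JavaScript I'd be done quickly, but it doesn't seem to be as straightforward in VBS/VBA.
Here's an example I made for an answer. It works, but I had planned on using getElementsByTagName to access the child nodes and couldn't figure out how to use it! The HTMLElement object doesn't have those methods.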
Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As HTMLDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement

    Set Browser = New InternetExplorer
    Browser.navigate "http://www.hsbc.com/about-hsbc/leadership"

    ' Wait until the browser is idle and the page has fully loaded
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Document = Browser.Document
    Set Elements = Document.getElementsByClassName("profile-col1")
    For Each Element In Elements
        Debug.Print "[ name] " & Trim(Element.Children(1).Children(0).innerText)
        Debug.Print "[ title] " & Trim(Element.Children(1).Children(1).innerText)
    Next Element

    Set Document = Nothing
    Set Browser = Nothing
End Sub
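I've been looking at the HTMLElement.document property to see whether it behaves like a fragment of the document, but either it is difficult to work with or it isn't what I think it is: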
Dim Fragment As HTMLDocument
Set Element = Document.getElementById("example") ' This works
Set Fragment = Element.document ' This doesn't
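This also seems a long-winded way to do it (although that's usually the way with VBA, imo). Does anyone know if there's a simpler way to chain these calls?
Document.getElementById("target").getElementsByTagName("tr") would be great...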
dee*_*dee 12
Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As HTMLDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement

    Set Browser = New InternetExplorer
    Browser.Visible = True
    Browser.navigate "http://www.stackoverflow.com"

    ' Wait until the browser is idle and the page has fully loaded
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Document = Browser.Document
    ' Chain the element lookup with getElementsByTagName
    Set Elements = Document.getElementById("hmenus").getElementsByTagName("li")
    For Each Element In Elements
        Debug.Print Element.innerText
        'Questions
        'Tags
        'Users
        'Badges
        'Unanswered
        'Ask Question
    Next Element

    Set Document = Nothing
    Set Browser = Nothing
End Sub
I don't like it either.
So use JavaScript:
Public Function GetJavaScriptResult(doc As HTMLDocument, jsString As String) As String
    ' Late-bound, so members outside IHTMLElement (Value, appendChild) resolve at run time
    Dim el As Object
    Dim body As Object

    ' Create a hidden <input> whose ID is not already used in the page
    Set el = doc.createElement("INPUT")
    Do
        el.ID = GenerateRandomAlphaString(100)
    Loop Until doc.getElementById(el.ID) Is Nothing
    el.Style.display = "none"

    Set body = doc.body
    body.appendChild el

    ' Let the page's script engine evaluate jsString and store the result in the input
    doc.parentWindow.execScript "document.getElementById('" & el.ID & "').value = " & jsString
    GetJavaScriptResult = el.Value

    body.removeChild el
End Function

Function GenerateRandomAlphaString(Length As Long) As String
    Dim i As Long
    Dim Result As String

    Randomize Timer
    For i = 1 To Length
        ' Pick a letter A-Z (65-90), then randomly add 32 to make it lowercase
        Result = Result & Chr(65 + Int(Rnd * 26) + 32 * Int(Rnd * 2))
    Next i
    GenerateRandomAlphaString = Result
End Function
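Let me know if you have any problems with this; I've changed the context from a method to a function.
By the way, what version of IE are you using? I suspect you're on something older than IE8. If you upgrade to IE8, I presume it will update shdocvw.dll to ieframe.dll, and you'll be able to use document.querySelector/querySelectorAll.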
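A minimal sketch of what the querySelectorAll route might look like, assuming IE8 or later is installed and the page renders in a document mode that supports selectors. The URL, the "#target tr" selector, and the name ScrapeWithSelector are placeholders for illustration, not a tested page or part of the original answer. The result is held in a late-bound Object because it is not an IHTMLElementCollection.

```vba
' Assumed references: Microsoft Internet Controls, Microsoft HTML Object Library
Sub ScrapeWithSelector()
    Dim Browser As InternetExplorer
    Dim Document As HTMLDocument
    Dim Rows As Object      ' late-bound node list returned by querySelectorAll
    Dim i As Long

    Set Browser = New InternetExplorer
    Browser.navigate "http://www.example.com"   ' placeholder URL

    ' Wait until the browser is idle and the page has fully loaded
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Document = Browser.Document

    ' One CSS selector replaces getElementById(...).getElementsByTagName(...)
    Set Rows = Document.querySelectorAll("#target tr")   ' placeholder selector

    ' The node list is zero-based; index it rather than using For Each
    For i = 0 To Rows.Length - 1
        Debug.Print Rows.Item(i).innerText
    Next i

    Browser.Quit
    Set Document = Nothing
    Set Browser = Nothing
End Sub
```

Indexing with Length and Item is used here because iterating this kind of node list with For Each is reported to be unreliable from VBA; if the page falls back to an older document mode, the querySelectorAll call may fail at run time.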
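Edit
A response to the comment that isn't really a comment: basically, the way to do this in VBA is to traverse the child nodes. The problem is that you don't get the correct return types. You could fix this by writing your own classes that implement IHTMLElement and IHTMLElementCollection respectively, but that's too much of a pain for me to do without getting paid :). If you're determined, go and read up on the Implements keyword for VB6/VBA.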
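Public Function getSubElementsByTagName(el As IHTMLElement, tagname As String) As Collection
    Dim descendants As New Collection
    Dim results As New Collection
    Dim i As Long

    getDescendants el, descendants

    ' Item 1 is el itself, so start at 2 to return only its sub-elements
    For i = 2 To descendants.Count
        ' tagName is reported in upper case for HTML, so compare case-insensitively
        If StrComp(descendants(i).tagName, tagname, vbTextCompare) = 0 Then
            results.Add descendants(i)
        End If
    Next i

    ' Object references must be assigned with Set
    Set getSubElementsByTagName = results
End Function

' Adds nd and all of its descendant elements to the collection, depth first
Public Sub getDescendants(ByVal nd As IHTMLElement, ByRef descendants As Collection)
    Dim i As Long

    descendants.Add nd
    ' The children collection is zero-based
    For i = 0 To nd.Children.Length - 1
        getDescendants nd.Children.Item(i), descendants
    Next i
End Sub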