Fel*_*xDB 1 html excel internet-explorer vba excel-vba
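I don't have Internet Explorer on any of my work computers, so I can't create an Internet Explorer object and use ie.navigate to parse the HTML and search for tags. My question is: how can I automatically pull certain tagged data from a frame's source into a spreadsheet without using IE? A code example in the answer would be very useful :) Thanks
You can use XMLHTTP to retrieve the HTML source of a web page: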
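Function GetHTML(url As String) As String
    ' Send a synchronous GET request and return the response body as text
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False   ' False = wait for the response before continuing
        .Send
        GetHTML = .ResponseText
    End With
End Function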
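I wouldn't recommend using this as a worksheet function, because the site URL would be re-queried every time the worksheet recalculates. Some sites have logic in place to detect scraping through frequent repeated calls, and depending on the site, your IP could be banned temporarily or permanently.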
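One way to limit repeated requests is to cache the response per URL. The following is only a sketch: `GetHTMLCached` is a hypothetical name that is not part of the original answer, it relies on the `GetHTML` function shown above, and it assumes the Windows `Scripting.Dictionary` object is available. The cache lives until the VBA project is reset.

```vba
' Hypothetical helper (not from the original answer):
' fetches each URL at most once per session and reuses the stored source.
Function GetHTMLCached(url As String) As String
    Static cache As Object   ' keeps its contents between calls
    If cache Is Nothing Then Set cache = CreateObject("Scripting.Dictionary")
    ' only hit the site when this URL has not been requested before
    If Not cache.Exists(url) Then cache.Add url, GetHTML(url)
    GetHTMLCached = cache(url)
End Function
```

Because the page is never re-downloaded while the cache is alive, this is only suitable when slightly stale data is acceptable.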
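Once you have the source HTML string (ideally stored in a variable to avoid unnecessary repeat calls), you can parse it with basic text functions to search for the tag.
This basic function returns the value between <tag> and </tag>: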
Public Function getTag(url As String, ByVal tag As String, Optional occurNum As Integer) As String
    Dim html As String, pStart As Long, pEnd As Long, o As Integer
    html = GetHTML(url)
    ' remove <> if they exist so we can add our own
    If Left(tag, 1) = "<" And Right(tag, 1) = ">" Then
        tag = Mid(tag, 2, Len(tag) - 2)
    End If
    ' default to occurrence #1
    If occurNum = 0 Then occurNum = 1
    pEnd = 1
    For o = 1 To occurNum
        ' find start <tag> beginning at 1 (or after previous occurrence)
        pStart = InStr(pEnd, html, "<" & tag & ">", vbTextCompare)
        If pStart = 0 Then
            getTag = "{Not Found}"
            Exit Function
        End If
        pStart = pStart + Len("<" & tag & ">")
        ' find first end </tag> after start <tag>
        pEnd = InStr(pStart, html, "</" & tag & ">", vbTextCompare)
        ' a missing end tag would otherwise raise a run-time error below
        If pEnd = 0 Then
            getTag = "{Not Found}"
            Exit Function
        End If
    Next o
    ' return string between start <tag> & end </tag>
    getTag = Mid(html, pStart, pEnd - pStart)
End Function
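This will only find basic <tag>s (an opening tag with no attributes), but you can add, remove, or change the text functions to suit your needs.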
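As one possible adaptation, the sketch below also matches opening tags that carry attributes, such as `<h2 class="x">`. `GetTagFromHtml` is a hypothetical name, not part of the original answer. It takes the HTML string directly, so the page only needs to be downloaded once. It assumes the tag name is followed by `>` or a space, and that no attribute value contains a `>` character.

```vba
' Hypothetical helper (not from the original answer): returns the text between
' the n-th <tagName ...> and the next </tagName>, or "" if there is no such pair.
Public Function GetTagFromHtml(html As String, ByVal tagName As String, _
                               Optional ByVal occurNum As Long = 1) As String
    Dim pOpen As Long, pStart As Long, pEnd As Long, found As Long
    Dim nextChar As String

    If occurNum < 1 Then occurNum = 1
    pOpen = 1
    Do
        pOpen = InStr(pOpen, html, "<" & tagName, vbTextCompare)
        If pOpen = 0 Then Exit Function
        ' the character after the name must end it, so "<h2" does not match "<h20>"
        nextChar = Mid$(html, pOpen + Len(tagName) + 1, 1)
        If nextChar = ">" Or nextChar = " " Then found = found + 1
        ' not there yet: continue searching just past this position
        If found < occurNum Then pOpen = pOpen + 1
    Loop While found < occurNum

    ' skip to the end of the opening tag, past any attributes
    pStart = InStr(pOpen, html, ">")
    If pStart = 0 Then Exit Function
    pStart = pStart + 1
    pEnd = InStr(pStart, html, "</" & tagName & ">", vbTextCompare)
    If pEnd = 0 Then Exit Function
    GetTagFromHtml = Mid$(html, pStart, pEnd - pStart)
End Function

Sub GetTagFromHtmlExample()
    Dim html As String
    html = "<h2 class=""a"">First</h2><h20>x</h20><H2>Second</H2>"
    Debug.Print GetTagFromHtml(html, "h2", 1)   ' prints: First
    Debug.Print GetTagFromHtml(html, "h2", 2)   ' prints: Second
End Sub
```

This is still plain string searching rather than a real HTML parser, so nested tags of the same name and unusual markup are not handled.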
Sub findTagExample()
    ' Example usage of getTag (each call downloads the page again)
    Const testURL = "https://en.wikipedia.org/wiki/Web_scraping"
    ' search for 2nd occurrence of tag: <h2> which is "Contents" :
    Debug.Print getTag(testURL, "<h2>", 2)
    ' ...this returns the 8th occurrence, "Navigation Menu" :
    Debug.Print getTag(testURL, "<h2>", 8)
    ' ...and this returns an HTML <span> containing a title for the 'Legal Issues' section:
    Debug.Print getTag(testURL, "<h2>", 4)
End Sub
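To get the values into a spreadsheet, as the question asks, one option is to download the page once, collect every match, and write the results into a column. This is a minimal sketch under stated assumptions: `GetAllTags` and `WriteTagsToSheet` are hypothetical names that are not part of the original answer, the code relies on the `GetHTML` function above, it only matches plain tags without attributes, and it writes to column A of the active sheet.

```vba
' Hypothetical helper: collects the text of every plain <tagName>...</tagName> pair.
Function GetAllTags(html As String, ByVal tagName As String) As Collection
    Dim results As New Collection
    Dim openTag As String, closeTag As String
    Dim pStart As Long, pEnd As Long

    openTag = "<" & tagName & ">"
    closeTag = "</" & tagName & ">"
    pEnd = 1
    Do
        pStart = InStr(pEnd, html, openTag, vbTextCompare)
        If pStart = 0 Then Exit Do          ' no more opening tags
        pStart = pStart + Len(openTag)
        pEnd = InStr(pStart, html, closeTag, vbTextCompare)
        If pEnd = 0 Then Exit Do            ' unclosed tag: stop searching
        results.Add Mid$(html, pStart, pEnd - pStart)
    Loop
    Set GetAllTags = results
End Function

' Hypothetical example: one request, then one row per match in column A.
Sub WriteTagsToSheet()
    Const testURL As String = "https://en.wikipedia.org/wiki/Web_scraping"
    Dim items As Collection, i As Long

    Set items = GetAllTags(GetHTML(testURL), "h2")
    For i = 1 To items.Count
        ActiveSheet.Cells(i, 1).Value = items(i)
    Next i
End Sub
```

Running this from a macro rather than from a worksheet formula keeps the request from firing on every recalculation.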