一切都完成后,我的刮刀抛出错误而不是退出浏览器

SIM*_*SIM 11 excel vba web-scraping internet-explorer-11 queryselector

我在vba中编写了一个刮刀来解析一些来自torrent网站的电影信息.我使用IEqueryselector完成任务.当我执行我的代码时,它会解析所有内容以及弹出错误.似乎错误出现了,而不是继续.如果我取消错误框,那么我可以看到结果.我在下面上传了两张图片,向您展示我遇到的错误.如何在没有任何错误的情况下成功执行代码?提前致谢.

这是完整的代码:

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.querySelectorAll(".browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub
Run Code Online (Sandbox Code Playgroud)

我遇到的错误:

第一个错误

第二个错误

这两个错误同时出现.我正在使用Internet Explorer 11.

另一方面,如果我尝试如下,它会成功地带来结果,没有任何问题.

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.getElementsByClassName("browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub
Run Code Online (Sandbox Code Playgroud)

参考我已添加到库中:

Sub Torrent_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = False
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE: Loop
        Set html = .Document
    End With

    For Each post In html.querySelectorAll(".browse-movie-bottom")
        Row = Row + 1: Cells(Row, 1) = post.queryselector(".browse-movie-title").innerText
        Cells(Row, 2) = post.queryselector(".browse-movie-year").innerText
    Next post
    IE.Quit
End Sub
Run Code Online (Sandbox Code Playgroud)

那么,这里有什么问题,IE或者我在这里错过了什么来成功?是否有任何参考添加到库以摆脱错误?

S M*_*den 5

好的,所以对该网页有一些严重不友好的事情.它对我来说一直在崩溃.所以我已经在脚本引擎/脚本控制中运行了一个javascript程序,它可以运行.

我希望你能遵循它.逻辑是在添加到ScriptEngine的javascript中.我得到两个节点列表,一个电影列表和一个年份列表; 然后我逐步遍历每个数组,并将它们作为键值对添加到Microsoft Scripting Dictionary.

Option Explicit

'*Tools->References
'*    Microsoft Scripting Runtime
'*    Microsoft Scripting Control
'*    Microsoft Internet Controls
'*    Microsoft HTML Object Library

Sub Torrent_Data()
    Dim row As Long
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = True
        .navigate "https://yts.am/browse-movies"
        Do While .readyState <> READYSTATE_COMPLETE:
            DoEvents
        Loop
        Set html = .document
    End With

    Dim dicFilms As Scripting.Dictionary
    Set dicFilms = New Scripting.Dictionary

    Call GetScriptEngine.Run("getMovies", html, dicFilms)

    Dim vFilms As Variant
    vFilms = dicFilms.Keys

    Dim vYears As Variant
    vYears = dicFilms.Items

    Dim lRowLoop As Long
    For lRowLoop = 0 To dicFilms.Count - 1

        Cells(lRowLoop + 1, 1) = vFilms(lRowLoop)
        Cells(lRowLoop + 1, 2) = vYears(lRowLoop)

    Next lRowLoop

    Stop

    IE.Quit
End Sub

Private Function GetScriptEngine() As ScriptControl
    '* see code from this SO Q & A
    ' https://stackoverflow.com/questions/37711073/in-excel-vba-on-windows-how-to-get-stringified-json-respresentation-instead-of
    Static soScriptEngine As ScriptControl
    If soScriptEngine Is Nothing Then
        Set soScriptEngine = New ScriptControl
        soScriptEngine.Language = "JScript"

        soScriptEngine.AddCode "function getMovies(htmlDocument, microsoftDict) { " & _
                                    "var titles = htmlDocument.querySelectorAll('a.browse-movie-title'), i;" & _
                                    "var years = htmlDocument.querySelectorAll('div.browse-movie-year'), j;" & _
                                    "if ( years.length === years.length) {" & _
                                    "for (i=0; i< years.length; ++i) {" & _
                                    "   var film = titles[i].innerText;" & _
                                    "   var year = years[i].innerText;" & _
                                    "   microsoftDict.Add(film, year);" & _
                                    "}}}"

    End If
    Set GetScriptEngine = soScriptEngine
End Function
Run Code Online (Sandbox Code Playgroud)