Accessing an object in an iframe using VBA

use*_*er1 8 html excel iframe vba web-scraping

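Main points:

I have successfully used VBA to do the following:

  • Log in to a website using getElementsByName

  • Select the parameters for the report that will be generated (using getElementsBy...)

  • Generate the report after selecting the parameters, which renders the resulting dataset in an iframe on the same page

Important to note - the website is client-side

The above was the easy part; the difficult part is as follows:

Clicking the gif image within the iframe that exports the dataset to CSV

I have tried the following: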

Dim idoc As HTMLDocument
Dim iframe As HTMLFrameElement
Dim iframe2 As HTMLDocument

Set idoc = objIE.document
Set iframe = idoc.all("iframename")
Set iframe2 = iframe.contentDocument

    Do Until InStr(1, objIE.document.all("iframename").contentDocument.innerHTML, "img.gif", vbTextCompare) = 0
        DoEvents
    Loop

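To give some context on the logic above -

  • I access the main frame
  • I access the iframe by its name element
  • I access the content within the iframe
  • I try to find the gif image that needs to be clicked to export to CSV

It is on this line that it says "Object doesn't support this property or method"

I also tried accessing the gif in the iframe via the a element and its href attribute, but this failed completely. I also tried grabbing the image from its source URL, but all this does is take me to the page the image is on.

Note: the iframe does not have an ID and, strangely, the gif image does not have an "onclick" element/event

Final consideration - I attempted to scrape the iframe using R

Accessing the HTML node of the iframe was simple, but trying to access the attributes of the iframe, and subsequently the nodes within the table, proved unsuccessful. All it returned was "character(0)".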

library(rvest)
library(magrittr)

Blah <-read_html("web address redacted") %>%
  html_nodes("#iframe")%>%
  html_nodes("#img")%>%
  html_attr("#src")%>%
  #read_html()%>%
  head()
Blah

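As soon as I include read_html, the script returns the following error:

Error in if (grepl("<|>", x)) { : argument is of length zero

I suspect this is referring to the character(0)

I would appreciate any guidance here!

Many thanks,

HTML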

<div align="center"> 
    <table id="table1" style="border-collapse: collapse" width="700" cellspacing="0" cellpadding="0" border="0"> 
        <tbody>
            <tr>
                <td colspan="6"> &nbsp;</td>
            </tr> 
            <tr> 
                <td colspan="6"> 
                    <a href="href redacted">
                        <img src="img.gif" width="38" height="38" border="0" align="right">
                    </a>
                    <strong>x - </strong>
                </td>
            </tr> 
        </tbody>
    </table>
</div>

dee*_*dee 7

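Iframes are sometimes tricky. Based on the HTML you provided, I created this example. It works locally, but does it work for you as well?

To get to the IFrame, the frames collection can be used. Hopefully you know the name of the IFrame.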

Dim iframeDoc As MSHTML.HTMLDocument
Set iframeDoc = doc.frames("iframename").document

Then, to get to the image, we can use the querySelector method, for example:

Dim img As MSHTML.HTMLImg
Set img = iframeDoc.querySelector("div table[id='table1'] tbody tr td a[href^='https://stackoverflow.com'] img")

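The selector a[href^='https://stackoverflow.com'] selects an anchor whose href attribute starts with the given text. The ^ denotes the beginning of the value.

Then, once we have the image, a simple call to Click is made on its parent element, which is the desired anchor. HTH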
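
Since the real href in the question is redacted, a selector keyed on the image file name may be easier to adapt than one keyed on the link target. The following is only a minimal sketch under assumptions: the image src ends with img.gif as in the posted HTML, the page renders in a document mode where querySelector is available, and ClickExportImage is a hypothetical helper name that is not part of the original example.

```vba
' Hypothetical helper (not from the original answer).
' Requires a reference to Microsoft HTML Object Library.
' Clicks the anchor that wraps the export image and reports whether
' the image was found.
Function ClickExportImage(ByVal iframeDoc As MSHTML.HTMLDocument) As Boolean
    Dim img As MSHTML.IHTMLElement

    ' $= matches attribute values that END with the given text,
    ' just as ^= matches values that START with it.
    ' Assumption: the src ends with "img.gif", as in the posted HTML.
    Set img = iframeDoc.querySelector("a > img[src$='img.gif']")

    If img Is Nothing Then Exit Function    ' returns False

    ' The image has no onclick handler of its own, so click the parent anchor.
    img.parentElement.Click
    ClickExportImage = True
End Function
```

It could be called as ClickExportImage(doc.frames("iframename").document), and a False result means the selector matched nothing in the iframe.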


Complete example:

Option Explicit

' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library

Sub Demo()

    Dim ie As SHDocVw.InternetExplorer
    Dim doc As MSHTML.HTMLDocument
    Dim url As String

    url = "file:///C:/Users/dusek/Documents/My Web Sites/mainpage.html"
    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = True
    ie.navigate url

    While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Wend

    Set doc = ie.document

    Dim iframeDoc As MSHTML.HTMLDocument
    Set iframeDoc = doc.frames("iframename").document
    If iframeDoc Is Nothing Then
        MsgBox "IFrame with name 'iframename' was not found."
        ie.Quit
        Exit Sub
    End If

    Dim img As MSHTML.HTMLImg
    Set img = iframeDoc.querySelector("div table[id='table1'] tbody tr td a[href^='https://stackoverflow.com'] img")
    If img Is Nothing Then
        MsgBox "Image element within iframe was not found."
        ie.Quit
        Exit Sub
    Else
        img.parentElement.Click
    End If

    ie.Quit
End Sub
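
In the question the report is rendered into the iframe only after the parameters are submitted, so the image may not exist yet when the code looks for it. One possible approach is to poll the iframe with a timeout. This is a sketch, not a tested solution for the asker's site: WaitForElementInFrame, its parameters, and the 30-second default are all hypothetical, and if the iframe is served from a different domain, access to its document may be denied regardless.

```vba
' Hypothetical helper (not from the original answer).
' Requires a reference to Microsoft HTML Object Library.
' Polls until the selector matches an element inside the named iframe,
' or gives up after timeoutSeconds. Returns Nothing on timeout.
Function WaitForElementInFrame(ByVal doc As MSHTML.HTMLDocument, _
                               ByVal frameName As String, _
                               ByVal selector As String, _
                               Optional ByVal timeoutSeconds As Single = 30) As MSHTML.IHTMLElement
    Dim started As Single
    Dim elapsed As Single
    Dim frameDoc As MSHTML.HTMLDocument
    Dim found As MSHTML.IHTMLElement

    started = Timer
    Do
        DoEvents                        ' let the browser keep loading

        Set frameDoc = Nothing
        On Error Resume Next            ' the frame or its document may not be ready yet
        Set frameDoc = doc.frames(frameName).document
        If Not frameDoc Is Nothing Then Set found = frameDoc.querySelector(selector)
        On Error GoTo 0

        If Not found Is Nothing Then Exit Do

        elapsed = Timer - started
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer resets at midnight
    Loop While elapsed < timeoutSeconds

    Set WaitForElementInFrame = found
End Function
```

For example, Set img = WaitForElementInFrame(ie.document, "iframename", "img[src$='img.gif']") would replace the direct querySelector call, and the existing "If img Is Nothing" check then also covers the timeout case.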

Main page HTML used

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<!-- saved from url=(0016)http://localhost -->
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>x -</title>
</head>

<body>
<iframe name="iframename" src="iframe1.html">
</iframe>
</body>

</html>

IFrame HTML used (saved as file iframe1.html)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<!-- saved from url=(0016)http://localhost -->
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>Untitled 2</title>
</head>

<body>
<div align="center"> 
    <table id="table1" style="border-collapse: collapse" width="700" cellspacing="0" cellpadding="0" border="0"> 
        <tbody>
            <tr>
                <td colspan="6"> &nbsp;</td>
            </tr> 
            <tr> 
                <td colspan="6"> 
                    <a href="https://stackoverflow.com/questions/44902558/accessing-object-in-iframe-using-vba">
                        <img src="img.gif" width="38" height="38" border="0" align="right">
                    </a>
                    <strong>x - </strong>
                </td>
            </tr> 
        </tbody>
    </table>
</div>

</body>

</html>