Problem with iFrames in Selenium

Question

I am trying to scrape a web page that is almost entirely JavaScript, using Selenium (in Python).
For example, this is the body of the page:

<body class="bodyLoading">
<!-- this is required for GWT history support -->
<iframe id="__gwt_historyFrame" role="presentation" width="0" height="0" tabindex="-1" title="empty" style="position:absolute;width:0;height:0;border:0" src="javascript:''">  </iframe>
<!-- For printing window contents  -->
<iframe id="__printingFrame" role="presentation" width="0" height="0" tabindex="-1" title="empty" style="width:0;height:0;border:0;"   />


<!-- TODO : RECOMMENDED if your web app will not function without JavaScript enabled -->
<noscript>
<div style="width: 22em; position: absolute; left: 50%; margin-left: -11em; color: red; background-color: white; border: 1px solid red; padding: 4px; font-family: sans-serif">
 Your web browser must have JavaScript enabled in order for
 Regulations.gov to display correctly.
</div>
</noscript>
</body>

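For some reason, Selenium (using the Firefox engine) does not evaluate the JavaScript on this page. If I use the get_html_source function, it returns only the HTML above, not the JavaScript-generated HTML that I can see in the browser (and in the Selenium browser). Unfortunately, I also can't make sense of the iFrame's src attribute, which just says javascript:''.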

Any ideas on how to make sure Selenium processes this iFrame?

Answer 1

Spu*_*ley 5

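iframes are separate documents, so their contents are not included in the HTML of the main page; you have to read them separately.

You can do this with Selenium's select_frame function.

As with any other element, you can reference the frame by its name, a CSS selector, an XPath reference, and so on.

When you select a frame, you change Selenium's context, so you can then access the frame's content as though it were the current page.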
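
A minimal sketch of selecting a frame and reading its source. The select_frame and get_html_source functions belong to the older Selenium RC API; this sketch instead assumes a current Selenium WebDriver release, where the counterparts are driver.switch_to.frame(...) and driver.page_source. The helper name read_frame_source is hypothetical, and the frame id in the usage comment is the one from the question's HTML.

```python
def read_frame_source(driver, frame_ref):
    """Return the HTML of one frame, then restore the top-level context.

    `driver` is assumed to be a Selenium WebDriver instance. `frame_ref`
    may be a frame name or id, a zero-based index, or a located element.
    """
    driver.switch_to.frame(frame_ref)        # WebDriver counterpart of select_frame
    try:
        return driver.page_source            # now refers to the frame's document
    finally:
        driver.switch_to.default_content()   # always return to the main page


# Usage (assumes Selenium and Firefox are installed; the URL is a placeholder):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("https://example.com/")
#   html = read_frame_source(driver, "__printingFrame")
#   driver.quit()
```

Restoring the top-level context in a finally clause keeps later calls from accidentally running inside the frame if reading the source fails.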

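If there are frames within frames, you can continue this process down through the frame tree.

Obviously, you need a way to get back up the frame path. Selenium provides this through the same select_frame function: pass relative=up to move the context to the parent of the current frame, or relative=top to move it to the main page in the browser.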
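
The navigation described above can be sketched as follows. This assumes a current Selenium WebDriver release rather than the Selenium RC API that select_frame comes from: switch_to.parent_frame() plays the role of relative=up and switch_to.default_content() the role of relative=top. The function names and the frame names in the usage comment are hypothetical.

```python
def enter_frame_path(driver, path):
    """Descend into nested frames one level at a time, starting from the top.

    `path` is a sequence of frame references (name, id, index, or element),
    outermost first. `driver` is assumed to be a Selenium WebDriver instance.
    """
    driver.switch_to.default_content()       # like select_frame("relative=top")
    for frame_ref in path:
        driver.switch_to.frame(frame_ref)    # resolved relative to the current frame


def leave_frame(driver):
    """Move the context up one level to the parent of the current frame."""
    driver.switch_to.parent_frame()          # like select_frame("relative=up")


# Usage (frame names are placeholders):
#   enter_frame_path(driver, ["outer", "inner"])
#   inner_html = driver.page_source
#   leave_frame(driver)                      # context is now the "outer" frame
```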

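Using this feature, you can navigate through all the frames in the page.

You can't access them all at once. Only one frame can be in context at a time, so you will never be able to make a single get_html_source call and get the contents of every frame in one go. You can, however, navigate through the frames of the page in your Selenium script and get the HTML source of each frame separately.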
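
One way to collect the source of every frame separately is a recursive walk. This is only a sketch, assuming a current Selenium WebDriver release (not the Selenium RC API from the question) and assuming that child frames can be addressed by their zero-based index in window.frames. The function name collect_frame_sources is hypothetical.

```python
def collect_frame_sources(driver, path=()):
    """Return {frame path: HTML} for the current document and all nested frames.

    Assumes the driver's context is already the document identified by `path`;
    the empty tuple stands for the top-level page. Each key is a tuple of
    frame indexes leading from the top-level page down to that frame.
    """
    sources = {path: driver.page_source}
    # window.frames.length is the number of direct child frames of this document.
    child_count = driver.execute_script("return window.frames.length")
    for index in range(child_count):
        driver.switch_to.frame(index)        # descend into child number `index`
        sources.update(collect_frame_sources(driver, path + (index,)))
        driver.switch_to.parent_frame()      # climb back before the next child
    return sources


# Usage:
#   driver.switch_to.default_content()
#   for frame_path, html in collect_frame_sources(driver).items():
#       print(frame_path, len(html))
```

Each frame's HTML is still fetched with its own call while that frame is in context; the walk only automates the switching.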

Hope that helps.
