How do I get the innerHTML of the whole page in a Selenium driver?

YJZ*_*YJZ 16 selenium

I use Selenium to click through to the web page I want, and then parse the page with Beautiful Soup.

Someone has already shown how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks.

Sample code in Python (based on the post above; the language does not seem to matter much):

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup


url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)

the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')

Flo*_* B. 38

To get the HTML of the whole page:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")

html = driver.page_source
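To tie this back to the workflow in the question, here is a minimal sketch that navigates, reads `driver.page_source`, and hands the string to Beautiful Soup. The helper names `fetch_html` and `fetch_soup` are made up for illustration; they are not part of Selenium or Beautiful Soup.

```python
def fetch_html(driver, url):
    """Navigate to `url` and return the page HTML as a string.

    `driver` is expected to be a Selenium WebDriver (anything exposing
    `get()` and `page_source` will do).
    """
    driver.get(url)
    return driver.page_source


def fetch_soup(driver, url, parser="html.parser"):
    """Same as fetch_html, but parsed with Beautiful Soup."""
    # Imported here so fetch_html stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(fetch_html(driver, url), parser)


# Usage with a real browser (assumes Firefox and geckodriver are available):
#
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   try:
#       soup = fetch_soup(driver, "http://www.google.com")
#       print(soup.title)
#   finally:
#       driver.quit()
```

`page_source` is read after any clicks or navigation you have done, so call it once the page is in the state you want to parse.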

To get the outer HTML (including the element's own tag):

# Selenium 4 locator style; `find_element_by_css_selector` was removed in Selenium 4.3
from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")

# HTML from element with some JavaScript
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('outerHTML')

To get the inner HTML (excluding the element's own tag):

# Selenium 4 locator style; `find_element_by_css_selector` was removed in Selenium 4.3
from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")

# HTML from element with some JavaScript
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('innerHTML')
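The page-level variants above differ only in the root node (`document.documentElement` or `document.body`) and the property (`outerHTML` or `innerHTML`). For a node such as `<div id="hireme"><a>x</a></div>`, `outerHTML` is the whole string, while `innerHTML` is just `<a>x</a>`. As a rough sketch, the choices could be folded into one helper; `page_html` and its parameters are hypothetical names, not a Selenium API.

```python
def page_html(driver, root="html", outer=True):
    """Return the HTML of <html> or <body> via driver.execute_script.

    root:  "html" for document.documentElement, "body" for document.body
    outer: True for outerHTML (tag included), False for innerHTML
    """
    targets = {"html": "document.documentElement", "body": "document.body"}
    if root not in targets:
        raise ValueError(f"root must be one of {sorted(targets)}")
    prop = "outerHTML" if outer else "innerHTML"
    # Builds one of the four scripts shown above,
    # e.g. "return document.body.innerHTML;"
    return driver.execute_script(f"return {targets[root]}.{prop};")


# Usage with a real browser (assumes `driver` is a live Selenium WebDriver):
#
#   whole_page = page_html(driver)                         # <html>...</html>
#   body_only = page_html(driver, root="body", outer=False)
```

Only the two fixed root names are interpolated into the script, so no page-supplied text ends up in the JavaScript string.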

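  • Thanks @florentbr. For an element, the post mentioned in the OP seems to have a simpler answer, `element.get_attribute('innerHTML')`. Does your answer do the same thing, or which one is more robust/flexible? (3 upvotes)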