我selenium用来点击我想要的网页,然后使用解析网页Beautiful Soup.
有人已经展示了如何获取元素的内部HTMLSelenium WebDriver.有没有办法获取整个页面的HTML?谢谢
示例代码Python
(基于上面的帖子,语言似乎并不重要):
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)
the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')
Run Code Online (Sandbox Code Playgroud)
Flo*_* B. 38
要获取整个页面的HTML:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")
html = driver.page_source
Run Code Online (Sandbox Code Playgroud)
要获取外部HTML(包含标记):
# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")
# HTML from element with some JavaScript
element = driver.find_element_by_css_selector("#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element_by_css_selector("#hireme")
html = element.get_attribute('outerHTML')
Run Code Online (Sandbox Code Playgroud)
要获取内部HTML(标记除外):
# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")
# HTML from element with some JavaScript
element = driver.find_element_by_css_selector("#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element_by_css_selector("#hireme")
html = element.get_attribute('innerHTML')
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
33563 次 |
| 最近记录: |