Take a full-page screenshot with Selenium Python and chromedriver

ihi*_*wer 27 python selenium webpage-screenshot selenium-chromedriver

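After trying various approaches... I stumbled upon this page, which takes a full-page screenshot using chromedriver, Selenium, and Python.

The original code is here: http://seleniumpythonqa.blogspot.com/2015/08/generate-full-page-screenshot-in-chrome.html (I have copied the code into this post below.)

It uses PIL and works great! However, there is one problem: it captures the page's fixed header repeatedly across the whole page, and it also misses some parts of the page during page changes. Example URL to screenshot:

http://www.w3schools.com/js/default.asp

How can I avoid the repeated header with this code? Or is there a better option that uses only Python? (I don't know Java and don't want to use Java.)

Please see the screenshot of the current result and the sample code below.

Full-page screenshot with repeated header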

test.py

"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/

It contains the *crucial* correction added in the comments by Jason Coutu.
"""

import sys

from selenium import webdriver
import unittest

import util

class Test(unittest.TestCase):
    """ Demonstration: Get Chrome to generate fullscreen screenshot """

    def setUp(self):
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_fullpage_screenshot(self):
        ''' Generate document-height screenshot '''
        #url = "http://effbot.org/imagingbook/introduction.htm"
        url = "http://www.w3schools.com/js/default.asp"
        self.driver.get(url)
        util.fullpage_screenshot(self.driver, "test.png")


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]])

util.py

import os
import time

from PIL import Image

def fullpage_screenshot(driver, file):

        print("Starting chrome full page screenshot workaround ...")

        total_width = driver.execute_script("return document.body.offsetWidth")
        total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
        viewport_width = driver.execute_script("return document.body.clientWidth")
        viewport_height = driver.execute_script("return window.innerHeight")
        print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height,viewport_width,viewport_height))
        rectangles = []

        i = 0
        while i < total_height:
            ii = 0
            top_height = i + viewport_height

            if top_height > total_height:
                top_height = total_height

            while ii < total_width:
                top_width = ii + viewport_width

                if top_width > total_width:
                    top_width = total_width

                print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
                rectangles.append((ii, i, top_width,top_height))

                ii = ii + viewport_width

            i = i + viewport_height

        stitched_image = Image.new('RGB', (total_width, total_height))
        previous = None
        part = 0

        for rectangle in rectangles:
            if previous is not None:
                driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
                print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.2)

            file_name = "part_{0}.png".format(part)
            print("Capturing {0} ...".format(file_name))

            driver.get_screenshot_as_file(file_name)
            screenshot = Image.open(file_name)

            if rectangle[1] + viewport_height > total_height:
                offset = (rectangle[0], total_height - viewport_height)
            else:
                offset = (rectangle[0], rectangle[1])

            print("Adding to stitched image with offset ({0}, {1})".format(offset[0],offset[1]))
            stitched_image.paste(screenshot, offset)

            del screenshot
            os.remove(file_name)
            part = part + 1
            previous = rectangle

        stitched_image.save(file)
        print("Finishing chrome full page screenshot workaround...")
        return True
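
The tiling step in util.py can be isolated into a pure function, which makes the clamping of the last tile easy to check without a browser. The following is only a sketch: `scroll_positions` and `tile_offsets` are names made up for this illustration, and unlike the script above it clamps the last column as well as the last row.

```python
def scroll_positions(total, viewport):
    """Return the scroll offsets needed to cover `total` pixels with a
    window of `viewport` pixels. The last offset is clamped so the final
    tile ends exactly at the page edge instead of running past it."""
    if viewport <= 0:
        raise ValueError("viewport must be positive")
    if total <= viewport:
        return [0]
    positions = list(range(0, total - viewport, viewport))
    positions.append(total - viewport)  # clamped final tile
    return positions


def tile_offsets(total_width, total_height, viewport_width, viewport_height):
    """Row-major (x, y) scroll/paste offsets for every tile."""
    xs = scroll_positions(total_width, viewport_width)
    ys = scroll_positions(total_height, viewport_height)
    return [(x, y) for y in ys for x in xs]
```

Each offset can be passed to window.scrollTo and then reused as the paste position for Image.paste, so the scroll position and the paste position cannot drift apart.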

小智 24

How it works: set the browser height as tall as possible...

#coding=utf-8
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def test_fullpage_screenshot():
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--start-maximized')
    driver = webdriver.Chrome(options=chrome_options)  # 'chrome_options=' is the deprecated keyword
    driver.get("yoururlxxx")
    time.sleep(2)

    #the element with longest height on page
    ele=driver.find_element("xpath", '//div[@class="react-grid-layout layout"]')
    total_height = ele.size["height"]+1000

    driver.set_window_size(1920, total_height)      #the trick
    time.sleep(2)
    driver.save_screenshot("screenshot1.png")
    driver.quit()

if __name__ == "__main__":
    test_fullpage_screenshot()

  • As others point out below, this captures the full page only if you run in "headless" mode. (2 upvotes)
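
The XPath above is specific to one site. A more general sketch, assuming the page scrolls at the document level, asks the browser for the document height instead. `resize_to_page_height`, `width`, and `padding` are names invented for this example; `driver` is any Selenium WebDriver, ideally running headless as the comment above notes.

```python
PAGE_HEIGHT_JS = (
    "return Math.max("
    "document.body.scrollHeight, document.body.offsetHeight, "
    "document.documentElement.clientHeight, "
    "document.documentElement.scrollHeight, "
    "document.documentElement.offsetHeight);"
)


def resize_to_page_height(driver, width=1920, padding=0):
    """Resize the window to the full document height and return that height.

    `padding` adds extra pixels, like the +1000 used in the answer above.
    """
    height = int(driver.execute_script(PAGE_HEIGHT_JS)) + padding
    driver.set_window_size(width, height)
    return height
```

After this call, driver.save_screenshot(...) can be used as in the answer. Pages that scroll inside an inner container rather than the document will still need a site-specific element.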

小智 16

element = driver.find_element('tag name', 'body')  # find_element_by_tag_name('body') in older Selenium releases
element_png = element.screenshot_as_png
with open("test2.png", "wb") as file:
    file.write(element_png)

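This works for me. It saves the entire page as a screenshot. For more information, see the API docs: http://selenium-python.readthedocs.io/api.html

  • This technique worked for me on one page only, but not on another, even though I waited for the page to load fully. I have a [**newer answer**](/sf/answers/3680104361/) that builds on this one and works more reliably. (2 upvotes)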

Acu*_*nus 14

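This answer improves on the earlier answers by am05mhz and Javed Karim.

It assumes headless mode and that no window-size option was set initially. Before calling this function, make sure the page has loaded fully or sufficiently.

It attempts to set both the width and the height to what is necessary. A screenshot of the entire page can sometimes include an unneeded vertical scrollbar. One way to generally avoid the scrollbar is to take a screenshot of the body element instead. After saving the screenshot, it restores the window to its original size; otherwise the size for the next screenshot may not be set correctly.

Ultimately, this technique may still not work well for some pages.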

from selenium import webdriver
from selenium.webdriver.common.by import By

def save_screenshot(driver: webdriver.Chrome, path: str = '/tmp/screenshot.png') -> None:
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    driver.set_window_size(original_size['width'], original_size['height'])

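If you are using a Python version older than 3.6, remove the type annotations from the function definition.

  • The last line of the code (after taking the screenshot) also matters when working in a loop, because if you omit it the images get longer and longer. (2 upvotes)
  • This works well, but note that later versions of Selenium deprecate find_element_by_tag_name() in favor of find_element(by=By.TAG_NAME, value=tagname), which requires importing By from selenium.webdriver.common.by. (2 upvotes)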
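
Restoring the window size matters when screenshots are taken in a loop, as a comment above notes. One way to make the restore unconditional, even if the screenshot call raises, is a context manager. This is a sketch and not part of the original answer: `full_page_window` is a hypothetical name, and it relies only on `get_window_size`, `set_window_size`, and `execute_script`, which the answer already uses.

```python
from contextlib import contextmanager


@contextmanager
def full_page_window(driver):
    """Temporarily grow the window to the page's scroll size."""
    original = driver.get_window_size()
    width = driver.execute_script("return document.body.parentNode.scrollWidth")
    height = driver.execute_script("return document.body.parentNode.scrollHeight")
    driver.set_window_size(width, height)
    try:
        yield width, height
    finally:
        # Always restore, so the next screenshot starts from a known size.
        driver.set_window_size(original["width"], original["height"])
```

Inside a `with full_page_window(driver):` block, the body element can be captured exactly as in the function above; the original size is restored when the block exits.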

小智 13

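Screenshots are limited to the viewport, but you can work around this by capturing the body element, because the WebDriver captures the entire element even if it is larger than the viewport. This saves you from scrolling and stitching images; however, the footer position may come out wrong (as in the screenshot below).

Tested on Windows 8 and Mac High Sierra with the Chrome driver.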

from selenium import webdriver

url = 'https://stackoverflow.com/'
path = '/path/to/save/in/scrape.png'

driver = webdriver.Chrome()
driver.get(url)
el = driver.find_element('tag name', 'body')  # find_element_by_tag_name('body') in older Selenium releases
el.screenshot(path)
driver.quit()

Result (full size: https://i.stack.imgur.com/ppDiI.png):

SO_Scrape

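  • You must use `headless` mode; see: /sf/answers/4013723661/ (5 upvotes)
  • With this approach I only get the top part of the view; the rest of the screenshot is just background. (5 upvotes)
  • No longer works ;( (3 upvotes)
  • Best answer in this thread, since it is basically a built-in Selenium feature. No need to over-engineer a solution. Absolute madlad. (2 upvotes)
  • This answer did not work for me; it sometimes captures only the part of the (scrollable) page currently being rendered. Here is a more suitable answer: /sf/answers/3680104361/ (2 upvotes)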

Kla*_*nis 8

The key is to turn on headless mode! No stitching is needed, and there is no need to load the page twice.

Complete working code:

from selenium import webdriver

URL = 'http://www.w3schools.com/js/default.asp'

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # options.headless = True in older Selenium releases

driver = webdriver.Chrome(options=options)
driver.get(URL)

# Read the full scroll width/height of the <html> element
S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)
driver.set_window_size(S('Width'),S('Height')) # May need manual adjustment
driver.find_element('tag name', 'body').screenshot('web_screenshot.png')  # replaces find_element_by_tag_name

driver.quit()

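This is effectively the same code as the one posted by @Acumenus, slightly improved.

Summary of my findings

I decided to post this anyway because I did not find an explanation of what happens to the screenshot when headless mode is off (the browser is displayed). As I tested (with the Chrome WebDriver), if headless mode is on, the screenshot is saved as desired. However, if headless mode is off, the saved screenshot has roughly the correct width and height, but the result varies from case to case. Usually the upper part of the page that is visible on screen is saved, but the rest of the image is plain white. There was also a case where I tried to save this Stack Overflow thread using the link above; even the upper part was not saved, and interestingly it was transparent while the rest was still white. The last case I noticed happened only once, with the given W3Schools link; there were no white parts, but the upper part of the page was repeated until the end, including the header.

I hope this helps the many people who for some reason are not getting the expected result, as I did not see anyone explicitly explain the headless mode requirement with such a simple approach. Only after I had found the solution to this problem myself did I find a post by @vc2279 mentioning that the window of a headless browser can be set to any size (which appears to be true for the opposite case as well). The solution in my post improves on that one, though, in that it does not require opening the browser/driver repeatedly or reloading the page.

Further suggestions

If it does not work for you on some pages, I suggest adding time.sleep(seconds) before getting the page size. Another case is when the page needs to be scrolled to the bottom to load more content, which can be solved with the scheight method from this post: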

scheight = .1
while scheight < 9.9:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
    scheight += .01
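
The loop above scrolls through fixed fractions of the page. A variation, sketched here under the assumption that lazy-loaded content increases `document.body.scrollHeight`, scrolls to the bottom repeatedly until the height stops growing. The function name, `pause`, and `max_rounds` are illustrative choices, not taken from the linked post.

```python
import time


def scroll_until_stable(driver, pause=0.5, max_rounds=20):
    """Scroll to the bottom until the page height stops changing.

    Returns the final height. `max_rounds` bounds pages that load forever.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy content time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    driver.execute_script("window.scrollTo(0, 0);")  # back to the top
    return last_height
```

Calling this before measuring the page size means the measured height already includes the lazily loaded content.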

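Also note that for some pages the content may not be inside any of the top-level HTML tags such as <html> or <body>; for example, YouTube uses a <ytd-app> tag. As a last note, I found one page that "returned" a screenshot that still had a horizontal scrollbar; the window size needed manual adjustment, i.e. the image width had to be increased by 18 pixels, like so: S('Width')+18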
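
Rather than hard-coding 18 pixels, the scrollbar width can usually be measured: `window.innerWidth` includes a vertical scrollbar while `document.documentElement.clientWidth` does not. The following is a sketch under that assumption; the helper names are hypothetical, overlay scrollbars (as on macOS) report 0, and the measurement has to be taken before resizing, while the scrollbar is still shown.

```python
SCROLLBAR_JS = "return window.innerWidth - document.documentElement.clientWidth;"


def scrollbar_width(driver):
    """Width in CSS pixels of the vertical scrollbar, or 0 if none is shown."""
    return max(0, int(driver.execute_script(SCROLLBAR_JS)))


def padded_width(driver):
    """Page scroll width plus room for a vertical scrollbar."""
    width = driver.execute_script("return document.body.parentNode.scrollWidth")
    return int(width) + scrollbar_width(driver)
```

The result of `padded_width(driver)` can then be used in place of S('Width')+18 when setting the window size.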


ihi*_*wer 7

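After learning about @Moshisho's approach...

My complete standalone working script is as follows (with a 0.2 s sleep added after each scroll and after repositioning the header):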

from selenium import webdriver
import os
import time
from PIL import Image

def fullpage_screenshot(driver, file):

        print("Starting chrome full page screenshot workaround ...")

        total_width = driver.execute_script("return document.body.offsetWidth")
        total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
        viewport_width = driver.execute_script("return document.body.clientWidth")
        viewport_height = driver.execute_script("return window.innerHeight")
        print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height,viewport_width,viewport_height))
        rectangles = []

        i = 0
        while i < total_height:
            ii = 0
            top_height = i + viewport_height

            if top_height > total_height:
                top_height = total_height

            while ii < total_width:
                top_width = ii + viewport_width

                if top_width > total_width:
                    top_width = total_width

                print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
                rectangles.append((ii, i, top_width,top_height))

                ii = ii + viewport_width

            i = i + viewport_height

        stitched_image = Image.new('RGB', (total_width, total_height))
        previous = None
        part = 0

        for rectangle in rectangles:
            if previous is not None:
                driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.2)
                driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');")
                time.sleep(0.2)
                print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.2)

            file_name = "part_{0}.png".format(part)
            print("Capturing {0} ...".format(file_name))

            driver.get_screenshot_as_file(file_name)
            screenshot = Image.open(file_name)

            if rectangle[1] + viewport_height > total_height:
                offset = (rectangle[0], total_height - viewport_height)
            else:
                offset = (rectangle[0], rectangle[1])

            print("Adding to stitched image with offset ({0}, {1})".format(offset[0],offset[1]))
            stitched_image.paste(screenshot, offset)

            del screenshot
            os.remove(file_name)
            part = part + 1
            previous = rectangle

        stitched_image.save(file)
        print("Finishing chrome full page screenshot workaround...")
        return True


driver = webdriver.Chrome()

# Generate document-height screenshot
#url = "http://effbot.org/imagingbook/introduction.htm"
url = "http://www.w3schools.com/js/default.asp"
driver.get(url)
fullpage_screenshot(driver, "test1236.png")


jer*_*mie 7

Not sure whether people still have this problem. I made a small hack that works pretty well and plays nicely with dynamic zones. Hope it helps.

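import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# Placeholder values (not from the original answer); adjust for your page
url = "yoururlxxx"
default_width, default_height = 1920, 1080
sometime = 2  # seconds to wait for dynamic content
screenshot_path = "screenshot.png"
options = Options()
options.add_argument('--headless')  # assumed; tall windows need headless mode

# 1. get dimensions
browser = webdriver.Chrome(options=options)
browser.set_window_size(default_width, default_height)
browser.get(url)
time.sleep(sometime)
total_height = browser.execute_script("return document.body.parentNode.scrollHeight")
browser.quit()

# 2. get screenshot
browser = webdriver.Chrome(options=options)
browser.set_window_size(default_width, total_height)
browser.get(url)
browser.save_screenshot(screenshot_path)
browser.quit()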

  • This needlessly loads the page twice and does not define the width at all. I now have a [**newer answer**](/sf/answers/3680104361/) that corrects these issues. (2 upvotes)

Mos*_*sho 6

You can achieve this by changing the header's CSS before taking the screenshot:

topnav = driver.find_element("id", "topnav")  # find_element_by_id("topnav") in older Selenium releases
driver.execute_script("arguments[0].setAttribute('style', 'position: absolute; top: 0px;')", topnav)

EDIT: Put this line after the window scroll:

driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');")

So in your util.py it would be:

driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');")

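If the site uses the header tag instead, you can locate it with find_element_by_tag_name("header") (find_element("tag name", "header") in newer Selenium releases).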
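
The snippets above target one element id. When the header's id is not known, a more general sketch is to switch every `position: fixed` or `position: sticky` element to `absolute` or `static` before capturing. This is an illustration rather than the method described above: `unpin_fixed_elements` is a made-up name, and restyling elements this way can shift the layout on some sites.

```python
UNPIN_JS = """
var changed = 0;
var nodes = document.querySelectorAll('*');
for (var i = 0; i < nodes.length; i++) {
    var position = window.getComputedStyle(nodes[i]).position;
    if (position === 'fixed') {
        nodes[i].style.setProperty('position', 'absolute', 'important');
        changed++;
    } else if (position === 'sticky') {
        nodes[i].style.setProperty('position', 'static', 'important');
        changed++;
    }
}
return changed;
"""


def unpin_fixed_elements(driver):
    """Make fixed/sticky elements scroll with the page; return how many changed."""
    return int(driver.execute_script(UNPIN_JS) or 0)
```

Calling it once after the page has loaded, and again after each scroll if the site re-pins its header, should leave the header in the stitched image only once.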


小智 6

I adapted the code for Python 3.6; maybe it will be useful to someone:

from selenium import webdriver
from sys import stdout
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import unittest
#from Login_Page import Login_Page
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from io import BytesIO
from PIL import Image

def testdenovoUIavailable(self):
        binary = FirefoxBinary("C:\\Mozilla Firefox\\firefox.exe") 
        self.driver  = webdriver.Firefox(firefox_binary=binary)
        verbose = 0

        #open page
        self.driver.get("http://yandex.ru")

        #hide fixed header        
        #js_hide_header=' var x = document.getElementsByClassName("topnavbar-wrapper ng-scope")[0];x[\'style\'] = \'display:none\';'
        #self.driver.execute_script(js_hide_header)

        #get total height of page
        js = 'return Math.max( document.body.scrollHeight, document.body.offsetHeight,  document.documentElement.clientHeight,  document.documentElement.scrollHeight,  document.documentElement.offsetHeight);'

        scrollheight = self.driver.execute_script(js)
        if verbose > 0:
            print(scrollheight)

        slices = []
        offset = 0
        offset_arr=[]

        #separate full screen in parts and make printscreens
        while offset < scrollheight:
            if verbose > 0: 
                print(offset)

            #scroll to the next slice; the browser clamps the position on the last
            #slice, so read back where the page actually ended up
            self.driver.execute_script("window.scrollTo(0, %s);" % offset)
            offset_arr.append(int(self.driver.execute_script("return window.pageYOffset;")))

            #create image (in Python 3.6 use BytesIO)
            img = Image.open(BytesIO(self.driver.get_screenshot_as_png()))


            offset += img.size[1]
            #append new printscreen to array
            slices.append(img)


            if verbose > 0:
                self.driver.get_screenshot_as_file('screen_%s.jpg' % (offset))
                print(scrollheight)

        #create image with 
        screenshot = Image.new('RGB', (slices[0].size[0], scrollheight))
        offset = 0
        offset2= 0
        #now glue all images together
        for img in slices:
            screenshot.paste(img, (0, offset_arr[offset2])) 
            offset += img.size[1]
            offset2+= 1      

        screenshot.save('test.png')


Val*_*ali 5

Why not just get the page's width and height and then resize the driver window? It would look something like this:

total_width = driver.execute_script("return document.body.offsetWidth")
total_height = driver.execute_script("return document.body.scrollHeight")
driver.set_window_size(total_width, total_height)
driver.save_screenshot("SomeName.png")

This produces a screenshot of the entire page without having to merge separate parts.


小智 5

Source: https://pypi.org/project/Selenium-Screenshot/

from Screenshot import Screenshot_Clipping
from selenium import webdriver
import time
ob = Screenshot_Clipping.Screenshot()
driver = webdriver.Chrome()
url = "https://www.bbc.com/news/world-asia-china-51108726"
driver.get(url)
time.sleep(1)
img_url = ob.full_Screenshot(driver, save_path=r'.', image_name='Myimage.png')
driver.close()

driver.quit()

  • To make this answer more useful to readers of this question, consider adding some prose explaining what you are doing. (4 upvotes)

Cyr*_*rus 5

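Full-page screenshots are not part of the W3C specification. However, many web drivers implement their own endpoints for taking a true full-page screenshot. I found this approach with geckodriver to be superior to the injected "screenshot, scroll, stitch" method, and far better than resizing the window in headless mode.

Example: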

from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('-headless')  # options.headless = True in older Selenium releases
service = Service('/your/path/to/geckodriver')
driver = webdriver.Firefox(options=options, service=service)

driver.get('https://www.nytimes.com/')
driver.get_full_page_screenshot_as_file('example.png')

driver.close()

geckodriver (Firefox)

If you are using geckodriver, you can use the following methods:

driver.get_full_page_screenshot_as_file
driver.save_full_page_screenshot
driver.get_full_page_screenshot_as_png
driver.get_full_page_screenshot_as_base64 
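
I have tested and confirmed that these work on Selenium 4.07. I don't believe these functions are included in Selenium 3.

The best documentation I could find on these is in this merge.

chromedriver (Chromium)

It appears that chromedriver has implemented its own full-page screenshot functionality:

https://chromium-review.googlesource.com/c/chromium/src/+/2300980

The Selenium team appears to be aiming to support it in Selenium 4: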

https://github.com/SeleniumHQ/selenium/issues/8168
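
Since the full-page methods exist only on some drivers, a script that should run against both Firefox and Chrome can feature-detect them. This is a sketch, assuming `driver` is a Selenium 4 WebDriver; `save_best_screenshot` is a hypothetical helper, and its fallback captures only the viewport.

```python
def save_best_screenshot(driver, path):
    """Use the driver's full-page endpoint when present, else the viewport.

    Returns "full_page" or "viewport" so callers know what they got.
    """
    full_page = getattr(driver, "save_full_page_screenshot", None)
    if callable(full_page):
        full_page(path)  # available on the Firefox (geckodriver) WebDriver
        return "full_page"
    driver.save_screenshot(path)  # standard viewport screenshot
    return "viewport"
```

The return value lets the caller decide whether a fallback such as window resizing or stitching is still needed.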


tsa*_*ein 5

For Chrome, you can also use the Chrome DevTools Protocol:

import base64
...
        page_rect = browser.driver.execute_cdp_cmd("Page.getLayoutMetrics", {})
        screenshot = browser.driver.execute_cdp_cmd(
            "Page.captureScreenshot",
            {
                "format": "png",
                "captureBeyondViewport": True,
                "clip": {
                    "width": page_rect["contentSize"]["width"],
                    "height": page_rect["contentSize"]["height"],
                    "x": 0,
                    "y": 0,
                    "scale": 1
                }
            })

        with open(path, "wb") as file:
            file.write(base64.b64decode(screenshot["data"]))  # CDP returns standard base64


Credits

This works in both headless and non-headless mode.
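
The fragment above can be wrapped into a self-contained function. This sketch assumes a Chromium-based Selenium 4 driver that exposes `execute_cdp_cmd`; the function name and the preference for `cssContentSize` (reported by newer Chrome versions alongside the older `contentSize`) are additions for illustration, not part of the answer.

```python
import base64


def capture_full_page_cdp(driver, path):
    """Save a full-page PNG via the Chrome DevTools Protocol."""
    metrics = driver.execute_cdp_cmd("Page.getLayoutMetrics", {})
    # Newer Chrome reports CSS-pixel sizes under cssContentSize.
    size = metrics.get("cssContentSize") or metrics["contentSize"]
    result = driver.execute_cdp_cmd(
        "Page.captureScreenshot",
        {
            "format": "png",
            "captureBeyondViewport": True,
            "clip": {
                "x": 0,
                "y": 0,
                "width": size["width"],
                "height": size["height"],
                "scale": 1,
            },
        },
    )
    with open(path, "wb") as file:
        file.write(base64.b64decode(result["data"]))
    return size["width"], size["height"]
```

With a real driver this would be called as capture_full_page_cdp(driver, "page.png") after driver.get(...) has finished loading the page.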