Est*_*man 53 python webpage screenshot backend
我想要实现的是从python中的任何网站获取网站截图.
环境:Linux
hoj*_*oju 45
这是一个使用webkit的简单解决方案:http: //webscraping.com/blog/Webpage-screenshots-with-webkit/
import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *
class Screenshot(QWebView):
def __init__(self):
self.app = QApplication(sys.argv)
QWebView.__init__(self)
self._loaded = False
self.loadFinished.connect(self._loadFinished)
def capture(self, url, output_file):
self.load(QUrl(url))
self.wait_load()
# set to webpage size
frame = self.page().mainFrame()
self.page().setViewportSize(frame.contentsSize())
# render image
image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
painter = QPainter(image)
frame.render(painter)
painter.end()
print 'saving', output_file
image.save(output_file)
def wait_load(self, delay=0):
# process app events until page loaded
while not self._loaded:
self.app.processEvents()
time.sleep(delay)
self._loaded = False
def _loadFinished(self, result):
self._loaded = True
s = Screenshot()
s.capture('http://webscraping.com', 'website.png')
s.capture('http://webscraping.com/blog', 'blog.png')
Run Code Online (Sandbox Code Playgroud)
Aam*_*nan 35
这是我的解决方案,从各种来源获取帮助.它需要完整的网页屏幕捕获并裁剪它(可选)并从裁剪后的图像中生成缩略图.以下是要求:
要求:
npm -g install phantomjs
import os
from subprocess import Popen, PIPE
from selenium import webdriver
abspath = lambda *p: os.path.abspath(os.path.join(*p))
ROOT = abspath(os.path.dirname(__file__))
def execute_command(command):
result = Popen(command, shell=True, stdout=PIPE).stdout.read()
if len(result) > 0 and not result.isspace():
raise Exception(result)
def do_screen_capturing(url, screen_path, width, height):
print "Capturing screen.."
driver = webdriver.PhantomJS()
# it save service log file in same directory
# if you want to have log file stored else where
# initialize the webdriver.PhantomJS() as
# driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
driver.set_script_timeout(30)
if width and height:
driver.set_window_size(width, height)
driver.get(url)
driver.save_screenshot(screen_path)
def do_crop(params):
print "Croping captured image.."
command = [
'convert',
params['screen_path'],
'-crop', '%sx%s+0+0' % (params['width'], params['height']),
params['crop_path']
]
execute_command(' '.join(command))
def do_thumbnail(params):
print "Generating thumbnail from croped captured image.."
command = [
'convert',
params['crop_path'],
'-filter', 'Lanczos',
'-thumbnail', '%sx%s' % (params['width'], params['height']),
params['thumbnail_path']
]
execute_command(' '.join(command))
def get_screen_shot(**kwargs):
url = kwargs['url']
width = int(kwargs.get('width', 1024)) # screen width to capture
height = int(kwargs.get('height', 768)) # screen height to capture
filename = kwargs.get('filename', 'screen.png') # file name e.g. screen.png
path = kwargs.get('path', ROOT) # directory path to store screen
crop = kwargs.get('crop', False) # crop the captured screen
crop_width = int(kwargs.get('crop_width', width)) # the width of crop screen
crop_height = int(kwargs.get('crop_height', height)) # the height of crop screen
crop_replace = kwargs.get('crop_replace', False) # does crop image replace original screen capture?
thumbnail = kwargs.get('thumbnail', False) # generate thumbnail from screen, requires crop=True
thumbnail_width = int(kwargs.get('thumbnail_width', width)) # the width of thumbnail
thumbnail_height = int(kwargs.get('thumbnail_height', height)) # the height of thumbnail
thumbnail_replace = kwargs.get('thumbnail_replace', False) # does thumbnail image replace crop image?
screen_path = abspath(path, filename)
crop_path = thumbnail_path = screen_path
if thumbnail and not crop:
raise Exception, 'Thumnail generation requires crop image, set crop=True'
do_screen_capturing(url, screen_path, width, height)
if crop:
if not crop_replace:
crop_path = abspath(path, 'crop_'+filename)
params = {
'width': crop_width, 'height': crop_height,
'crop_path': crop_path, 'screen_path': screen_path}
do_crop(params)
if thumbnail:
if not thumbnail_replace:
thumbnail_path = abspath(path, 'thumbnail_'+filename)
params = {
'width': thumbnail_width, 'height': thumbnail_height,
'thumbnail_path': thumbnail_path, 'crop_path': crop_path}
do_thumbnail(params)
return screen_path, crop_path, thumbnail_path
if __name__ == '__main__':
'''
Requirements:
Install NodeJS
Using Node's package manager install phantomjs: npm -g install phantomjs
install selenium (in your virtualenv, if you are using that)
install imageMagick
add phantomjs to system path (on windows)
'''
url = 'http://stackoverflow.com/questions/1197172/how-can-i-take-a-screenshot-image-of-a-website-using-python'
screen_path, crop_path, thumbnail_path = get_screen_shot(
url=url, filename='sof.png',
crop=True, crop_replace=False,
thumbnail=True, thumbnail_replace=False,
thumbnail_width=200, thumbnail_height=150,
)
Run Code Online (Sandbox Code Playgroud)
这些是生成的图像:
Ped*_*ito 13
11年后...
Python3.6
使用和截取网站屏幕截图Google PageSpeedApi Insights v5
:
import base64
import requests
import traceback
import urllib.parse as ul
# It's possible to make requests without the api key, but the number of requests is very limited
url = "https://duckgo.com"
urle = ul.quote_plus(url)
image_path = "duckgo.jpg"
key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
strategy = "desktop" # "mobile"
u = f"https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key={key}&strategy={strategy}&url={urle}"
try:
j = requests.get(u).json()
ss_encoded = j['lighthouseResult']['audits']['final-screenshot']['details']['data'].replace("data:image/jpeg;base64,", "")
ss_decoded = base64.b64decode(ss_encoded)
with open(image_path, 'wb+') as f:
f.write(ss_decoded)
except:
print(traceback.format_exc())
exit(1)
Run Code Online (Sandbox Code Playgroud)
笔记:
ars*_*ars 12
在Mac上,有webkit2png,在Linux + KDE上,你可以使用khtml2png.我尝试了前者并且效果很好,并且听说后者正在使用.
我最近遇到了QtWebKit,它声称是跨平台的(Qt将WebKit推入他们的库中,我猜).但我从未尝试过,所以我不能告诉你更多.
QtWebKit链接显示了如何从Python访问.您应该至少可以使用子进程对其他进程执行相同的操作.
可以使用硒
from selenium import webdriver
DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()
Run Code Online (Sandbox Code Playgroud)
https://sites.google.com/a/chromium.org/chromedriver/getting-started
使用Rendertron是一种选择。在幕后,这是一个无头的 Chrome,暴露了以下端点:
/render/:url
:访问此路线,例如,requests.get
如果您对 DOM 感兴趣。/screenshot/:url
:如果您对屏幕截图感兴趣,请访问此路线。您可以使用 npm 安装 rendertron,rendertron
在一个终端中运行,访问http://localhost:3000/screenshot/:url
并保存文件,但是render-tron.appspot.com上提供了一个演示,可以在不安装 npm 包的情况下在本地运行此 Python3 代码段:
import requests
BASE = 'https://render-tron.appspot.com/screenshot/'
url = 'https://google.com'
path = 'target.jpg'
response = requests.get(BASE + url, stream=True)
# save file, see /sf/answers/919651141/
if response.status_code == 200:
with open(path, 'wb') as file:
for chunk in response:
file.write(chunk)
Run Code Online (Sandbox Code Playgroud)
我无法对ars的回答发表评论,但实际上我使用QtWebkit运行了Roland Tapken的代码并且运行良好.
只是想确认Roland在他的博客上发布的内容在Ubuntu上运行得很好.我们的生产版本最终没有使用他写的任何内容,但我们使用PyQt/QtWebKit绑定取得了很大的成功.
这是一个老问题,大多数答案都有点过时了。目前,我会做两件事中的一件。
1. 创建一个截屏程序
我会使用Pyppeteer来截取网站的屏幕截图。它在Puppeteer包上运行。Puppeteer 会启动无头 Chrome 浏览器,因此屏幕截图看起来就像在普通浏览器中一样。
这取自 pyppeteer 文档:
import asyncio
from pyppeteer import launch
async def main():
browser = await launch()
page = await browser.newPage()
await page.goto('https://example.com')
await page.screenshot({'path': 'example.png'})
await browser.close()
asyncio.get_event_loop().run_until_complete(main())
Run Code Online (Sandbox Code Playgroud)
2.使用截图API
您还可以使用截图 API,例如此API 。好处是您不必自己设置所有内容,而只需调用 API 端点即可。
这取自截图 API 的文档:
import urllib.parse
import urllib.request
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
# The parameters.
token = "YOUR_API_TOKEN"
url = urllib.parse.quote_plus("https://example.com")
width = 1920
height = 1080
output = "image"
# Create the query URL.
query = "https://screenshotapi.net/api/v1/screenshot"
query += "?token=%s&url=%s&width=%d&height=%d&output=%s" % (token, url, width, height, output)
# Call the API.
urllib.request.urlretrieve(query, "./example.png")
Run Code Online (Sandbox Code Playgroud)