How can I display a PDF file's contents together with its full name in the browser using a Python CGI script?

Question

use*_*424 5 html python cgi content-type

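I want to show the full path of a PDF file along with its contents in the browser. My script has an input HTML form where the user enters a file name and submits it. The script searches for the file and, if it is found in a subdirectory, outputs the file's contents to the browser and should also display its name. I am able to display the contents, but I cannot display the full name at the same time: if I print the file name, the contents show up as garbage characters. Please advise.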

Script a.py:

import os
import cgi
import cgitb 
cgitb.enable()
import sys
import webbrowser

def check_file_extension(display_file):
    input_file = display_file
    nm,file_extension = os.path.splitext(display_file)
    return file_extension

form = cgi.FieldStorage()

type_of_file =''
file_nm = ''
nm =''
not_found = 3

if form.has_key("file1"):
    file_nm = form["file1"].value

type_of_file = check_file_extension(file_nm)

pdf_paths = [ '/home/nancy/Documents/',]

# Change the path while executing on the server , else it will throw error 500
image_paths = [ '/home/nancy/Documents/']


if type_of_file == '.pdf':
    search_paths = pdf_paths
else:
    # .jpg
    search_paths = image_paths
for path in search_paths:
    for root, dirnames, filenames in os.walk(path):
        for f in filenames:
            if f == str(file_nm).strip():
                absolute_path_of_file = os.path.join(root,f)
                # print 'Content-type: text/html\n\n'
                # print '<html><head></head><body>'
                # print absolute_path_of_file
                # print '</body></html>'
#                 print """Content-type: text/html\n\n
# <html><head>absolute_path_of_file</head><body>
# <img src=file_display.py />
# </body></html>"""
                not_found = 2
                if  search_paths == pdf_paths:
                    print 'Content-type: application/pdf\n'
                else:
                    print 'Content-type: image/jpg\n'
                file_read = file(absolute_path_of_file,'rb').read()
                print file_read
                print 'Content-type: text/html\n\n'
                print absolute_path_of_file
                break
        break
    break

if not_found == 3:
    print  'Content-type: text/html\n'
    print '%s not found' % file_nm


The HTML is a regular HTML page with a single input field for the file name.

Answer 1

Bla*_*ack 4

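That is not possible, at least not that simply. Some web browsers don't display PDFs but ask the user to download the file, some display them themselves, some embed an external PDF viewer component, and some start an external PDF viewer. There is no standard, cross-browser way to embed a PDF into HTML, which is what you would need in order to display arbitrary text and the PDF content together.

A fallback solution that works in every browser is to render the PDF pages to images on the server and serve those to the client. This puts some load on the server (processor, memory/disk for caching, bandwidth).

Some modern, HTML5-capable browsers can render PDFs on a canvas element with Mozilla's pdf.js.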

For the others, you could try <embed>/<object> to use Adobe's plugin, as described on Adobe's PDF Developer Junkie Blog.
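For browsers that can show PDFs inline, one way to get the file name and the PDF on the same page is to split the response in two: an HTML page that prints the path and embeds the PDF through <object>, and a second request to the same script that returns only the PDF bytes. The following is a minimal Python 3 sketch of the HTML half. The `mode` query parameter, the function name and the script URL are assumptions made for illustration; they are not part of the original script.

```python
from html import escape
from urllib.parse import urlencode


def build_wrapper_page(script_url, file_name, absolute_path):
    """Return a CGI response (headers + HTML) that shows the path and embeds the PDF.

    The browser fetches the PDF itself through a second request to
    `script_url` with the hypothetical parameter mode=raw. That request is
    expected to answer with 'Content-Type: application/pdf' and nothing but
    the file's bytes.
    """
    pdf_url = script_url + '?' + urlencode({'file1': file_name, 'mode': 'raw'})
    body = (
        '<html><head><title>{title}</title></head><body>'
        '<p>{path}</p>'
        '<object data="{url}" type="application/pdf" width="100%" height="800">'
        # Fallback content for browsers that cannot display the PDF inline.
        '<a href="{url}">Download {title}</a>'
        '</object>'
        '</body></html>'
    ).format(
        title=escape(file_name),
        path=escape(absolute_path),
        url=escape(pdf_url, quote=True),
    )
    return 'Content-Type: text/html\n\n' + body
```

A script would call something like `print(build_wrapper_page('/cgi-bin/a.py', file_nm, absolute_path_of_file))` for the normal request and send the raw PDF only when `mode=raw` is present. Whether the PDF is actually shown inline still depends on the browser, as described above.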

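Rendering pages on the server

Rendering PDF pages as images and serving them needs some software on the server to query the number of pages and to extract a given page and render it as an image.

The page count can be determined with the pdfinfo command-line utility from Xpdf or libpoppler. Converting a page from a PDF file to a JPG image can be done with convert from the ImageMagick tools. A very simple CGI program using these programs: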

#!/usr/bin/env python
import cgi
import cgitb; cgitb.enable()
import os
from itertools import imap
from subprocess import check_output

PDFINFO = '/usr/bin/pdfinfo'
CONVERT = '/usr/bin/convert'
DOC_ROOT = '/home/bj/Documents'

BASE_TEMPLATE = (
    'Content-type: text/html\n\n'
    '<html><head><title>{title}</title></head><body>{body}</body></html>'
)
PDF_PAGE_TEMPLATE = (
    '<h1>{filename}</h1>'
    '<p>{prev_link} {page}/{page_count} {next_link}</p>'
    '<p><img src="{image_url}" style="border: solid thin gray;"></p>'
)

SCRIPT_NAME = os.environ['SCRIPT_NAME']


def create_page_url(filename, page_number, type_):
    return '{0}?file={1}&page={2}&type={3}'.format(
        cgi.escape(SCRIPT_NAME, True),
        cgi.escape(filename, True),
        page_number,
        type_
    )


def create_page_link(text, filename, page_number):
    text = cgi.escape(text)
    if page_number is None:
        return '<span style="color: gray;">{0}</span>'.format(text)
    else:
        return '<a href="{0}">{1}</a>'.format(
            create_page_url(filename, page_number, 'html'), text
        )


def get_page_count(filename):

    def parse_line(line):
        key, _, value = line.partition(':')
        return key, value.strip()

    info = dict(
        imap(parse_line, check_output([PDFINFO, filename]).splitlines())
    )
    return int(info['Pages'])


def get_page(filename, page_index):
    return check_output(
        [
            CONVERT,
            '-density', '96',
            '{0}[{1}]'.format(filename, page_index),
            'jpg:-'
        ]
    )


def send_error(message):
    print BASE_TEMPLATE.format(
        title='Error', body='<h1>Error</h1>{0}'.format(message)
    )


def send_page_html(_pdf_path, filename, page_number, page_count):
    body = PDF_PAGE_TEMPLATE.format(
        filename=cgi.escape(filename),
        page=page_number,
        page_count=page_count,
        image_url=create_page_url(filename, page_number, 'jpg'),
        prev_link=create_page_link(
            '<<', filename, page_number - 1 if page_number > 1 else None
        ),
        next_link=create_page_link(
            '>>', filename, page_number + 1 if page_number < page_count else None
        )
    )
    print BASE_TEMPLATE.format(title='PDF', body=body)


def send_page_image(pdf_path, _filename, page_number, _page_count):
    image_data = get_page(pdf_path, page_number - 1)
    print 'Content-type: image/jpg'
    print 'Content-Length:', len(image_data)
    print
    print image_data


TYPE2SEND_FUNCTION = {
    'html': send_page_html,
    'jpg': send_page_image,
}


def main():
    form = cgi.FieldStorage()
    filename = form.getfirst('file')
    page_number = int(form.getfirst('page', 1))
    type_ = form.getfirst('type', 'html')

    pdf_path = os.path.abspath(os.path.join(DOC_ROOT, filename))
    if os.path.exists(pdf_path) and pdf_path.startswith(DOC_ROOT):
        page_count = get_page_count(pdf_path)
        page_number = min(max(1, page_number), page_count)
        TYPE2SEND_FUNCTION[type_](pdf_path, filename, page_number, page_count)
    else:
        send_error(
            '<p>PDF file <em>{0!r}</em> not found.</p>'.format(
                cgi.escape(filename)
            )
        )

main()
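
The program above is written for Python 2 (print statements, itertools.imap, cgi.escape). For anyone adapting it to Python 3, here is a rough sketch of two of its pieces: reading the page count from pdfinfo's text output, and checking that the requested file really lies under the document root. The function names are made up for this example. In Python 3 the pdfinfo text could be obtained with `subprocess.check_output([PDFINFO, path], text=True)`, and `html.escape` takes the place of `cgi.escape`.

```python
import os


def parse_page_count(pdfinfo_output):
    """Extract the page count from the text printed by `pdfinfo`."""
    info = {}
    for line in pdfinfo_output.splitlines():
        key, _, value = line.partition(':')
        info[key.strip()] = value.strip()
    return int(info['Pages'])


def resolve_under_root(doc_root, filename):
    """Return the absolute path of `filename` inside `doc_root`, or None.

    Comparing with commonpath rejects sibling directories such as
    '/home/bj/Documents-private', which a plain startswith() check on the
    path string would let through.
    """
    root = os.path.realpath(doc_root)
    candidate = os.path.realpath(os.path.join(root, filename))
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

`resolve_under_root` only checks containment; the caller still has to test that the file exists before passing it to pdfinfo or convert.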
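libpoppler has Python bindings, so the call to the external pdfinfo program can easily be replaced with that module. It can also be used to extract more information about the pages, such as the links on a PDF page, in order to create HTML image maps for them. With the libcairo Python bindings installed, it may even be possible to render the pages without an external process.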

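Given the other restrictions, it is IMHO not possible. You have to "allow" JavaScript and/or third-party browser plugins, and you have to accept the fact that not every browser will be able to display it the way you want. You could also use the solution that renders pages on the server and pre-render and/or cache the page images there to reduce the CPU load. (2 upvotes)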
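
The caching idea could look roughly like the following Python 3 sketch. It is an illustration under assumptions, not part of the answer's program: the function name, the cache directory layout and the `render` callable (for example a wrapper around ImageMagick's convert) are all hypothetical.

```python
import hashlib
import os


def cached_page_image(pdf_path, page_index, cache_dir, render):
    """Return image bytes for one page, rendering only on a cache miss.

    `render(pdf_path, page_index)` is any callable that returns the image
    bytes. The cache key includes the file's modification time and size, so
    an updated PDF is rendered again instead of served from a stale entry.
    """
    stat = os.stat(pdf_path)
    key_source = '{0}|{1}|{2}|{3}'.format(
        os.path.abspath(pdf_path), stat.st_mtime_ns, stat.st_size, page_index
    )
    key = hashlib.sha256(key_source.encode('utf-8')).hexdigest()
    cache_path = os.path.join(cache_dir, key + '.jpg')

    try:
        with open(cache_path, 'rb') as cached:
            return cached.read()
    except FileNotFoundError:
        pass  # Cache miss: fall through and render the page.

    image_data = render(pdf_path, page_index)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary name first so a concurrent request never reads
    # a partially written file, then move it into place.
    tmp_path = '{0}.{1}.tmp'.format(cache_path, os.getpid())
    with open(tmp_path, 'wb') as tmp:
        tmp.write(image_data)
    os.replace(tmp_path, cache_path)
    return image_data
```

This sketch never evicts old entries, so a real deployment would also need some cleanup of the cache directory.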

Archived:	11 years, 10 months ago
Views:	1335
Last activity:	11 years, 9 months ago