Mak*_*nts 12 python pdf-generation reportlab pypdf
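When I print any source PDF to a PDF, the file size drops and the text boxes shown in the form are removed. In short, printing flattens the file. That is the behavior I want to achieve.
The code below creates a PDF using another PDF as the source (the one I want to flatten), and it also writes the text-box form.
Can I get a PDF without the text boxes, i.e. flatten it, the same way Adobe does when you print a PDF to PDF?
My other code looks something like this, minus a few things: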
# Python 2 code using the legacy pyPdf package
import os
import StringIO

from pyPdf import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

directory = os.path.join(os.getcwd(), "source")  # dir we are interested in
fif = [f for f in os.listdir(directory) if f.lower().endswith('.pdf')]  # get the PDFs

for i in fif:
    # build an (empty, rotated) one-page overlay in memory with ReportLab
    packet = StringIO.StringIO()
    can = canvas.Canvas(packet, pagesize=letter)
    can.rotate(-90)
    can.save()

    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    fname = os.path.join(directory, i)
    existing_pdf = PdfFileReader(file(fname, "rb"))
    output = PdfFileWriter()
    nump = existing_pdf.getNumPages()
    page = existing_pdf.getPage(0)
    for l in range(nump):
        output.addPage(existing_pdf.getPage(l))
    # merge the overlay onto the first page (same object that was added above)
    page.mergePage(new_pdf.getPage(0))
    outputStream = file("out-" + i, "wb")
    output.write(outputStream)
    outputStream.close()
    print fname, "written as", "out-" + i
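Summary: I have a PDF to which I add a text box that covers existing information and adds new information; then I print a PDF from that PDF. In the printed result the text box is no longer editable or movable. I want to automate that process, but everything I have tried still leaves the text box editable.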
nak*_*nis 11
If installing an OS package is an option, you can use pdftk with its Python wrapper, pypdftk, like this:
import pypdftk
pypdftk.fill_form('filled.pdf', out_file='flattened.pdf', flatten=True)
You also need the pdftk package itself; on Ubuntu you can install it with:
sudo apt-get install pdftk
The pypdftk library can be installed from PyPI:
pip install pypdftk
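If you would rather skip the wrapper, pdftk can flatten a form directly from the command line (`pdftk filled.pdf output flattened.pdf flatten`). A minimal sketch calling it through `subprocess`, assuming the `pdftk` binary is on your `PATH`; the helper names `build_flatten_cmd` and `flatten_with_pdftk` are hypothetical:

```python
import shutil
import subprocess


def build_flatten_cmd(src, dst):
    """Build the pdftk argument list that flattens the form in src into dst."""
    return ["pdftk", src, "output", dst, "flatten"]


def flatten_with_pdftk(src, dst):
    """Run pdftk; raises if the binary is missing or pdftk exits non-zero."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk is not installed or not on PATH")
    subprocess.run(build_flatten_cmd(src, dst), check=True)


if __name__ == "__main__":
    print(build_flatten_cmd("filled.pdf", "flattened.pdf"))
```

Passing an argument list instead of a shell string avoids quoting problems with file names that contain spaces.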
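According to the Adobe docs, you can make an editable form field read-only by setting bit position 1 (ReadOnly) of its field flags (`/Ff`); see Document management - Portable document format - Part 1: PDF 1.7, section 12.7.2, Interactive Form Dictionary (the `/Ff` flags themselves are listed under 12.7.3, Field Dictionaries), and here for details. Strictly speaking this makes the field read-only rather than flattening it: the widget stays in the file, but viewers will not let users edit it.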
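The spec numbers flag bits from 1, so "bit position 1" is the mask `1 << 0`. Writing `/Ff` as exactly `1` also clears every other flag on the field (for example Required, bit 2, or Multiline on text fields, bit 13). A small sketch of the arithmetic; the constant and function names are illustrative, not from any library:

```python
# Field flag bits from ISO 32000-1 (bit positions are 1-based in the spec).
READ_ONLY = 1 << 0   # bit 1, all field types
REQUIRED = 1 << 1    # bit 2, all field types
NO_EXPORT = 1 << 2   # bit 3, all field types
MULTILINE = 1 << 12  # bit 13, text fields only


def bit_mask(position):
    """Convert a 1-based spec bit position into an integer mask."""
    return 1 << (position - 1)


def make_read_only(ff):
    """Set the ReadOnly bit while keeping every other flag."""
    return ff | READ_ONLY


def is_read_only(ff):
    """True if the ReadOnly bit is set."""
    return bool(ff & READ_ONLY)


if __name__ == "__main__":
    ff = REQUIRED | MULTILINE    # 2 | 4096 = 4098
    print(make_read_only(ff))    # 4099: Required and Multiline survive
```

OR-ing the bit in is why the code below reads the existing `/Ff` value before writing it back.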
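I'm offering an alternative solution, though I used it within Django.
Use PyPDF2 to fill the fields, then loop over the annotations to set the ReadOnly bit. Here is an example that works on the first page: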
from io import BytesIO

from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import BooleanObject, NameObject, NumberObject

# open the pdf
input_stream = open("YourPDF.pdf", "rb")
reader = PdfFileReader(input_stream, strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )

writer = PdfFileWriter()
writer.set_need_appearances_writer()
if "/AcroForm" in writer._root_object:
    # AcroForm holds the form fields; set NeedAppearances to fix printing issues
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )

data_dict = dict()  # this is a dict of your form values: {field name: value}

writer.addPage(reader.getPage(0))
page = writer.getPage(0)
# update form fields
writer.updatePageFormFieldValues(page, data_dict)
if "/Annots" in page:
    for j in range(len(page["/Annots"])):
        writer_annot = page["/Annots"][j].getObject()
        for field in data_dict:
            if writer_annot.get("/T") == field:
                # make ReadOnly: set bit 1 of /Ff, keeping the other flag bits
                flags = int(writer_annot.get("/Ff", 0))
                writer_annot.update({NameObject("/Ff"): NumberObject(flags | 1)})

# output_stream is your read-only ("flattened") PDF
output_stream = BytesIO()
writer.write(output_stream)
input_stream.close()
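The name-matching loop above can be tried without a PDF at all. This sketch models each annotation as a plain dict with `/T` (field name) and `/Ff` (flags) keys, a stand-in for PyPDF2's dictionary objects. It assumes field and widget are merged, so the name sits on the annotation itself; widgets with a `/Parent` carry the name on the parent instead. `mark_read_only` is a hypothetical helper:

```python
READ_ONLY = 1  # bit 1 of /Ff


def mark_read_only(annots, field_names):
    """Set ReadOnly on every annotation whose /T is in field_names.

    Returns the names that were changed, in annotation order.
    """
    wanted = set(field_names)  # accepts a list, set, or dict (uses its keys)
    changed = []
    for annot in annots:
        name = annot.get("/T")
        if name in wanted:
            annot["/Ff"] = annot.get("/Ff", 0) | READ_ONLY
            changed.append(name)
    return changed


if __name__ == "__main__":
    annots = [
        {"/T": "name", "/Ff": 2},
        {"/T": "notes", "/Ff": 4096},
        {"/T": "date"},
    ]
    print(mark_read_only(annots, {"name": "Ann", "date": "2024-01-01"}))
    print(annots)
```

Building a set once keeps the lookup constant-time per annotation, instead of scanning every key of `data_dict` for every annotation.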
Update
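As @MartinThoma pointed out in the comments, PyPDF2 has reached end of life and is no longer maintained (he is the maintainer); development has moved back to the pypdf package. Fortunately, after updating to pypdf, all I did was swap the package and the code worked the same... I didn't expect that!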
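Most of the PyPDF2-to-pypdf migration is the switch from camelCase to snake_case names. The pairs below cover the legacy calls used in this thread; the scanning helper `legacy_calls` is a hypothetical convenience, not part of pypdf:

```python
import re

# PyPDF2 1.x names used in this thread -> pypdf equivalents
RENAMES = {
    "PdfFileReader": "PdfReader",
    "PdfFileWriter": "PdfWriter",
    "getPage": "pages[i]",
    "getNumPages": "len(reader.pages)",
    "addPage": "add_page",
    "mergePage": "merge_page",
    "updatePageFormFieldValues": "update_page_form_field_values",
    "getObject": "get_object",
}


def legacy_calls(source):
    """Return (old, new) pairs for each legacy name that appears in source."""
    found = []
    for old, new in RENAMES.items():
        # \b keeps getPage from matching inside longer identifiers
        if re.search(rf"\b{old}\b", source):
            found.append((old, new))
    return found


if __name__ == "__main__":
    print(legacy_calls("output.addPage(existing_pdf.getPage(l))"))
```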
I've updated the code slightly since I first wrote it, but here is the version that uses pypdf instead of PyPDF2:
from io import BytesIO

import pypdf
from pypdf.generic import BooleanObject, NameObject, NumberObject


def fill_with_pypdf(file, data):
    """
    Used to fill PDF with pypdf.
    To fill, PDF form must have field name values that match the dictionary keys
    :param file: The PDF being written to
    :param data: The data dictionary being written to the PDF Fields
    :return: the filled, read-only PDF as bytes
    """
    with open(file, "rb") as input_stream:
        # you don't actually need to wrap the BinaryIO in BytesIO but pycharm complained
        pdf_reader = pypdf.PdfReader(BytesIO(input_stream.read()), strict=False)

    # my field names look like "{{ key }}", so each key is wrapped to match;
    # drop this line if your field names are the plain dictionary keys
    data = {f"{{{{ {k} }}}}": v for k, v in data.items()}

    if "/AcroForm" in pdf_reader.trailer["/Root"]:
        pdf_reader.trailer["/Root"]["/AcroForm"].update(
            {NameObject("/NeedAppearances"): BooleanObject(True)})

    writer = pypdf.PdfWriter()
    # creates /AcroForm if it is missing and sets /NeedAppearances
    # (fixes printing issues); replaces the hand-written try/except block
    writer.set_need_appearances_writer()

    # loop over all pages
    for page_num in range(len(pdf_reader.pages)):
        writer.add_page(pdf_reader.pages[page_num])
        page = writer.pages[page_num]
        # loop over annotations, but ensure they are there first...
        if page.get('/Annots'):
            # update field values
            writer.update_page_form_field_values(page, data)
            for j in range(0, len(page['/Annots'])):
                writer_annot = page['/Annots'][j].get_object()
                # make every field read-only by setting bit 1 (ReadOnly) of /Ff,
                # keeping the other flags; to restrict this to specific fields,
                # compare writer_annot.get("/T") with your keys as in the PyPDF2 version
                flags = int(writer_annot.get("/Ff", 0))
                writer_annot.update({NameObject("/Ff"): NumberObject(flags | 1)})

    output_stream = BytesIO()
    writer.write(output_stream)
    return output_stream.getvalue()
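The dictionary comprehension near the top of `fill_with_pypdf` renames every key to `{{ key }}`, so it only matches forms whose fields are literally named that way (Django-template style). A standalone copy, under the hypothetical name `wrap_keys`, shows what it produces:

```python
def wrap_keys(data):
    """Rename each key k to "{{ k }}".

    Inside an f-string, "{{" is a literal "{", so "{{{{" yields "{{".
    """
    return {f"{{{{ {k} }}}}": v for k, v in data.items()}


if __name__ == "__main__":
    print(wrap_keys({"name": "Ann", "city": "Oslo"}))
```

If your field names are the plain keys, skip this step and pass `data` straight to `update_page_form_field_values`.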