Convert/write a PDF to RAM as a file-like object for further use

Question

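My script generates a PDF (a PyPDF2.pdf.PdfFileWriter object) and stores it in a variable. I need to keep working with it later in the script as a file-like object. Right now, though, I have to write it to disk first and then open it again as a file before I can use it.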

To avoid these unnecessary write/read operations I found many suggestions, such as StringIO and BytesIO, but I could not work out how to apply them to my case.

As I understand it, I need to "convert" the PyPDF2.pdf.PdfFileWriter object into a file-like object (that is, write it to RAM) so that I can use it directly.

Or is there another approach that fits my case better?

Update: here is a code example

from pdfrw import PdfReader, PdfWriter, PageMerge
from PyPDF2 import PdfFileReader, PdfFileWriter


red_file = PdfFileReader(open("file_name.pdf", 'rb'))

large_pages_indexes = [1, 7, 9]

large = PdfFileWriter()
for i in large_pages_indexes:
    p = red_file.getPage(i)
    large.addPage(p)

# here final data have to be written (I would like to avoid that)
with open("virtual_file.pdf", 'wb') as tmp:
  large.write(tmp)

# here I need to read exported "virtual_file.pdf" (I would like to avoid that too)
with open("virtual_file.pdf", 'rb') as tmp:
  pdf = PdfReader(tmp) # here I'm starting to work with this file using another module "pdfrw"
  print(pdf)

Answer 1

J_H 7

To avoid the slow disk I/O, it looks like you want to replace

with open("virtual_file.pdf", 'wb') as tmp:
  large.write(tmp)

with open("virtual_file.pdf", 'rb') as tmp:
  pdf = PdfReader(tmp)

with

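import io

buf = io.BytesIO()   # in-memory binary buffer that behaves like a file
large.write(buf)     # write the PDF into RAM instead of to disk
buf.seek(0)          # rewind so the reader starts at the first byte
pdf = PdfReader(buf)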

Also, buf.getvalue() is available if you need the whole buffer as a bytes object.
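To see why the seek(0) call matters, here is a minimal standard-library sketch of the same write-then-read pattern. The write_document function is a hypothetical stand-in for large.write(...); it writes placeholder bytes, not a valid PDF, so the sketch runs without PyPDF2 or pdfrw installed.

```python
import io


def write_document(stream):
    """Hypothetical stand-in for large.write(stream).

    Writes placeholder bytes to any binary file-like object.
    The content is NOT a valid PDF; it only illustrates the buffer handling.
    """
    stream.write(b"%PDF-1.4\n")
    stream.write(b"% placeholder body\n")


buf = io.BytesIO()
write_document(buf)

# After writing, the position is at the end of the buffer,
# so reading from here yields no data.
at_end = buf.read()

# Rewind so that a consumer starts reading from the first byte.
buf.seek(0)
first_line = buf.readline()

# getvalue() returns the full contents regardless of the current position.
raw = buf.getvalue()
```

Here at_end is empty because the position was still at the end of the data, while first_line and raw contain the written bytes. Any library that accepts an open binary file should, in principle, accept the rewound buffer in its place.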
