How to convert a multi-page PDF into a list of image objects in Python?

Question

Hen*_*rik (score: 5) · tags: python, image, image-processing, wand

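I want to convert a multi-page PDF document into a series of image objects stored in a list, without saving the images to disk (I want to process them with PIL Image). So far I have only managed to do it by writing the images to files first: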

from wand.image import Image

with Image(filename='source.pdf') as img:

    with img.convert('png') as converted:
        converted.save(filename='pyout/page.png')

But how can I convert the img object above directly into a list of PIL.Image objects?

Answer 1

Bry*_*Kou (score: 8)

New answer:

pip install pdf2image

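from pdf2image import convert_from_path, convert_from_bytes

# Returns a list of PIL Image objects, one per page.
# convert_from_bytes does the same for PDF data already held in memory.
images = convert_from_path('/path/to/my.pdf')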

You may also need to install Pillow. This may only work on Linux.

https://github.com/Belval/pdf2image

The two approaches may produce different results.

Old answer:

Python 3.4:

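from PIL import Image
from wand.image import Image as wimage
import os
import io

if __name__ == "__main__":
    filepath = "fill this in"
    assert os.path.exists(filepath)
    page_images = []
    # resolution is the density (DPI) used to rasterize the PDF
    with wimage(filename=filepath, resolution=200) as img:
        # img.sequence holds one frame per PDF page
        for page_wand_image_seq in img.sequence:
            # Copy the frame into a standalone Wand image; "with" frees it afterwards
            with wimage(page_wand_image_seq) as page_wand_image:
                # Encode the page as JPEG in memory instead of writing it to disk
                page_jpeg_bytes = page_wand_image.make_blob(format="jpeg")
            # Wrap the bytes in a file-like object so PIL can read them
            page_jpeg_data = io.BytesIO(page_jpeg_bytes)
            page_image = Image.open(page_jpeg_data)
            page_images.append(page_image)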

Finally, you could make a system call to mogrify, but that is likely more complicated because you would have to manage temporary files.
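The key step in the old answer is turning Wand's in-memory blob into a PIL image through `io.BytesIO`. The sketch below isolates that step as a helper, `blobs_to_pil_images` (a hypothetical name, not part of Wand or Pillow), which accepts any encoded image bytes, such as the output of `make_blob(format="png")`. It calls `copy()` inside a `with` block so each page is fully decoded before its source is closed. To stay runnable without ImageMagick or Ghostscript, the demo builds PNG blobs from blank Pillow images instead of reading a PDF.

```python
import io
from typing import Iterable, List

from PIL import Image


def blobs_to_pil_images(blobs: Iterable[bytes]) -> List[Image.Image]:
    """Decode encoded image bytes (e.g. one PNG/JPEG blob per page) into PIL images."""
    images = []
    for blob in blobs:
        # Wrap the bytes in a file-like object so PIL reads them from memory
        with Image.open(io.BytesIO(blob)) as im:
            # copy() forces decoding and detaches the result from the buffer
            images.append(im.copy())
    return images


if __name__ == "__main__":
    # Self-contained demo: build two PNG blobs in memory instead of reading a PDF
    blobs = []
    for color in ("red", "blue"):
        buf = io.BytesIO()
        Image.new("RGB", (40, 30), color).save(buf, format="PNG")
        blobs.append(buf.getvalue())
    pages = blobs_to_pil_images(blobs)
    print(len(pages), pages[0].size)  # 2 (40, 30)
```

With Wand, the `blobs` list would come from the `img.sequence` loop in the old answer, calling `make_blob` on each page. Asking for `format="png"` instead of `"jpeg"` avoids adding lossy compression to the pages.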

Answer 2

yea*_*ark (score: 5)

The simple approach is to save the images to files, read them with PIL, and then delete them.

I recommend the pdf2image package. Before using it, you may need to install the poppler package through Anaconda:

conda install -c conda-forge poppler

If you run into problems, update conda before installing:

conda update conda
conda update anaconda

Once poppler is installed, install pdf2image with pip:

pip install pdf2image

Then run this code:

from pdf2image import convert_from_path

dpi = 500  # dots per inch
pdf_file = 'work.pdf'
pages = convert_from_path(pdf_file, dpi=dpi)
for i, page in enumerate(pages):
    page.save('output_{}.jpg'.format(i), 'JPEG')
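The `dpi` value largely decides how much memory each returned page occupies, because every page is held as an uncompressed image. A rough estimate, assuming a US Letter page (8.5 × 11 inches) and about 3 bytes per RGB pixel (Pillow may use more internally), can be computed as below; `estimate_page_size` is a hypothetical helper for illustration only.

```python
from typing import Tuple


def estimate_page_size(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int = 3) -> Tuple[int, int, int]:
    """Return (width_px, height_px, approx_bytes) for one rasterized page."""
    # Pixel dimensions scale linearly with dpi
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Raw memory grows with the pixel count, i.e. with dpi squared
    return width_px, height_px, width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    for dpi in (200, 500):
        w, h, nbytes = estimate_page_size(8.5, 11, dpi)
        print(f"{dpi} dpi: {w}x{h} px, ~{nbytes / 1e6:.1f} MB per page")
    # 200 dpi: 1700x2200 px, ~11.2 MB per page
    # 500 dpi: 4250x5500 px, ~70.1 MB per page
```

Going from 200 to 500 dpi multiplies the raw size by (500 / 200)² = 6.25, so for long documents a lower dpi can matter more than the choice of library.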

After that, read the files with PIL and delete them.
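If you follow this save, read, and delete approach, Python's `tempfile.TemporaryDirectory` can handle the deletion. The sketch below (the helper name `roundtrip_via_temp_files` is hypothetical) writes each page into a temporary directory, reloads it with PIL, and lets the directory and its files be removed when the `with` block ends. Note that `convert_from_path` already returns PIL images, so this round trip is only needed if you specifically want the encode/decode step, for example to apply JPEG compression. The demo uses blank Pillow images as stand-ins for the pages.

```python
import os
import tempfile
from typing import List

from PIL import Image


def roundtrip_via_temp_files(pages: List[Image.Image], fmt: str = "JPEG") -> List[Image.Image]:
    """Save each page to a temporary directory, reload it with PIL,
    and let the directory (and its files) be deleted automatically."""
    reloaded = []
    with tempfile.TemporaryDirectory() as tmpdir:
        for i, page in enumerate(pages):
            path = os.path.join(tmpdir, "output_{}.{}".format(i, fmt.lower()))
            page.save(path, fmt)
            # copy() loads the pixels so the image survives after the file is removed;
            # the inner "with" closes the file before the directory is cleaned up
            with Image.open(path) as im:
                reloaded.append(im.copy())
    # Leaving the outer "with" block deletes tmpdir and every file in it
    return reloaded


if __name__ == "__main__":
    # Stand-ins for the pages returned by convert_from_path()
    fake_pages = [Image.new("RGB", (50, 70), "white") for _ in range(3)]
    images = roundtrip_via_temp_files(fake_pages)
    print(len(images), images[0].size)  # 3 (50, 70)
```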

Archived: 8 years, 7 months ago
Views: 8226
Last activity: 7 years, 3 months ago