Posts by Ste*_*eve

Detect boxes in a .pdf or image and crop them into individual images

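I have a multi-page .pdf (scanned images) containing handwriting that I would like to crop and store as new, separate images. For example, in the visual below, I would like to extract the handwriting inside the 2 boxes as separate images. How can I automate this with Python for a large, multi-page .pdf?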

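I tried using the PyPDF2 package to crop one of the handwriting boxes based on (x,y) coordinates, but this approach does not work for me because the boundaries/coordinates of the handwriting boxes are not always the same on every page of the pdf. I believe detecting the boxes would be a better way to crop automatically. Not sure if it is useful, but below is the code I used for the (x,y) coordinates approach: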

from PyPDF2 import PdfFileReader, PdfFileWriter

# The second positional argument of PdfFileReader is `strict`, not a file mode,
# so only the path is passed here
reader = PdfFileReader('data/samples.pdf')
writer = PdfFileWriter()

# Loop through all pages in the pdf and crop each one to the same fixed (x,y) coordinates
for i in range(reader.getNumPages()):
    page = reader.getPage(i)
    page.cropBox.setLowerLeft((42, 115))
    page.cropBox.setUpperRight((500, 245))
    writer.addPage(page)

# Write the cropped pages to a new pdf; the file is closed automatically
with open('samples_cropped.pdf', 'wb') as outstream:
    writer.write(outstream)


Thanks in advance for your help.

python opencv image-processing computer-vision pypdf2

Ste*_*eve

2019-07-18

5
Score

1
Answer

2824
Views