Tag: pypdf

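Unable to install the PyPDF2 module

I am trying to install the PyPDF2 module. I downloaded the zip, extracted it, and ran python setup.py build and python setup.py install, but it does not seem to have been installed: when I try to import it from a Python script, I get an ImportError: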

import pyPdf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pyPdf

Please help.

I am using Python 2.7 on Windows XP.
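
One thing worth checking, offered as a hedged sketch rather than a confirmed diagnosis: the PyPDF2 distribution installs a package that is imported as `PyPDF2`, whereas `pyPdf` is the import name of the older, separate pyPdf package. The snippet below is written for Python 3.4+ (unlike the Python 2.7 session above), and `importable` is a hypothetical helper name. It reports which of a list of candidate module names the current interpreter can find.

```python
import importlib.util


def importable(names):
    """Map each candidate top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Compare the two spellings discussed in the question.
    for name, found in importable(["PyPDF2", "pyPdf"]).items():
        print(name, "found" if found else "not found")
```

If only `PyPDF2` is reported as found, the install succeeded and the import statement is what needs to change.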

python module pypdf importerror

geo*_*eek

2012 10-08

13
Votes

1
Answer

20k
Views

PyPDF2 decryption not working

I am currently using PyPDF2 as a dependency.

I ran into some encrypted files and handled them as usual (code below):

    PDF = PdfFileReader(file(pdf_filepath, 'rb'))
    if PDF.isEncrypted:
        PDF.decrypt("")
        print PDF.getNumPages()

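My file path looks like "~/blah/FDJKL492019 21490,LFS.pdf". PDF.decrypt("") returns 1, which means it succeeded. But when it hits print PDF.getNumPages(), it still raises the error "PyPDF2.utils.PdfReadError: file has not been decrypted".

How can I get rid of this error? I can open the PDF file by double-clicking it (it opens with Adobe Reader by default).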
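
A minimal sketch of a more defensive version of the snippet above. It relies only on the three reader members already used in the question (`isEncrypted`, `decrypt`, `getNumPages`); the helper name `page_count` and the `FakeReader` class are hypothetical, the latter existing only so the example runs without a PDF library. It treats a return value of 0 from `decrypt` as a rejected password, which is how PyPDF2 1.x documents that method. It does not explain why a `decrypt("")` that returns 1 can still be followed by the error.

```python
def page_count(reader, password=""):
    """Return the page count, decrypting first if the reader reports encryption."""
    if reader.isEncrypted:
        # decrypt() returns 0 when the password matches neither the user
        # password nor the owner password.
        if reader.decrypt(password) == 0:
            raise ValueError("password rejected")
    return reader.getNumPages()


class FakeReader:
    """Hypothetical stand-in for a PDF reader object."""
    isEncrypted = True

    def __init__(self, good_password, pages):
        self._good_password = good_password
        self._pages = pages

    def decrypt(self, password):
        return 1 if password == self._good_password else 0

    def getNumPages(self):
        return self._pages
```

With a real reader, the call would be `page_count(PDF)` using the `PDF` object from the question.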

python pdf encryption pypdf

Jin*_*Lee

2014 10-08

13
Votes

3
Answers

10k
Views

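PDF - Remove white margins

I would like to know a way to remove the white margins from a PDF file, just as Adobe Acrobat X Pro does. I understand it will not work for every PDF file.

My guess is that the way to do it is to get the text margins and then crop the margins out.

PyPdf is preferred.

iText finds the text margins with the following code: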

public void addMarginRectangle(String src, String dest)
    throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    TextMarginFinder finder;
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        finder = parser.processContent(i, new TextMarginFinder());
        PdfContentByte cb = stamper.getOverContent(i);
        cb.rectangle(finder.getLlx(), finder.getLly(),
            finder.getWidth(), finder.getHeight());
        cb.stroke();
    }
    stamper.close();
}
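
The idea of finding the text margins and then cropping to them comes down to rectangle arithmetic. The sketch below is in Python rather than Java and uses no PDF library. `crop_box` and its `padding` parameter are hypothetical, and the inputs correspond to the values that `getLlx()`, `getLly()`, `getWidth()` and `getHeight()` return in the iText code above. It computes the lower-left and upper-right corners of a crop rectangle, clamped to the page's media box.

```python
def crop_box(llx, lly, width, height, media_box, padding=0.0):
    """Return (llx, lly, urx, ury): the text bounds grown by `padding`,
    clamped so the result never extends past `media_box` = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = media_box
    return (
        max(x0, llx - padding),
        max(y0, lly - padding),
        min(x1, llx + width + padding),
        min(y1, lly + height + padding),
    )
```

The resulting corners are what a PDF library would then need to write into the page's crop box; that step depends on the library and is not shown here.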

pdf pdf-generation itext ghostscript pypdf

jac*_*des

2012 05-03

12
Votes

2
Answers

10k
Views

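Generating a flattened PDF with Python

When I print a PDF from any source PDF, the file size drops and the text boxes shown in the form are removed. In short, printing flattens the file. This is the behavior I want to achieve.

The code below creates a PDF using another PDF as the source (the one I want to flatten), but it also writes the text box form fields.

Can I get a PDF without the text boxes, that is, flatten it? Just as Adobe does when printing a PDF to a PDF.

The rest of my code looks like this, minus a few things: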

import os
import StringIO
from pyPdf import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

directory = os.path.join(os.getcwd(), "source")  # dir we are interested in
fif = [f for f in os.listdir(directory) if f[-3:] == 'pdf'] # get the PDFs
for i in fif:
    packet = StringIO.StringIO()
    can = canvas.Canvas(packet, pagesize=letter)
    can.rotate(-90)
    can.save()

    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    fname = os.path.join('source', i)
    existing_pdf = PdfFileReader(file(fname, "rb"))
    output = PdfFileWriter()
    nump = existing_pdf.getNumPages()
    page = existing_pdf.getPage(0) …

python pdf-generation reportlab pypdf

Mak*_*nts

2015 11-24

12
Votes

2
Answers

4268
Views

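Parsing a PDF with no /Root object using PDFMiner

I am trying to extract text from a large number of PDFs using the PDFMiner Python bindings. The module I wrote works for many PDFs, but for a subset of them I get this somewhat cryptic error:

ipython stack trace: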

/usr/lib/python2.7/dist-packages/pdfminer/pdfparser.pyc in set_parser(self, parser)
    331                 break
    332         else:
--> 333             raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
    334         if self.catalog.get('Type') is not LITERAL_CATALOG:
    335             if STRICT:

PDFSyntaxError: No /Root object! - Is this really a PDF?

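Naturally, I immediately checked whether these PDFs were corrupted, but they can be read just fine.

Is there any way to read these PDFs despite the missing root object? I am not sure where to go from here.

Many thanks!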
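
One way to start narrowing this down, as a rough sketch using only the standard library, is to scan the raw bytes for the markers a parser looks for. `pdf_markers` is a hypothetical helper. It is a crude byte search, not a parser, so a hit does not prove the file is well-formed. It checks for a `%PDF-` header near the start, a `/Root` key anywhere in the file, and `%%EOF` near the end.

```python
def pdf_markers(data):
    """Report which structural markers appear in the raw bytes of a file."""
    return {
        "header": b"%PDF-" in data[:1024],   # version header near the start
        "root": b"/Root" in data,            # catalog reference, anywhere
        "eof": b"%%EOF" in data[-1024:],     # end-of-file marker near the end
    }
```

For a file on disk, the input would come from `open(path, "rb").read()`, where `path` is one of the failing PDFs.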

Edit:

I tried PyPDF to get some differential diagnostics. The stack trace is below:

In [50]: pdf = pyPdf.PdfFileReader(file(fail, "rb"))
---------------------------------------------------------------------------
PdfReadError                              Traceback (most recent call last)
/home/louist/Desktop/pdfs/indir/<ipython-input-50-b7171105c81f> in <module>()
----> 1 pdf = pyPdf.PdfFileReader(file(fail, "rb"))

/usr/lib/pymodules/python2.7/pyPdf/pdf.pyc in __init__(self, stream)
    372         self.flattenedPages = None
    373         self.resolvedObjects = …

python pdf-parsing pypdf pdf-manipulation

blz*_*blz

2012 07-14

11
Votes

2
Answers

8922
Views

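How do I install poppler on Ubuntu 15.04?

Poppler is a PDF rendering library based on the xpdf-3.0 code base. I downloaded the tar.xz file from the official website, http://poppler.freedesktop.org/, but I do not know what to do with it.

Is there an install or run command?

PS: I am new to Linux, so I do not know much about it yet.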

python ubuntu pygtk poppler pypdf

Ars*_*shi

2015 08-24

11
Votes

2
Answers

10k
Views

pyPdf for IndirectObject extraction

Following this example, I can list all the elements in a PDF file:

import pyPdf
pdf = pyPdf.PdfFileReader(open("pdffile.pdf", "rb"))  # PDFs are binary files
list(pdf.pages) # Process all the objects.
print pdf.resolvedObjects

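Now I need to extract a non-standard object from the PDF file.

My object is the one named MYOBJECT, and it is a string.

The relevant part of what the Python script prints is: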

{'/MYOBJECT': IndirectObject(584, 0)}

The PDF file looks like this:

558 0 obj
<</Contents 583 0 R/CropBox[0 0 595.22 842]/MediaBox[0 0 595.22 842]/Parent 29 0 R/Resources
  <</ColorSpace <</CS0 563 0 R>>
    /ExtGState <</GS0 568 0 R>>
    /Font<</TT0 559 0 R/TT1 560 0 R/TT2 561 0 R/TT3 562 0 R>>
    /ProcSet[/PDF/Text/ImageC]
    /Properties<</MC0<</MYOBJECT 584 0 R>>/MC1<</SubKey 582 0 R>> >>
    /XObject<</Im0 578 0 R>>>>
  /Rotate 0/StructParents 0/Type/Page>>
endobj
...
...
... …
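
A hedged sketch of following an indirect reference to the object it points to. It assumes, as in pyPdf, that objects expose a `getObject()` method which returns the referenced object for an `IndirectObject` and the object itself otherwise. `resolve` and `FakeIndirect` are hypothetical names; the stub class is there only so the example runs without a PDF file.

```python
def resolve(obj):
    """Follow getObject() until it stops yielding a different object."""
    while hasattr(obj, "getObject"):
        target = obj.getObject()
        if target is obj:
            # Direct objects return themselves, so there is nothing left to follow.
            break
        obj = target
    return obj


class FakeIndirect:
    """Hypothetical stand-in for an indirect reference."""

    def __init__(self, target):
        self._target = target

    def getObject(self):
        return self._target
```

With the dictionary printed above, the call would be `resolve(props['/MYOBJECT'])`, where `props` is whatever variable holds that dictionary.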

python pdf stream pypdf

Gia*_*rlo

2012 08-21

10
Votes

2
Answers

10k
Views

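Changing the metadata of a PDF file with pypdf

I would like to create or modify the title of a PDF document using pypdf. It seems the title is read-only. Is there a way to get read/write access to this metadata?

If the answer is yes, a snippet of code would be appreciated.

Thanks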
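
A sketch of one possible workaround: copy the pages into a new writer and set the title there instead of editing the reader's metadata. It assumes the PyPDF2 1.x method names `getNumPages`, `getPage`, `addPage` and `addMetadata`; whether the installed library version provides `addMetadata` is an assumption to verify. `copy_with_title`, `StubReader` and `StubWriter` are hypothetical, and the stubs exist only so the example runs without a PDF library.

```python
def copy_with_title(reader, writer, title):
    """Copy every page from reader to writer, then set the /Title entry."""
    for index in range(reader.getNumPages()):
        writer.addPage(reader.getPage(index))
    writer.addMetadata({"/Title": title})
    return writer


class StubReader:
    """Hypothetical stand-in for a PDF reader."""

    def __init__(self, pages):
        self._pages = list(pages)

    def getNumPages(self):
        return len(self._pages)

    def getPage(self, index):
        return self._pages[index]


class StubWriter:
    """Hypothetical stand-in for a PDF writer."""

    def __init__(self):
        self.pages = []
        self.info = {}

    def addPage(self, page):
        self.pages.append(page)

    def addMetadata(self, infos):
        self.info.update(infos)
```

With real reader and writer objects, the returned writer would still need to be written out to a new file.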

pdf metadata pypdf

Bau*_*nes

2012 06-22

10
Votes

1
Answer

7688
Views

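requests: return a file object from a URL (as with open('', 'rb'))

I want to download a file directly into memory with requests so that I can pass it straight to the PyPDF2 reader without writing it to disk, but I cannot figure out how to pass it as a file object. Here is what I have tried: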

import requests as req
from PyPDF2 import PdfFileReader

r_file = req.get('http://www.location.come/somefile.pdf')
rs_file = req.get('http://www.location.come/somefile.pdf', stream=True)

with open('/location/somefile.pdf', 'wb') as f:
    for chunk in r_file.iter_content():
        f.write(chunk)

local_file = open('/location/somefile.pdf', 'rb')

#Works:
pdf = PdfFileReader(local_file)

#As expected, these don't work:
pdf = PdfFileReader(rs_file)
pdf = PdfFileReader(r_file)
pdf = PdfFileReader(rs_file.content)
pdf = PdfFileReader(r_file.content)
pdf = PdfFileReader(rs_file.raw)
pdf = PdfFileReader(r_file.raw)
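
A common way to get a file-like object from bytes that are already in memory is `io.BytesIO` from the standard library. The sketch below shows only that wrapping step and makes no network call. `as_file_object` is a hypothetical helper, and passing it `r_file.content` from the code above is the assumed usage, not something verified here against PyPDF2.

```python
import io


def as_file_object(content):
    """Wrap downloaded bytes in a seekable, readable in-memory file object."""
    return io.BytesIO(content)


buffer = as_file_object(b"%PDF-1.4 example bytes")
buffer.seek(0, io.SEEK_END)  # seeking to the end works on an in-memory buffer
size = buffer.tell()
buffer.seek(0)               # rewind so a consumer starts reading from the top
```

The point of the wrapper is that the result supports `seek` and `tell`, which a bytes string or a raw response stream does not offer in the same way.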

python file download pypdf python-requests

Tim*_*imY

lucky-day

10
Votes

1
Answer

5108
Views