Tag: pymupdf

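How do I resolve the "No module named 'frontend'" error message?

I installed PyMuPDF/fitz because I am trying to extract images from a PDF file. However, when I run the code below, I get No module named 'frontend'.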

    doc = fitz.open(pdf_path)
    for i in range(len(doc)):
        for img in doc.getPageImageList(i):
            xref = img[0]
            pix = fitz.Pixmap(doc, xref)
            if pix.n < 5:  # this is GRAY or RGB
                pix.writePNG("p%s-%s.png" % (i, xref))
            else:  # CMYK: convert to RGB first
                pix1 = fitz.Pixmap(fitz.csRGB, pix)
                pix1.writePNG("p%s-%s.png" % (i, xref))
                pix1 = None
            pix = None


I have searched, but there is not a single report of this kind of error. I have installed the PyMuPDF, muPDF, and fitz modules.

Here is the full error:

    Traceback (most recent call last):
      File "/home/waqar/PycharmProjects/predator/ExtractFileImage.py", line 1, in <module>
        import …
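One thing worth checking, although the question does not confirm it as the cause: the PyPI distribution named fitz is a separate project from PyMuPDF, and having both installed can leave "import fitz" resolving to the wrong files. The following is a minimal diagnostic sketch using only the standard library (Python 3.8 or later is assumed for importlib.metadata). The helper names module_origin and installed_version are made up for this illustration.

```python
import importlib.util
from importlib import metadata


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found.

    find_spec only locates the module; it does not execute it, so a broken
    package cannot raise its import error here.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("fitz resolves to:", module_origin("fitz"))
    # Assumption: these are the two distribution names that can both provide "fitz".
    for dist in ("PyMuPDF", "fitz"):
        print(dist, "->", installed_version(dist))
```

If both distributions show a version, uninstalling both and reinstalling only PyMuPDF is a reasonable next experiment.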


python python-3.x mupdf pymupdf

Waq*_*qar

2020 08-31

22
votes

5
answers

30k
views

PyMuPDF: AttributeError: module 'fitz' has no attribute 'open'

pip3 install PyMuPDF

Collecting PyMuPDF
  Using cached PyMuPDF-1.18.17-cp37-cp37m-win_amd64.whl (5.4 MB)
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.18.17


import fitz
doc = fitz.open("my_pdf.pdf")


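When I looked for def open in fitz.py, I found nothing. So I understand the error, but I don't understand why the file I downloaded lacks this function. Could someone share a working file? Or am I missing something else?

Full traceback: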

runfile('D:/Documents/Python_projects/Point_and_area_pdf_to_excel/get_info.py', wdir='D:/Documents/Python_projects/Point_and_area_pdf_to_excel')
Reloaded modules: six, dateutil._common, dateutil.relativedelta, dateutil.tz._common, dateutil.tz._factories, dateutil.tz.win, dateutil.tz.tz, dateutil.tz, dateutil.parser._parser, dateutil.parser.isoparser, dateutil.parser, chardet.enums, chardet.charsetprober, chardet.charsetgroupprober, chardet.codingstatemachine, chardet.escsm, chardet.escprober, chardet.latin1prober, chardet.mbcssm, chardet.utf8prober, chardet.mbcharsetprober, chardet.euctwfreq, chardet.euckrfreq, chardet.gb2312freq, chardet.big5freq, chardet.jisfreq, chardet.chardistribution, chardet.jpcntx, chardet.sjisprober, chardet.eucjpprober, chardet.gb2312prober, chardet.euckrprober, chardet.cp949prober, chardet.big5prober, chardet.euctwprober, chardet.mbcsgroupprober, chardet.hebrewprober, chardet.sbcharsetprober, chardet.langbulgarianmodel, chardet.langgreekmodel, chardet.langhebrewmodel, chardet.langrussianmodel, chardet.langthaimodel, chardet.langturkishmodel, chardet.sbcsgroupprober, chardet.universaldetector, chardet.version, …


python pymupdf

Uto*_*ion

2021 09-13

13
votes

3
answers

20k
views

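Extract high-resolution images from a PDF using Python

I have successfully used the following code to extract images from multiple PDF pages, but the resolution is quite low. Is there a way to adjust it?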

import fitz    
pdffile = "C:\\Users\\me\\Desktop\\myfile.pdf"
doc = fitz.open(pdffile)
for page_index in range(doc.pageCount):
    page = doc.loadPage(page_index)  
    pix = page.getPixmap()
    output = "image_page_" + str(page_index) + ".jpg"
    pix.writePNG(output)


I also tried using the code here and changing "if pix.n < 5" to "if pix.n - pix.alpha < 4", but in my case this did not output any images.
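With default settings a page is rendered at 72 dpi, which likely explains the coarse output. The following is a sketch of one common approach, assuming a PyMuPDF version recent enough to offer the snake_case methods get_pixmap and Pixmap.save: pass a scaling fitz.Matrix so each page is rendered at a multiple of the base resolution. The function names, the zoom value of 3, and the file prefix are illustrative choices and do not come from the question.

```python
def output_name(page_index, prefix="image_page_", ext="png"):
    """Build the output filename for one page; the extension matches the PNG data written."""
    return f"{prefix}{page_index}.{ext}"


def render_pages(pdf_path, zoom=3.0, prefix="image_page_"):
    """Render every page at `zoom` times the 72 dpi base resolution and save it as PNG."""
    import fitz  # PyMuPDF, imported lazily so output_name works without it

    matrix = fitz.Matrix(zoom, zoom)  # scale x and y by the same factor
    written = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix)
            name = output_name(page.number, prefix)
            pix.save(name)  # output format is inferred from the .png extension
            written.append(name)
    return written
```

A zoom of 3 corresponds to roughly 216 dpi. Note that this renders whole pages; it does not pull out embedded images at their native resolution.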

python pdf pymupdf

Ome*_*ega

2020 09-10

11
votes

2
answers

9376
views

Installing PyMuPDF on macOS Big Sur

I want to import fitz in my code. To do so, I tried to install PyMuPDF using

pip3 install PyMuPDF


However, this installation fails and returns the following error:

fitz/fitz_wrap.c:2754:10: fatal error: 'fitz.h' file not found
#include <fitz.h>
         ^~~~~~~~
1 error generated.
error: command '/opt/homebrew/clang' failed with exit code 1


I also tried installing mupdf and mupdf-tools via Homebrew. Neither resolved the issue. I would appreciate any help with this installation error!

clang python-3.x pymupdf macos-big-sur apple-m1

Nir*_*ali

2022 01-31

11
votes

2
answers

7100
views

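PyMuPDF - How do I convert a PDF to an image at 300 dpi, using the original document settings for the image size?

I am currently looking at using the Python package PyMuPDF in a workflow that converts PDFs to images (in my case, .TIFF files).

I am trying to mimic the behavior of another program that I currently use for PDF -> image conversion. That program lets you set the imaging options as follows:

Image output quality (DPI): (default 300dpi)

Base image size: Original settings - render the image using the original document settings.

My question is: is this possible in PyMuPDF? How can I set the output DPI of the image to 300 and the image size to the original document size? I am fairly new to this kind of PDF/image processing, so any help would be appreciated.

Thanks in advance,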
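PDF page sizes are expressed in points (72 per inch), so "original size at 300 dpi" amounts to scaling the page by 300/72 when rendering. The sketch below illustrates that arithmetic and one way to write TIFF files, assuming both PyMuPDF (with the snake_case get_pixmap method) and Pillow are installed. The names zoom_for_dpi, pixel_size, and pdf_to_tiff and the output prefix are invented for this example. The sizes returned by pixel_size are nominal, since the renderer may round differently by a pixel.

```python
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points


def zoom_for_dpi(dpi):
    """Scale factor that turns the 72 dpi base rendering into `dpi`."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, dpi):
    """Nominal pixel dimensions of a page of the given size in points."""
    zoom = zoom_for_dpi(dpi)
    return round(width_pt * zoom), round(height_pt * zoom)


def pdf_to_tiff(pdf_path, dpi=300, prefix="page_"):
    """Render each page at `dpi` and save it as a TIFF tagged with that resolution."""
    import fitz  # PyMuPDF, imported lazily
    from PIL import Image  # Pillow, used here only to write the TIFF file

    zoom = zoom_for_dpi(dpi)
    written = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # alpha=False keeps the pixmap as plain RGB, three bytes per pixel
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False)
            image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            name = f"{prefix}{page.number}.tiff"
            image.save(name, dpi=(dpi, dpi))  # record the resolution in the file
            written.append(name)
    return written
```

For a US Letter page of 612 x 792 points, pixel_size(612, 792, 300) gives (2550, 3300), that is 8.5 x 11 inches at 300 dpi.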

pymupdf

ada*_*n11

lucky-day

10
votes

1
answers

8450
views

How do I fix "Error in PyMuPDF" when installing paddleocr with pip?

When running pip install paddleocr, I get an error while building the wheel for PyMuPDF.

Building wheels for collected packages: PyMuPDF
Building wheel for PyMuPDF (setup.py) ... error
error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [70 lines of output]

Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 487, in <module>
          mupdf_local = get_mupdf()
                        ^^^^^^^^^^^
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 450, in get_mupdf
          return tar_extract( mupdf_tgz, exists='return')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 183, in …


pip pymupdf paddleocr

Jin*_*ore

2023 06-01

7
votes

2
answers

10k
views

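Problem extracting plain text with PyMuPDF

I want to read a PDF file using PyMuPDF. All I need is the plain text (I don't need to extract information such as colors, fonts, or tables).

I have tried the following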

import fitz
from fitz import TextPage
ifile = "C:\\user\\docs\\aPDFfile.pdf"
doc = TextPage(ifile)
>>> TypeError: in method 'new_TextPage', argument 1 of type 'struct fz_rect_s *'


That didn't work, so I tried

doc = fitz.Document(ifile)
t = TextPage.extractText(doc)
>>> AttributeError: 'Document' object has no attribute '_extractText'

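
That didn't work either.

Then I found a great blog post written by one of the authors of PyMuPDF, with detailed code for extracting text in the order it is read from the file. But every time I run this code with a different PDF, I get either KeyError: 'lines' (line 81 in the code) or KeyError: "bbox" (line 60 in the code).

I can't post the PDFs here because they are confidential, but I am happy to provide any other useful information. Is there any way to do the simplest task PyMuPDF is meant for: extracting plain text from a PDF, unordered or otherwise (I don't really mind)?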
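Both attempts above construct or call TextPage directly, whereas the usual route is to open the file as a Document and ask each Page for its text. The following is a minimal sketch, assuming a PyMuPDF version that provides Page.get_text (older releases spell it getText). The function names are illustrative. extract_plain_text accepts any iterable of page-like objects, so it can be exercised without a PDF at hand.

```python
def extract_plain_text(pages):
    """Join the plain text of each page-like object that offers get_text()."""
    return "\n".join(page.get_text("text") for page in pages)


def extract_from_file(pdf_path):
    """Open a PDF with PyMuPDF and return the plain text of all pages."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(pdf_path) as doc:
        # Iterating a Document yields its Page objects in page order.
        return extract_plain_text(doc)
```

The "text" option returns unstructured text without font or layout details, which matches what the question asks for.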

python pdf pymupdf

PyR*_*red

2020 08-19

6
votes

2
answers

20k
views

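Camelot PDF dimensions

Before posting this I searched Stack Overflow extensively, but could not find anything on Camelot page dimensions. There is this question, which suggests using table_region, but that does not solve the OP's problem or mine. Unfortunately, I cannot comment to follow up with the OP and see whether they found a solution.

What I am trying to do:

I am using Camelot to identify tables (obviously). Sometimes, when I know which area of a page is likely to contain the table of interest, I want to search only in that area. This is easy to do with camelot.read_pdf() using table_region - I just need to supply a pair of coordinates for Camelot to search within.

The problem is that I obtain these coordinates with PyMuPDF, so they are in PyMuPDF's coordinate system. I have worked out how to translate these coordinates, but I am missing one key piece of information from Camelot - the dimensions of the page. These values are easy to get in PyMuPDF (the Page class's .bound()); I need the Camelot equivalent. I can explain the algebra further here if anyone thinks that might point to an alternative.

What I have tried so far:

I read the documentation. Because of this line in the docs, I wondered whether it might offer a way to get the dimensions: "While using Lattice, there might be cases where smaller lines don't get detected. The size of the smallest line that gets detected is calculated by dividing the PDF page's dimensions with a scaling factor called line_scale. By default, its value is 15."

I am open to alternatives. Essentially, I want to check either whether a certain region of the page contains a table (a region described in PyMuPDF's coordinate system, where for a PDF page the dimensions are usually (612, 792) and the origin is in the top-left corner; Camelot's origin is in the bottom-left corner), or whether any table on the page lies within a given region (if that makes sense).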
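The translation described above comes down to flipping the y axis around the page height. The following is a sketch of that step, assuming an unrotated page whose visible area starts at the PDF origin, and assuming Camelot expects region strings of the form "x1,y1,x2,y2" with the top-left corner first and the bottom-right corner second in bottom-left-origin coordinates. The names to_camelot_region and page_height are invented for this illustration. The page height is read from PyMuPDF rather than from Camelot, on the assumption that both libraries see the same page size.

```python
def to_camelot_region(rect, page_height):
    """Convert a top-left-origin rectangle (x0, y0, x1, y1) to an 'x1,y1,x2,y2' string.

    In the input, y grows downwards from the top of the page; in the output,
    y grows upwards from the bottom, so each y value becomes page_height - y.
    """
    x0, y0, x1, y1 = rect
    top = page_height - y0      # upper edge, measured from the bottom of the page
    bottom = page_height - y1   # lower edge, measured from the bottom of the page
    return f"{x0:.2f},{top:.2f},{x1:.2f},{bottom:.2f}"


def page_height(pdf_path, page_index=0):
    """Read the height in points of one page using PyMuPDF."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(pdf_path) as doc:
        return doc[page_index].rect.height
```

For a 792-point-high page, the rectangle (100, 50, 300, 200) becomes "100.00,742.00,300.00,592.00".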

python python-camelot pymupdf

Jin*_*inx

lucky-day

6
votes

1
answers

2697
views

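How to change the highlight color in a PDF using the fitz module in Python

Hi, I am trying to change the highlight color in a PDF but am unable to do so. The default highlight color is yellow, but I want to change it. Here is my code: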

    import fitz

    doc = fitz.open(r"path\input.pdf")

    page=doc[0]
    text="some text"
    text_instances = page.searchFor(text)


    for inst in text_instances:
        highlight = page.addHighlightAnnot(inst)
        highlight.setColors(colors='Red')
        highlight.update()


    doc.save(r"path\output.pdf")


Also, how can I search the whole PDF at once rather than just one page?

And how can I highlight text that appears in an image in the PDF?
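In the code above, setColors receives the string 'Red'. As far as I know, PyMuPDF expects colors as sequences of floats between 0 and 1, so red would be (1, 0, 0) passed as the stroke color. The sketch below also covers the whole document by looping over its pages. It assumes a PyMuPDF version with the snake_case methods search_for, add_highlight_annot, and set_colors. The function names are illustrative, and text that exists only as pixels inside an image is not covered, because search_for cannot find it.

```python
def highlight_all(doc, needle, rgb=(1.0, 0.0, 0.0)):
    """Highlight every occurrence of `needle` on every page; return the number of hits."""
    count = 0
    for page in doc:
        for rect in page.search_for(needle):
            annot = page.add_highlight_annot(rect)
            annot.set_colors(stroke=rgb)  # highlight color as RGB floats in 0..1
            annot.update()                # apply the color change to the annotation
            count += 1
    return count


def highlight_file(input_path, output_path, needle, rgb=(1.0, 0.0, 0.0)):
    """Open a PDF, highlight all matches, and save the result to a new file."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(input_path) as doc:
        hits = highlight_all(doc, needle, rgb)
        doc.save(output_path)
    return hits
```

Calling highlight_file("input.pdf", "output.pdf", "some text") would write a copy with red highlights; both file names here are placeholders.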

python pymupdf

Gav*_*hta

2020 03-06

6
votes

1
answers

7665
views

Unable to install PyMuPDF on an Alpine Docker image

I am trying to install the pymupdf package on an Alpine image, but I get the following error

fitz/fitz_wrap.c:2739:10: fatal error: ft2build.h: No such file or directory
     2739 | #include <ft2build.h>
          |          ^~~~~~~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1


 RUN pip install PyMuPDF
 ---> Running in 34d246d6f01b
Collecting PyMuPDF
  Downloading PyMuPDF-1.18.5.tar.gz (251 kB)
Using legacy 'setup.py install' for PyMuPDF, since package 'wheel' is not installed.
Installing collected packages: PyMuPDF
    Running setup.py install for PyMuPDF: started
    Running setup.py install for PyMuPDF: finished with status 'error'
    ERROR: Command errored out with …


python docker alpine-linux pymupdf

Nit*_*yal

2020 12-21

6
votes

2
answers

5249
views