textract cannot find PDF file in Python

Question

I have the following simple code:

import textract

text = textract.process("text.pdf")

However, I get the following error:

FileNotFoundError: [WinError 2] The system cannot find the file specified

But I am sure there is a file named text.pdf in the current directory. Yet if I create a document named a.txt and change the second line of the code to:

text = textract.process("a.txt", extension='txt')

then the problem goes away. I also tried:

text = textract.process("text.pdf", extension='pdf')

but I get the same error as before.

Thanks in advance for your help.
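A quick way to check which "file" is actually missing, as a sketch: Python's subprocess module raises this same FileNotFoundError ([WinError 2] on Windows) when the *program* it is asked to launch does not exist, so the message does not necessarily refer to text.pdf. The snippet below reproduces that with a deliberately nonexistent placeholder program name, then separately reports whether the PDF exists and whether a pdftotext executable is on PATH. The helper names launch_error and diagnose are hypothetical, chosen for illustration only.

```python
import os
import shutil
import subprocess


def launch_error(program: str) -> str:
    """Try to start `program`; return the FileNotFoundError message, or "" if it ran.

    `program` is a placeholder: a name that is not installed reproduces the
    same exception class the question reports.
    """
    try:
        subprocess.run([program, "text.pdf"], capture_output=True)
    except FileNotFoundError as exc:
        return str(exc)
    return ""


def diagnose(pdf_path: str) -> dict:
    """Report whether the PDF exists and whether a pdftotext executable is on PATH."""
    return {
        "pdf_exists": os.path.isfile(pdf_path),
        "cwd": os.getcwd(),
        # shutil.which returns the full path of the executable, or None if not found
        "pdftotext_on_path": shutil.which("pdftotext"),
    }


if __name__ == "__main__":
    # On Windows the first line should read "[WinError 2] The system cannot find
    # the file specified", even though the missing "file" is the program.
    print(launch_error("no-such-program-for-demo"))
    print(diagnose("text.pdf"))
```

If pdf_exists is True but pdftotext_on_path is None, the error most likely comes from a missing external tool rather than from the PDF path.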

Answer 1

I ran into the same problem and solved it by additionally installing pdftotext:

conda install -c conda-forge pdftotext

If pdftotext is missing, textract falls back to another extraction method, which seems to be buggy.
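A minimal sketch of a guard, assuming (as this answer suggests) that textract depends on an external pdftotext program for PDFs: checking for the executable before calling textract turns the ambiguous WinError 2 into a message that names what is missing. require_executable is a hypothetical helper, not part of textract.

```python
import shutil


def require_executable(name: str) -> str:
    """Return the full path of `name` on PATH, or raise a descriptive error.

    Hypothetical helper for illustration; textract does not provide it.
    """
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH; install it first "
            f"(e.g. 'conda install -c conda-forge pdftotext')."
        )
    return path
```

Usage would be to call require_executable("pdftotext") just before textract.process("text.pdf"), so a missing tool fails with a clear message instead of a bare FileNotFoundError.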