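Actually, I am trying to tokenize a PDF file into sentences. I first used PyPDF2, but ran into data loss and incorrect formatting, so I switched to OCR. However, when converting the PDF to images I get a poppler error. Can anyone help me fix this?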
from pdf2image import convert_from_path
pages = convert_from_path(PDF_file, 600)  # PDF_file: path to the input PDF; 600 is the dpi
FileNotFoundError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pdf2image\pdf2image.py in _page_count(pdf_path, userpw, poppler_path)
239 env["LD_LIBRARY_PATH"] = poppler_path + ":" + env.get("LD_LIBRARY_PATH", "")
--> 240 proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
241
~\Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
774 errread, errwrite,
--> 775 restore_signals, start_new_session)
776 except:
~\Anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
1177 os.fspath(cwd) if cwd is not None else None,
-> 1178 startupinfo)
1179 finally:
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
PDFInfoNotInstalledError Traceback (most recent call last)
<ipython-input-15-3c78fc8271dd> in <module>
----> 1 pages = convert_from_path(PDF_file, 600)
~\Anaconda3\lib\site-packages\pdf2image\pdf2image.py in convert_from_path(pdf_path, dpi, output_folder, first_page, last_page, fmt, thread_count, userpw, use_cropbox, strict, transparent, single_file, output_file, poppler_path)
52 """
53
---> 54 page_count = _page_count(pdf_path, userpw, poppler_path=poppler_path)
55
56 # We start by getting the output format, the buffer processing function and if we need pdftocairo
~\Anaconda3\lib\site-packages\pdf2image\pdf2image.py in _page_count(pdf_path, userpw, poppler_path)
242 out, err = proc.communicate()
243 except:
--> 244 raise PDFInfoNotInstalledError('Unable to get page count. Is poppler installed and in PATH?')
245
246 try:
PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
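The chained error means pdf2image could not launch poppler's `pdfinfo` executable: `Popen` raised `FileNotFoundError` (WinError 2), and pdf2image re-raised it as `PDFInfoNotInstalledError`. pdf2image only wraps poppler, so poppler has to be installed separately (on Anaconda, `conda install -c conda-forge poppler` is one option), and its `bin` folder must either be on `PATH` or be passed through the `poppler_path` parameter that appears in the `convert_from_path` signature in the traceback. The sketch below checks both cases; `find_poppler_bin` is a hypothetical helper for illustration, not part of pdf2image.

```python
import shutil
from typing import Iterable, Optional


def find_poppler_bin(candidates: Iterable[str] = ()) -> Optional[str]:
    """Hypothetical helper: decide what to pass as pdf2image's poppler_path.

    Returns None when pdfinfo is already on PATH (pdf2image then needs no
    poppler_path), otherwise the first candidate folder containing pdfinfo.
    """
    if shutil.which("pdfinfo"):
        return None
    for folder in candidates:
        # path= restricts the search to this folder; on Windows, which()
        # also tries PATHEXT extensions such as .exe
        if shutil.which("pdfinfo", path=folder):
            return folder
    raise FileNotFoundError(
        "pdfinfo not found: install poppler and add its bin folder to PATH"
    )
```

Usage, where the folder is a placeholder for wherever poppler's `bin` directory actually lives on your machine: `pages = convert_from_path(PDF_file, 600, poppler_path=find_poppler_bin([r"C:\path\to\poppler\bin"]))`. Passing `poppler_path=None` is the same as omitting it, so the call also works when poppler is already on `PATH`.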
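Once poppler is found and the page images have been OCR'd, the remaining step in the question is splitting the extracted text into sentences. A minimal sketch, assuming mostly English text: this regex splitter is a simplified stand-in for a proper tokenizer such as NLTK's `sent_tokenize`, and it will wrongly split after abbreviations such as "Dr." followed by a capitalized name. `split_sentences` is a hypothetical name.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter for OCR output (hypothetical helper)."""
    # OCR text breaks lines mid-sentence, so collapse all whitespace first
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return []
    # Split after ., ! or ? when the next token starts with an uppercase letter or digit
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", flat)
    return [p for p in parts if p]
```

Because the split requires an uppercase letter or digit after the punctuation, a lowercase continuation such as "? yes" stays in the same sentence.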