Converting a PDF to docx format in Python

Question

Converting a PDF to docx format in Python

Jay*_*uks 2 pdf docx python-docx pdfminer

How can I convert a PDF to docx? I tried using pdfminer to convert the PDF to HTML and extract the text from that, but the result still isn't good enough.

Answer 1

thr*_*dhn 5

pdf2docx

Install the pdf2docx package.

Installation

Install pdf2docx with pip, or clone or download it and install it from source:

 # install from PyPI
 pip install pdf2docx

 # or: download the package and install it into your environment from the source directory
 python setup.py install


Option 1

from pdf2docx import Converter

pdf_file  = r'C:\Users\ABCD\Desktop\XYZ\Document1.pdf'  # source file
docx_file = r'C:\Users\ABCD\Desktop\XYZ\sample.docx'    # destination file

# convert pdf to docx (start=0, end=None converts every page)
cv = Converter(pdf_file)
cv.convert(docx_file, start=0, end=None)
cv.close()  # release the PDF file handle

# Output
#
# Parsing Page 53: 53/53...
# Creating Page 53: 53/53...
# --------------------------------------------------
# Terminated in 6.258919400000195s.

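If you only need some of the pages, the same `start` and `end` arguments can be passed through a small wrapper. This is only a sketch: `default_docx_path` and `convert_pages` are names made up for illustration, and it assumes pdf2docx's page indexes are zero-based with `end` excluded from the range. The `try`/`finally` makes sure the converter is closed even if the conversion fails.

```python
from pathlib import Path


def default_docx_path(pdf_path):
    """Hypothetical helper: same location and name as the PDF, with a .docx suffix."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pages(pdf_path, docx_path=None, start=0, end=None):
    """Hypothetical wrapper: convert pages [start, end) of pdf_path to docx."""
    # Imported here so the path helper above is usable without pdf2docx installed.
    from pdf2docx import Converter

    docx_path = Path(docx_path) if docx_path else default_docx_path(pdf_path)
    cv = Converter(str(pdf_path))
    try:
        # Assumption: zero-based indexes; end=None means "through the last page".
        cv.convert(str(docx_path), start=start, end=end)
    finally:
        cv.close()  # always release the PDF, even when convert() raises
    return docx_path
```

For example, `convert_pages("Document1.pdf", start=0, end=3)` should write the first three pages to `Document1.docx` next to the source file.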

Option 2

from pdf2docx import parse

pdf_file  = r'C:\Users\ABCD\Desktop\XYZ\Document2.pdf'  # source file
docx_file = r'C:\Users\ABCD\Desktop\XYZ\sample_2.docx'  # destination file

# convert pdf to docx in a single call (start=0, end=None converts every page)
parse(pdf_file, docx_file, start=0, end=None)

# Output
#
# Parsing Page 53: 53/53...
# Creating Page 53: 53/53...
# --------------------------------------------------
# Terminated in 5.883666100000482s.

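Since `parse` is a single call, it is easy to loop over a whole folder. A rough sketch, not part of pdf2docx itself: `pair_outputs` and `convert_folder` are hypothetical names, and the folder layout is an assumption. Failures are collected instead of stopping the loop, because some PDFs (scanned ones, for instance) may not convert cleanly.

```python
from pathlib import Path


def pair_outputs(pdf_paths, out_dir):
    """Hypothetical helper: map each PDF path to a .docx path inside out_dir."""
    out_dir = Path(out_dir)
    return [(Path(p), out_dir / (Path(p).stem + ".docx")) for p in pdf_paths]


def convert_folder(in_dir, out_dir):
    """Hypothetical helper: convert every *.pdf in in_dir, return the failures."""
    # Imported here so pair_outputs() is usable without pdf2docx installed.
    from pdf2docx import parse

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(Path(in_dir).glob("*.pdf"))

    failed = []
    for pdf, docx in pair_outputs(pdfs, out_dir):
        try:
            parse(str(pdf), str(docx), start=0, end=None)
        except Exception as exc:  # keep going; report the problem files at the end
            failed.append((pdf, exc))
    return failed
```

Calling `convert_folder("pdfs", "converted")` would then return a list of `(path, exception)` pairs for any files that could not be converted.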

Asked:	6 years, 2 months ago
Viewed:	1135 times
Last active:	3 years, 1 month ago