Posts by Ami*_*mir

Extracting text from a PDF using Python and PyPDF2

I want to extract text from a PDF file using Python and the PyPDF2 package. This is my PDF file and this is my code:

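import PyPDF2

# Open the file in binary mode; the second positional argument of
# PdfFileReader is `strict`, not a file mode, so 'rb' does not belong there.
with open('test.pdf', 'rb') as pdf_file:
    opened_pdf = PyPDF2.PdfFileReader(pdf_file)
    p = opened_pdf.getPage(0)
    p_text = p.extractText()

# extract data line by line
P_lines = p_text.splitlines()
print(P_lines)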

My problem is that P_lines does not contain the text line by line; the result is one giant string. I want to extract the text line by line so I can analyze it. Any suggestions on how to improve this? Thanks! This is the string …
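
One possible cause, offered as an assumption rather than a diagnosis of this particular file: str.splitlines() only splits on line-boundary characters, so if the text returned by the extractor contains none, the result is a list with a single element. The sketch below uses only the standard library and two made-up sample strings (with_breaks and without_breaks are hypothetical, not taken from the PDF in question) to show the difference. The helper names split_extracted_text and would_split are also illustrative.

```python
def split_extracted_text(text):
    """Split text into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def would_split(text):
    """Return True if splitlines() would yield more than one line."""
    return len(text.splitlines()) > 1


# Hypothetical samples standing in for extractor output.
with_breaks = "Invoice 42\nTotal: 10\n"
without_breaks = "Invoice 42Total: 10"

print(split_extracted_text(with_breaks))     # two separate lines
print(split_extracted_text(without_breaks))  # one long string in a list
print(would_split(without_breaks))           # False: nothing to split on
```

If would_split(p_text) is False for the real page, the line breaks were lost during extraction, and no amount of post-processing with splitlines() will recover them; the extraction step itself would need to change.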

python pdf text pypdf

Score: 7 · Answers: 1 · Views: 20k