I want to extract text from a PDF file using Python and the PyPDF2 package. This is my PDF file and this is my code:
import PyPDF2
# Pass only the path: the second positional argument is `strict`, not a file mode
opened_pdf = PyPDF2.PdfFileReader('test.pdf')
p = opened_pdf.getPage(0)
p_text = p.extractText()
# extract data line by line
P_lines = p_text.splitlines()
print(P_lines)
My problem is that P_lines does not contain the text line by line; the result is one giant string. I want to extract the text line by line so I can analyze it. Any suggestions on how to improve this? Thanks! This is the string …
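For reference, here is a minimal sketch of the line-by-line extraction described above. It assumes the newer `pypdf` package is installed (the successor to PyPDF2), whose `extract_text()` may preserve line breaks better than the old `extractText()`, though that depends on how the PDF was produced. The helper names `split_into_lines` and `page_lines` are hypothetical, chosen only for illustration.

```python
def split_into_lines(text):
    """Split extracted page text into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def page_lines(pdf_path, page_number=0):
    """Return the text of one page as a list of lines.

    Assumption: the `pypdf` package is available; it is imported here
    so that split_into_lines() can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Guard against a page with no extractable text
    text = reader.pages[page_number].extract_text() or ""
    return split_into_lines(text)
```

Calling `page_lines('test.pdf')` would then return a list to loop over. If the extracted text still arrives as a single string, the PDF itself probably carries no usable line-break information, and splitting alone cannot recover it.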