Post by HB1*_*123

Python PDFMiner - PDF to CSV

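I want to be able to convert PDFs to CSV files and have found several useful scripts, but being new to Python, I have a question:

Where do I specify the file path of the PDF and the CSV file I want to write to?

I am using Python 2.7.11 and PDFMiner 20140328.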

import sys
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import XMLConverter, HTMLConverter, TextConverter
from pdfminer.layout import LAParams
from cStringIO import StringIO

def pdfparser(data):
    # data is the path to the PDF file, taken from the command line below
    fp = file(data, 'rb')
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()  # in-memory buffer that collects the extracted text
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    # Process each page; the converter appends the page text to retstr
    for page in PDFPage.get_pages(fp):
        interpreter.process_page(page)
        data = retstr.getvalue()

    print data

if __name__ == '__main__':
    pdfparser(sys.argv[1])
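As a rough sketch of one way to answer the question: both paths could be passed on the command line, and the extracted text written out with the standard `csv` module. The function names `text_to_rows`, `write_csv`, `parse_args` and `main` are hypothetical, and splitting each line on whitespace is only an assumption about how the PDF text is laid out. The sketch assumes Python 3; on Python 2.7, as used in the question, the output file would be opened with mode `'wb'` instead. The PDF extraction itself is passed in as `extract_text`, so the sketch does not depend on PDFMiner.

```python
import argparse
import csv


def text_to_rows(text):
    # One row per non-empty line; cells are split on runs of whitespace.
    # This is an assumption about the layout, not something PDFMiner guarantees.
    return [line.split() for line in text.splitlines() if line.strip()]


def write_csv(rows, csv_path):
    # newline='' lets the csv module control the line endings (Python 3).
    with open(csv_path, 'w', newline='', encoding='utf-8') as out:
        csv.writer(out).writerows(rows)


def parse_args(argv):
    # Two positional arguments: the input PDF path and the output CSV path.
    parser = argparse.ArgumentParser(description='Write PDF text to a CSV file')
    parser.add_argument('pdf_path', help='path of the PDF to read')
    parser.add_argument('csv_path', help='path of the CSV to write')
    return parser.parse_args(argv)


def main(argv, extract_text):
    # extract_text is any function that takes a PDF path and returns its text.
    args = parse_args(argv)
    text = extract_text(args.pdf_path)
    write_csv(text_to_rows(text), args.csv_path)
```

If `pdfparser` were changed to `return data` instead of printing it, calling `main(sys.argv[1:], pdfparser)` under the `__main__` guard would let the script be run with the PDF path as the first argument and the CSV path as the second.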

python csv pdf pdfminer
