Efficient way to convert documents to PDF format

Aam*_*nan 20 python pdf ubuntu document-conversion docsplit

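I have been trying to find an efficient way to convert documents (doc, docx, ppt, pptx) to PDF. So far I have tried docsplit and oowriter, but both took more than 10 seconds to convert a 1.7 MB pptx file. Can anyone suggest a better approach, or improvements to mine?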

What I have tried:

from subprocess import Popen, PIPE
import time

def convert(src, dst):
    d = {'src': src, 'dst': dst}
    commands = [
        '/usr/bin/docsplit pdf --output %(dst)s %(src)s' % d,
        'oowriter --headless -convert-to pdf:writer_pdf_Export %(dst)s %(src)s' % d,
    ]

    for i in range(len(commands)):
        command = commands[i]
        st = time.time()
        process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True) # I am aware of consequences of using `shell=True` 
        out, err = process.communicate()
        errcode = process.returncode
        if errcode != 0:
            raise Exception(err)
        en = time.time() - st
        print 'Command %s: Completed in %s seconds' % (str(i+1), str(round(en, 2)))

if __name__ == '__main__':
    src = '/path/to/source/file/'
    dst = '/path/to/destination/folder/'
    convert(src, dst)

Output:

Command 1: Completed in 11.91 seconds
Command 2: Completed in 11.55 seconds

Environment:

  • Linux - Ubuntu 12.04
  • Python 2.7.3

More tool results:

ave*_*net 18

Try calling unoconv from your Python code. It took about 8 seconds on my local machine; I don't know whether that is fast enough for you:

time unoconv 15.\ Text-Files.pptx
real    0m8.604s
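For reference, calling unoconv from Python and timing it might look roughly like the sketch below. It is not taken from the answer: the helper names `build_unoconv_command` and `run_timed` are hypothetical, and it assumes unoconv is installed and on the PATH, using its `-f` (format) and `-o` (output) options. The command is passed as an argument list, so no shell is involved and file names containing spaces need no escaping.

```python
import subprocess
import time


def build_unoconv_command(src, output=None):
    """Build the argument list that converts `src` to PDF with unoconv.

    `output` is optional; when given it is passed through unoconv's -o option.
    """
    cmd = ['unoconv', '-f', 'pdf']
    if output is not None:
        cmd += ['-o', output]
    cmd.append(src)
    return cmd


def run_timed(cmd):
    """Run `cmd` without a shell and return (elapsed_seconds, stdout, stderr).

    Raises RuntimeError if the command exits with a non-zero status.
    """
    start = time.time()
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    elapsed = time.time() - start
    if process.returncode != 0:
        raise RuntimeError('%s failed: %r' % (cmd[0], err))
    return elapsed, out, err
```

With the file from the shell example, `run_timed(build_unoconv_command('15. Text-Files.pptx'))` would return the elapsed time as its first element, which can be compared directly against the docsplit and oowriter timings in the question.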