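I have been trying to find an efficient way to convert documents (doc, docx, ppt, pptx) to PDF. So far I have tried docsplit and oowriter, but both take more than 10 seconds to convert a 1.7 MB pptx file. Can anyone suggest a better method, or ways to improve my current approach?
What I have tried: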
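One commonly suggested direction is to call LibreOffice directly (`soffice`, which `oowriter` launches on many distributions) and pass all documents in a single invocation, so the office suite's startup cost is paid once rather than once per file. The sketch below assumes the LibreOffice executable is on `PATH` as `soffice`; the helper names `build_soffice_command` and `convert_batch` are hypothetical. Whether this gets a 1.7 MB pptx under 10 seconds on a given machine is untested here.

```python
import shutil
import subprocess
import time
from pathlib import Path


def build_soffice_command(sources, outdir, binary="soffice"):
    """Build one LibreOffice command that converts every file in `sources` to PDF.

    Passing all files in a single call means LibreOffice starts up once
    instead of once per document. `binary` is an assumption: it may be
    `libreoffice` or a full path on some systems.
    """
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        *[str(s) for s in sources],
    ]


def convert_batch(sources, outdir, binary="soffice", timeout=300):
    """Convert `sources` to PDF files in `outdir` and return elapsed seconds."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    Path(outdir).mkdir(parents=True, exist_ok=True)
    cmd = build_soffice_command(sources, outdir, binary)
    start = time.perf_counter()
    # No shell=True: arguments are passed as a list, so file names with
    # spaces or shell metacharacters are not interpreted by a shell.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    elapsed = time.perf_counter() - start
    if result.returncode != 0:
        raise RuntimeError(f"conversion failed: {result.stderr.strip()}")
    return elapsed
```

For example, `convert_batch(["a.docx", "b.pptx"], "out")` would write `out/a.pdf` and `out/b.pdf` in one LibreOffice run and report the total time, which can then be compared with the per-file timings from the loop in the question.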
from subprocess import Popen, PIPE
import time

def convert(src, dst):
    d = {'src': src, 'dst': dst}
    commands = [
        '/usr/bin/docsplit pdf --output %(dst)s %(src)s' % d,
        'oowriter --headless -convert-to pdf:writer_pdf_Export %(dst)s %(src)s' % d,
    ]
    for i in range(len(commands)):
        command = commands[i]
        st = time.time()
        process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True)  # I am aware of consequences of using `shell=True`
        out, err = process.communicate()
        errcode = process.returncode
        if errcode != 0:
            …

In our lab, we have an NVIDIA Tesla K80 GPU accelerator compute node with the following characteristics: Intel(R) Xeon(R) CPU E5-2670 v3 @2.30GHz, 48 CPU processors, 128GB RAM, 12 CPU cores, running 64-bit Linux.
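I am running the following code, which runs GridSearchCV on a RandomForestRegressor model after vertically appending different sets of data frames into a single one. The two sample datasets I am considering can be found at this link: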
import sys
import imp
import glob
import os
import pandas as pd
import math
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import matplotlib
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection …
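Because the code above is truncated, here is a minimal sketch of the pipeline the question describes: stack the data frames vertically with `pandas.concat`, then run `GridSearchCV` over a `RandomForestRegressor`. Note that scikit-learn's random forest runs on the CPU only, so on the machine described the K80 GPU would not be used; the relevant setting is `n_jobs`. The function names `stack_frames` and `tune_forest`, the target column, and the parameter grid are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def stack_frames(frames):
    """Vertically append data frames that share the same columns."""
    return pd.concat(frames, axis=0, ignore_index=True)


def tune_forest(df, target, param_grid, cv=3, n_jobs=-1, random_state=0):
    """Grid-search a RandomForestRegressor on `df`, predicting column `target`.

    Parallelism comes from n_jobs (-1 = all available CPU cores);
    scikit-learn does not offload random forests to the GPU.
    """
    X = df.drop(columns=[target])
    y = df[target]
    search = GridSearchCV(
        RandomForestRegressor(random_state=random_state),
        param_grid=param_grid,
        cv=cv,
        n_jobs=n_jobs,  # parallelize over (candidate, fold) fits
    )
    search.fit(X, y)
    return search


if __name__ == "__main__":
    # Hypothetical synthetic data standing in for the linked datasets.
    rng = np.random.default_rng(0)
    frames = []
    for _ in range(2):
        f = pd.DataFrame({"x1": rng.random(100), "x2": rng.random(100)})
        f["y"] = 2 * f["x1"] + f["x2"]
        frames.append(f)
    search = tune_forest(stack_frames(frames), "y",
                         {"n_estimators": [50, 100], "max_depth": [None, 5]})
    print(search.best_params_)
```

Here `n_jobs` is set on `GridSearchCV` rather than on the forest, so the search spreads independent fits across cores; setting both to -1 can oversubscribe the CPUs.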