11 java pdf ms-word apache-poi
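I am trying to convert a docx file that contains tables and images into a PDF file.
I have been searching but have not found a proper solution. Please suggest one that works correctly.
Here is what I tried: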
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class TestCon {

    public static void main(String[] args) {
        TestCon cwoWord = new TestCon();
        System.out.println("Start");
        cwoWord.ConvertToPDF("D:\\Test.docx", "D:\\Test1.pdf");
    }

    public void ConvertToPDF(String docPath, String pdfPath) {
        try {
            InputStream doc = new FileInputStream(new File(docPath));
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            OutputStream out = new FileOutputStream(new File(pdfPath));
            PdfConverter.getInstance().convert(document, out, options);
            System.out.println("Done");
        } catch (FileNotFoundException ex) {
            System.out.println(ex.getMessage());
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
Exception:
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log(ILjava/lang/Object;)V from class org.apache.poi.openxml4j.opc.PackageRelationshipCollection
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:313)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:162)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130)
at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:559)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:239)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:665)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:121)
at test.TestCon.ConvertToPDF(TestCon.java:31)
at test.TestCon.main(TestCon.java:25)
My requirement is Java code that converts an existing docx into a PDF with proper formatting and alignment.
Please suggest.
Jars used:
小智 19
In addition to VivekRatanSinha's answer, I would like to post the complete code and the required jars for anyone who needs them in the future.
Code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordConvertPDF {

    public static void main(String[] args) {
        WordConvertPDF cwoWord = new WordConvertPDF();
        cwoWord.ConvertToPDF("D:/Test.docx", "D:/Test.pdf");
    }

    public void ConvertToPDF(String docPath, String pdfPath) {
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream doc = new FileInputStream(new File(docPath));
             OutputStream out = new FileOutputStream(new File(pdfPath))) {
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            PdfConverter.getInstance().convert(document, out, options);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
And the JARs:
Enjoy :)
Viv*_*nha 14
You are missing some libraries.
I was able to run your code after adding the following libraries:
Apache POI 3.15
org.apache.poi.xwpf.converter.core-1.0.6.jar
org.apache.poi.xwpf.converter.pdf-1.0.6.jar
fr.opensagres.xdocreport.itext.extension-2.0.0.jar
itext-2.1.7.jar
ooxml-schemas-1.3.jar
With these I successfully converted a 6-page Word document (.docx) containing tables, images, and various formatting.
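An IllegalAccessError like the one in the question often points to Apache POI jars of different versions being mixed on the classpath, although that is an assumption about this particular setup. A minimal sketch for checking which jar each class is actually loaded from; the class name JarLocator and the method locationOf are hypothetical helpers, not part of POI:

```java
import java.security.CodeSource;

public class JarLocator {

    /** Returns the jar or directory a class was loaded from, or a short note if unknown. */
    public static String locationOf(String className) {
        try {
            // Load without initializing, so static initializers cannot fail here.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (the JDK itself) have no code source.
            return source == null ? "(bootstrap/JDK)" : source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in the question.
        String[] names = {
            "org.apache.poi.util.POILogger",
            "org.apache.poi.openxml4j.opc.PackageRelationshipCollection",
            "org.apache.poi.xwpf.usermodel.XWPFDocument"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

If the printed locations show POI jars with different version numbers, aligning them on one version (for example the list above) is the first thing to try.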
小智 11
I will give 3 ways to convert docx to pdf:
Code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ConvertDocToPdfitext {

    public static void main(String[] args) {
        System.out.println("Starting conversion!!!");
        ConvertDocToPdfitext cwoWord = new ConvertDocToPdfitext();
        cwoWord.ConvertToPDF(
                "C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx",
                "C:/Users/avijit.shaw/Desktop/testing/docx/Test-1.pdf");
        System.out.println("Ending conversion!!!");
    }

    public void ConvertToPDF(String docPath, String pdfPath) {
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream doc = new FileInputStream(new File(docPath));
             OutputStream out = new FileOutputStream(new File(pdfPath))) {
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            PdfConverter.getInstance().convert(document, out, options);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
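Dependencies: use Maven to resolve them.
The newer version 2.0.2 of fr.opensagres.poi.xwpf.converter.core works with Apache POI 4.0.1 and iText 2.1.7. You only need to add the dependency below, and Maven downloads everything else automatically. (Update your Maven project so that it fetches these libraries and all of their transitive dependencies.)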
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.2</version>
</dependency>
Note: MS Office must be installed on the machine that runs this code.
Code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class Document4jApp {

    public static void main(String[] args) {
        File inputWord = new File("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx");
        File outputFile = new File("Test_out.pdf");
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream docxInputStream = new FileInputStream(inputWord);
             OutputStream outputStream = new FileOutputStream(outputFile)) {
            IConverter converter = LocalConverter.builder().build();
            converter.convert(docxInputStream).as(DocumentType.DOCX)
                    .to(outputStream).as(DocumentType.PDF)
                    .execute();
            converter.shutDown(); // release the local MS Word bridge
            System.out.println("success");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Dependencies: use Maven to resolve them.
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>
Note: OpenOffice must be installed on the machine that runs this code.
Code:
import java.io.File;

import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.BootstrapException;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.Exception;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

import ooo.connector.BootstrapSocketConnector;

public class App {

    public static void main(String[] args) throws Exception, BootstrapException {
        System.out.println("Starting conversion!!!");

        // Initialise
        String oooExeFolder = "C:\\Program Files (x86)\\OpenOffice 4\\program"; // Path where OpenOffice is installed
        XComponentContext xContext = BootstrapSocketConnector.bootstrap(oooExeFolder);
        XMultiComponentFactory xMCF = xContext.getServiceManager();
        Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);
        XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(XDesktop.class, oDesktop);

        // Load the document
        String workingDir = "C:/Users/avijit.shaw/Desktop/testing/docx/"; // Directory of the docx file to convert
        String myTemplate = workingDir + "Account Opening Prototype Details.docx"; // Name of the docx file to convert
        if (!new File(myTemplate).canRead()) {
            throw new RuntimeException("Cannot load template:" + new File(myTemplate));
        }
        XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
                .queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);
        String sUrl = "file:///" + myTemplate;

        // Open the document without showing a window
        PropertyValue[] propertyValues = new PropertyValue[1];
        propertyValues[0] = new PropertyValue();
        propertyValues[0].Name = "Hidden";
        propertyValues[0].Value = Boolean.TRUE;
        XComponent xComp = xCompLoader.loadComponentFromURL(sUrl, "_blank", 0, propertyValues);

        // Save as a PDF
        XStorable xStorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, xComp);
        propertyValues = new PropertyValue[2];
        // Setting the flag for overwriting
        propertyValues[0] = new PropertyValue();
        propertyValues[0].Name = "Overwrite";
        propertyValues[0].Value = Boolean.TRUE;
        // Setting the filter name
        propertyValues[1] = new PropertyValue();
        propertyValues[1].Name = "FilterName";
        propertyValues[1].Value = "writer_pdf_Export";

        String myResult = workingDir + "letterOutput.pdf"; // Name of the pdf file to write
        xStorable.storeToURL("file:///" + myResult, propertyValues);
        System.out.println("Saved " + myResult);

        // Shut down
        xDesktop.terminate();
    }
}
Dependencies: use Maven to resolve them.
<!-- https://mvnrepository.com/artifact/org.openoffice/unoil -->
<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>unoil</artifactId>
    <version>3.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openoffice/juh -->
<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>juh</artifactId>
    <version>3.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openoffice/bootstrap-connector -->
<dependency>
    <groupId>org.openoffice</groupId>
    <artifactId>bootstrap-connector</artifactId>
    <version>0.1.1</version>
</dependency>
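The OpenOffice example builds its file URLs by concatenating "file:///" with a path that contains spaces. That evidently worked for the paths shown, but a percent-encoded URL may be more robust for other file names. A minimal sketch using only the JDK; the class FileUrls and method toFileUrl are hypothetical names introduced for illustration:

```java
import java.nio.file.Paths;

public class FileUrls {

    /** Converts a local path into an absolute, percent-encoded file URL. */
    public static String toFileUrl(String path) {
        // Path.toUri() escapes spaces and other reserved characters.
        return Paths.get(path).toAbsolutePath().toUri().toString();
    }

    public static void main(String[] args) {
        System.out.println(toFileUrl("Account Opening Prototype Details.docx"));
    }
}
```

The returned string could then be passed to loadComponentFromURL and storeToURL in place of the hand-built URLs, assuming OpenOffice accepts the encoded form on your platform.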
I did a lot of research and found Documents4j to be the best free API for converting docx to pdf. Alignment, fonts, everything comes out well with Documents4j.
Maven dependencies:
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>
Use the following code to convert docx to pdf.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class Document4jApp {

    public static void main(String[] args) {
        File inputWord = new File("Tests.docx");
        File outputFile = new File("Test_out.pdf");
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream docxInputStream = new FileInputStream(inputWord);
             OutputStream outputStream = new FileOutputStream(outputFile)) {
            IConverter converter = LocalConverter.builder().build();
            converter.convert(docxInputStream).as(DocumentType.DOCX)
                    .to(outputStream).as(DocumentType.PDF)
                    .execute();
            converter.shutDown(); // release the local MS Word bridge
            System.out.println("success");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
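If your documents are rich in content and you have to convert them on Linux/Unix, all three main options suggested in this thread can be "a bit" painful to implement.
One solution I can suggest is Gotenberg: a Docker-powered stateless API for converting HTML, Markdown and Office documents to PDF.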
$ docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6
curl:
$ curl --request POST \
    --url http://localhost:3000/convert/office \
    --header 'Content-Type: multipart/form-data' \
    --form files=@document.docx \
    --form files=@document2.docx \
    -o result.pdf
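Deploy it to your infrastructure (for example as a separate microservice) and call it from your Java service with a simple HTTP request. You get the PDF file back in the response and can do whatever you want with it.
Tested, works great!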
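For the Java side of that call, here is a rough sketch using the JDK's built-in HTTP client (Java 11 or later). The endpoint /convert/office and the form field name files are taken from the curl command above. The class name GotenbergClient, its method names, and the per-part content type application/octet-stream are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class GotenbergClient {

    /** Builds a multipart/form-data body containing one "files" part. */
    public static byte[] multipartBody(String boundary, String fileName, byte[] content) {
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"files\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        body.writeBytes(content);
        body.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return body.toByteArray();
    }

    /** Posts the docx to Gotenberg and writes the response body to the pdf path. */
    public static void convert(String baseUrl, Path docx, Path pdf)
            throws IOException, InterruptedException {
        String boundary = "gotenberg-" + System.nanoTime();
        byte[] body = multipartBody(boundary, docx.getFileName().toString(), Files.readAllBytes(docx));
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/convert/office"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(pdf));
        // On a non-200 status the written file holds the error body, not a PDF.
        if (response.statusCode() != 200) {
            throw new IOException("Conversion failed with HTTP status " + response.statusCode());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: GotenbergClient <input.docx> <output.pdf>");
            return;
        }
        convert("http://localhost:3000", Path.of(args[0]), Path.of(args[1]));
    }
}
```

Building the multipart body by hand keeps the sketch free of extra dependencies. In a real service, an HTTP library with multipart support would probably be less error-prone.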