How do I add metadata to a PDF document using PDFBox?

Ant*_*ony 4 java pdf metadata pdfbox

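I have an input stream of a PDF document available. I would like to add subject metadata to the document and then save it. I'm not sure how to do this.

I came across a sample recipe here: https://pdfbox.apache.org/1.8/cookbook/workingwithmetadata.html

However, it is still fuzzy to me. Below is what I am trying and where I am stuck: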

PDDocument doc = PDDocument.load(myInputStream);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
InputStream newXMPData = ...; //what goes here? How can I add subject tag?
PDMetadata newMetadata = new PDMetadata(doc, newXMPData, false );
catalog.setMetadata( newMetadata );
//does anything else need to happen to save the document??
//I would like an outputstream of the document (with metadata) so that I can save it to an S3 bucket

Ari*_*use 6

The following code sets the title of a PDF document, but the same approach should work for other properties as well:

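import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public static byte[] insertTitlePdf(byte[] documentBytes, String title) {
    // try-with-resources closes the document even if saving fails
    try (PDDocument document = PDDocument.load(documentBytes)) {
        PDDocumentInformation info = document.getDocumentInformation();
        info.setTitle(title);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        document.save(baos);
        return baos.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }

    // null signals that the document could not be processed
    return null;
}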

This requires Apache PDFBox, which can be pulled in through Maven with the following dependency:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.6</version>
</dependency>

Adding a title:

byte[] documentBytesWithTitle = insertTitlePdf(documentBytes, "Some fancy title");

Displaying it in the browser (JSF example):

<object class="pdf" data="data:application/pdf;base64,#{myBean.getDocumentBytesWithTitleAsBase64()}" type="application/pdf">Document could not be loaded</object>
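The JSF snippet above relies on a bean getter that returns the PDF bytes as a Base64 string, which the answer does not show. A minimal sketch of what such a helper might look like, using only the JDK; the class and method names here are hypothetical and not part of the original answer:

```java
import java.util.Base64;

// Hypothetical helper: the original answer does not show how the bean produces the string.
final class PdfBase64Sketch {

    /** Encodes raw PDF bytes so they can be embedded in a data: URI. */
    static String toBase64(byte[] documentBytes) {
        // The basic (non-URL-safe, non-MIME) encoder produces a single line without breaks.
        return Base64.getEncoder().encodeToString(documentBytes);
    }
}
```

A bean getter such as `getDocumentBytesWithTitleAsBase64()` could then simply return `PdfBase64Sketch.toBase64(documentBytesWithTitle)`. Keep in mind that browsers may limit the size of data: URIs, so this is probably only suitable for small documents.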

Result (Chrome):

[Image: result of the PDF title change]


小智 6

Another, simpler way is to use the built-in document information object:

PDDocument inputDoc = // your doc
inputDoc.getDocumentInformation().setCreator("Some meta");
inputDoc.getDocumentInformation().setCustomMetadataValue("fieldName", "fieldValue");

This also has the benefit of not requiring the xmpbox library.
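Applied to the original question (a subject tag, plus a stream that can be uploaded afterwards), the same idea might look like the sketch below. It assumes PDFBox 2.0.x, where `PDDocument.load(InputStream)` is available (in 3.x loading moved to `Loader.loadPDF`). The class and method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper, assuming PDFBox 2.0.x on the classpath.
final class SubjectMetadataSketch {

    /** Reads a PDF, sets the Subject entry of its document information, and returns the result as a stream. */
    static InputStream withSubject(InputStream pdfInput, String subject) throws IOException {
        // try-with-resources closes the document once it has been saved
        try (PDDocument document = PDDocument.load(pdfInput)) {
            document.getDocumentInformation().setSubject(subject);

            // Save into memory; the bytes can then be handed to whatever performs the upload.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return new ByteArrayInputStream(out.toByteArray());
        }
    }
}
```

The whole document is held in memory here, which is fine for small files. For large PDFs, saving to a temporary file first is probably the safer choice.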


Til*_*err 2

This answer uses xmpbox and is taken from the AddMetadataFromDocInfo example in the source code download:

XMPMetadata xmp = XMPMetadata.createXMPMetadata();
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setDescription("descr");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);
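For anyone wondering what could go into the `newXMPData` stream from the question without xmpbox: an XMP packet is just XML, and the subject is conventionally stored as the Dublin Core `dc:description` property (which is why the snippet above calls `setDescription`). Below is a rough, JDK-only sketch that builds such a packet by hand. The class and method names are hypothetical, and the output is a minimal packet, not necessarily identical to what `XmpSerializer` produces:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that hand-builds a minimal XMP packet; xmpbox is the more robust route.
final class XmpSubjectSketch {

    /** Escapes the characters that must not appear raw in XML character data. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Builds an XMP packet whose dc:description (the PDF "subject") is the given text. */
    static String buildXmpPacket(String subject) {
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
             + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
             + " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
             + "  <rdf:Description rdf:about=\"\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n"
             + "   <dc:description>\n"
             + "    <rdf:Alt>\n"
             + "     <rdf:li xml:lang=\"x-default\">" + escapeXml(subject) + "</rdf:li>\n"
             + "    </rdf:Alt>\n"
             + "   </dc:description>\n"
             + "  </rdf:Description>\n"
             + " </rdf:RDF>\n"
             + "</x:xmpmeta>\n"
             + "<?xpacket end=\"w\"?>";
    }

    /** Wraps the UTF-8 encoded packet in a stream, e.g. for a PDMetadata constructor. */
    static InputStream asInputStream(String subject) {
        return new ByteArrayInputStream(buildXmpPacket(subject).getBytes(StandardCharsets.UTF_8));
    }
}
```

With this, the placeholder in the question could become `InputStream newXMPData = XmpSubjectSketch.asInputStream("My subject");`. Many viewers read the subject from the document information dictionary, so setting it there as well is likely a good idea.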