PDFBox: How to "flatten" a PDF form?

Question

How do I "flatten" a PDF form with PDFBox (remove the form fields but keep the field text)?

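A quick way to do this is to remove the fields from the AcroForm.

To do so, you only need to get the document catalog, then the AcroForm, and then remove all fields from that AcroForm.

The graphical representation is linked to the annotation and stays in the document.

So I wrote this code: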

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class PdfBoxTest {
    public void test() throws Exception {
        PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
        PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
        PDAcroForm acroForm = pdCatalog.getAcroForm();

        if (acroForm == null) {
            System.out.println("No form-field --> stop");
            return;
        }

        @SuppressWarnings("unchecked")
        List<PDField> fields = acroForm.getFields();

        // set the text in the form-field <-- does work
        for (PDField field : fields) {
            if (field.getFullyQualifiedName().equals("formfield1")) {
                field.setValue("Test-String");
            }
        }

        // remove form-field but keep text ???
        // acroForm.getFields().clear();         <-- does not work
        // acroForm.setFields(null);             <-- does not work
        // acroForm.setFields(new ArrayList());  <-- does not work
        // ???

        pdDoc.save("E:\\Form-Test-Result.pdf");
        pdDoc.close();
    }
}

Answer 1

Syl*_*gat 16

With PDFBox 2, you can now easily "flatten" a PDF form by calling the flatten method on the PDAcroForm object. See the Javadoc: PDAcroForm.flatten().

Simplified code with a sample call of this method:

//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));    
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();

//Fill the document
...

//Flatten the document
pDAcroForm.flatten();

//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();

Note: dynamic XFA forms cannot be flattened.

To migrate from PDFBox 1.* to 2.0, see the official migration guide.

Nice -> pDAcroForm.flatten(); using org.apache.pdfbox 2.0.4 (2 upvotes)

Answer 2

小智 7

setReadOnly worked for me, as shown below:

@SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
for (PDField field : fields) {
    if (field.getFullyQualifiedName().equals("formfield1")) {
        field.setReadOnly(true);
    }
}

Answer 3

bfj*_*les 7

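This definitely works. I ran into this problem, spent all night debugging, and finally figured out how to do it :)

This assumes that you are able to edit the PDF in some way, or have some control over the PDF.

First, edit the form with Acrobat Pro. Make the fields hidden and read-only.

Then you need to use two libraries: PDFBox and PDFClown.

PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must run first, then PDFBox (in that order; the other way around doesn't work).

Example code for a single field: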

// PDF Clown code
File file = new File("Some file path"); 
Document document = file.getDocument();
Form form = document.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");

PageStamper stamper = new PageStamper(); 
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());

// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = Font.get(document, "some_path");
composer.setFont(font, 10); 
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY(); 
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate)); 

// Actually delete the form field!
field.delete();
stamper.flush(); 

// Create new buffer to output to... 
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard); 
byte[] bytes = buffer.toByteArray(); 

// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);

// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();

// Phew. Finally.
pdfDocument.save("Some file path");

There may be a few typos here and there, but this should be enough to get the gist :)

Answer 4

小智 5

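After reading the PDF reference guide, I found that you can easily set read-only mode for AcroForm fields by adding an "Ff" key (field flags) with the value 1. Here is what the documentation says about it:

If set, the user may not change the value of the field. Any associated widget annotations will not interact with the user; that is, they will not respond to mouse clicks or change their appearance in response to mouse motions. This flag is useful for fields whose values are computed or imported from a database.

So the code could look like this (using the pdfbox lib):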

public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {
    PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
    PDAcroForm form = catalog.getAcroForm();

    List<PDField> acroFormFields = form.getFields();
    System.out.println(String.format("found %d AcroForm fields", acroFormFields.size()));

    for (PDField field : acroFormFields) {
        makeAcroFieldReadOnly(field);
    }
}

private static void makeAcroFieldReadOnly(PDField field) {
    // Ff = 1 sets the ReadOnly bit; note that this replaces any other flags already stored in Ff
    field.getDictionary().setInt("Ff", 1);
}
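
One thing to watch with the approach above: Ff is a bit field, so writing the literal 1 also clears any other flags the field already had. Below is a minimal sketch of the bit arithmetic only, using plain Java and no PDFBox calls. The class and method names (FieldFlags, withReadOnly, isReadOnly) are made up for illustration; the bit positions are the ones I recall from the field-flag tables in the PDF reference (ReadOnly = bit 1, Required = bit 2, Multiline = bit 13 for text fields), so please double-check them against the spec.

```java
public class FieldFlags {
    // Flag values as recalled from the PDF reference (assumption, verify against the spec).
    static final int READ_ONLY = 1;        // bit 1
    static final int REQUIRED = 1 << 1;    // bit 2
    static final int MULTILINE = 1 << 12;  // bit 13, text fields only

    /** Returns the flags with ReadOnly switched on and all other bits left untouched. */
    static int withReadOnly(int flags) {
        return flags | READ_ONLY;
    }

    /** Reports whether the ReadOnly bit is set in the given flags. */
    static boolean isReadOnly(int flags) {
        return (flags & READ_ONLY) != 0;
    }

    public static void main(String[] args) {
        int existing = REQUIRED | MULTILINE;   // a required multiline text field
        int overwritten = 1;                   // what writing the literal 1 stores
        int merged = withReadOnly(existing);   // keeps Required and Multiline

        System.out.println("overwritten keeps Multiline: " + ((overwritten & MULTILINE) != 0));
        System.out.println("merged keeps Multiline: " + ((merged & MULTILINE) != 0));
        System.out.println("merged is read-only: " + isReadOnly(merged));
    }
}
```

In the PDFBox code this would mean reading the current Ff value from the field dictionary first and writing back the OR-ed result instead of the constant 1. How exactly to read it depends on the PDFBox version, so treat that part as something to look up rather than as a given.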

Answer 5

Luk*_*kas 0

This is the answer from Thomas on the PDFBox mailing list:

You need to get the fields via the COSDictionary. Try this code...

PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();

COSDictionary acroFormDict = acroForm.getDictionary();
COSArray fields = (COSArray) acroFormDict.getDictionaryObject("Fields");
fields.clear();

Archived: 13 years ago
Views: 23823
Last activity: 7 years ago