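How do I find and replace text in a PDF document with PDFBox 2.0? The old examples I found use syntax that no longer works, so I'd like to know whether this is still possible and, if so, what the best approach is. Thanks!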
Answer by 小智 (score 6)
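You can try something like this. It targets PDFBox 2.0.x and only works when the text is stored literally in a Tj or TJ string operand and the font uses a simple single-byte encoding: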
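import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

// PDFBox 2.0.x: PDFStreamParser.parse()/getTokens() no longer exist in 3.0.
// Text split across several Tj/TJ strings, or drawn with subsetted/CID fonts,
// will not be found or may render with missing glyphs.
public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
    if (searchString == null || searchString.isEmpty() || replacement == null) {
        return document;
    }
    for (PDPage page : document.getPages()) {
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<Object> tokens = parser.getTokens();
        for (int j = 0; j < tokens.size(); j++) {
            Object next = tokens.get(j);
            if (!(next instanceof Operator)) {
                continue;
            }
            Operator op = (Operator) next;
            // Tj and TJ are the two operators that display strings in a PDF
            if ("Tj".equals(op.getName())) {
                // Tj takes a single operand, the string to display, which precedes it
                COSString previous = (COSString) tokens.get(j - 1);
                previous.setValue(replaceFirstLiteral(previous, searchString, replacement));
            } else if ("TJ".equals(op.getName())) {
                // TJ takes an array mixing strings and numeric kerning adjustments
                COSArray previous = (COSArray) tokens.get(j - 1);
                for (int k = 0; k < previous.size(); k++) {
                    Object arrElement = previous.getObject(k);
                    if (arrElement instanceof COSString) {
                        COSString cosString = (COSString) arrElement;
                        cosString.setValue(replaceFirstLiteral(cosString, searchString, replacement));
                    }
                }
            }
        }
        // now that the tokens are updated, replace the page content stream
        PDStream updatedStream = new PDStream(document);
        try (OutputStream out = updatedStream.createOutputStream(COSName.FLATE_DECODE)) {
            ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
            tokenWriter.writeTokens(tokens);
        }
        page.setContents(updatedStream);
    }
    return document;
}

// Replaces the first literal (non-regex) occurrence; assumes a Latin-1-like font encoding
private static byte[] replaceFirstLiteral(COSString cosString, String search, String replacement) {
    String string = cosString.getString()
            .replaceFirst(Pattern.quote(search), Matcher.quoteReplacement(replacement));
    return string.getBytes(StandardCharsets.ISO_8859_1);
}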
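A common reason the code above finds nothing: PDF producers often split a word across several strings inside one TJ array, e.g. [(Hel) -20 (lo W) 15 (orld)], so no single COSString contains the search text. One possible workaround, sketched here on plain Java strings rather than PDFBox objects, is to match against the concatenation of the array's string elements and write the replacement into the element where the match starts. `replaceAcrossParts` is a hypothetical helper, not a PDFBox API; wiring it to `COSArray`/`COSString` is left out, and the single-byte encoding assumption still applies.

```java
import java.util.ArrayList;
import java.util.List;

public class TjArrayReplace {

    // Hypothetical helper: replaces the first literal occurrence of `search` in the
    // concatenation of `parts`, even if the match spans several parts. The replacement
    // goes into the part where the match starts; other covered parts are trimmed, and
    // parts outside the match are left untouched.
    public static List<String> replaceAcrossParts(List<String> parts, String search, String replacement) {
        List<String> result = new ArrayList<>(parts);
        if (search.isEmpty()) {
            return result;
        }
        int matchStart = String.join("", parts).indexOf(search);
        if (matchStart < 0) {
            return result; // not found, even across part boundaries
        }
        int matchEnd = matchStart + search.length();

        int offset = 0;         // start offset of the current part in the joined text
        boolean placed = false; // has the replacement been written yet?
        for (int i = 0; i < parts.size(); i++) {
            String part = parts.get(i);
            int partStart = offset;
            int partEnd = offset + part.length();
            offset = partEnd;
            if (partEnd <= matchStart || partStart >= matchEnd) {
                continue; // part lies entirely outside the match
            }
            // keep the text before the match (first part) and after it (last part)
            String before = partStart < matchStart ? part.substring(0, matchStart - partStart) : "";
            String after = partEnd > matchEnd ? part.substring(matchEnd - partStart) : "";
            if (!placed) {
                result.set(i, before + replacement + after);
                placed = true;
            } else {
                result.set(i, after);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // string elements of a TJ array such as [(Hel) -20 (lo W) 15 (orld)]
        List<String> parts = List.of("Hel", "lo W", "orld");
        System.out.println(replaceAcrossParts(parts, "World", "PDFBox")); // prints [Hel, lo PDFBox, ]
    }
}
```

The kerning numbers between the trimmed elements stay in the array, so spacing after the replaced text may shift slightly; collapsing everything into one string instead would lose all kerning in that array.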
Answer by 小智 (score 5)
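I spent a lot of time trying to come up with a solution and eventually got an Acrobat DC subscription so that I could create form fields as placeholders for the text to be replaced. In my case the fields hold customer information and order details, so the data isn't very complex, but the document is full of pages of business terms and conditions with a very complex layout.
Then I simply filled in those fields with PDFBox instead of editing the page content. This might work for you too: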
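import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

// PDFBox 2.0.x (PDDocument.load was replaced by Loader.loadPDF in 3.0)
private void update() throws IOException {
    Map<String, String> map = new HashMap<>();
    map.put("fieldname", "value to update");
    try (PDDocument document = PDDocument.load(new File("template.pdf"))) {
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            throw new IOException("template.pdf contains no form fields");
        }
        for (Map.Entry<String, String> entry : map.entrySet()) {
            // look the field up by its fully qualified name (also finds nested fields)
            PDField field = acroForm.getField(entry.getKey());
            if (field != null) {
                field.setValue(entry.getValue());
                field.setReadOnly(true); // keep the filled-in value from being edited
            }
        }
        document.save(new File("out.pdf"));
    }
}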
青年MMV