Tag: pdfbox

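I am trying to merge PDF documents using PDFMergerUtility. I was considering the mergeDocuments() method, but when I looked at the source code I saw that the files are loaded as PDDocuments. That is what I am trying to avoid: my application works with huge images exported as PDF, and creating a PDDocument containing these images causes OutOfMemoryError: Java heap space.

Is there a way to merge these files without PDFBox or a similar tool loading the whole objects into memory?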

java apache pdf out-of-memory pdfbox

Pet*_*r K

6 votes
0 answers
778 views

Text in the generated PDF is reversed

I am using pdfbox to add a line to a pdf file, but the text I add comes out reversed.

File file = new File(filePath);
PDDocument document = PDDocument.load(file);

PDPage page = document.getPage(0);
PDPageContentStream contentStream = new PDPageContentStream(document, page,PDPageContentStream.AppendMode.APPEND,true);

int stampFontSize = grailsApplication.config.pdfStamp.stampFontSize ? grailsApplication.config.pdfStamp.stampFontSize : 20
contentStream.beginText();
contentStream.setFont(PDType1Font.TIMES_ROMAN, stampFontSize);

int leftOffset = grailsApplication.config.pdfStamp.leftOffset ? grailsApplication.config.pdfStamp.leftOffset : 10
int bottomOffset = grailsApplication.config.pdfStamp.bottomOffset ? grailsApplication.config.pdfStamp.bottomOffset : 20
contentStream.moveTextPositionByAmount(grailsApplication.config.xMove,grailsApplication.config.yMove)
contentStream.newLineAtOffset(leftOffset, bottomOffset)

String text = "i have added this line...!!!!";
contentStream.showText(text);
contentStream.endText();

contentStream.close();

document.save(new File(filePath));
document.close();

byte[] pdfData;
pdfData = Files.readAllBytes(file.toPath());
return pdfData;


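I tried using the moveTextPositionByAmount method, but it does not seem to have any effect on the text. Why is my text reversed, and how can I set it to the correct orientation?

See the image of the PDF output.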

pdf grails pdfbox

GJA*_*AIN

2017-10-25

6 votes
1 answer
1366 views

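"No Unicode mapping" error when parsing PDFs

I have a bunch of PDF files (from different sources) that I want to extract text from (unfortunately I cannot attach the files).

Current parsing results:

Tika silently returns text with much of the required data missing.
Using PDFBox directly gives a bunch of warnings (see below) and also drops the data it cannot recognize.
Adobe Acrobat Reader (Save as Text operation) preserves the original document structure but puts “” in place of the problematic fonts.

All the warnings I have seen so far from PDFBox, combined: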

Aug 06, 2020 3:10:49 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+51 (51) in font AUDQZE+OpenSans-Identity-H

Aug 06, 2020 3:10:49 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+5 (5) in font HCUDUN+DroidSerif-Identity-H

Aug 06, 2020 3:10:49 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+5 (5) in font AUDQZE+OpenSans-Identity-H

Aug 06, 2020 3:10:49 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+55 …

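Because the same warning repeats for every affected glyph, it can help to condense the log into one entry per font. The following is a minimal sketch using only the JDK, assuming the log lines have exactly the format shown above; the class name `MissingUnicodeSummary` and its method are hypothetical and not part of PDFBox or Tika.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (assumption): groups the unmapped CIDs from the log by font name.
final class MissingUnicodeSummary {

    // Matches e.g. "No Unicode mapping for CID+51 (51) in font AUDQZE+OpenSans-Identity-H"
    private static final Pattern WARNING =
            Pattern.compile("No Unicode mapping for CID\\+(\\d+) \\(\\d+\\) in font (\\S+)");

    /** Returns font name -> sorted set of CIDs without a Unicode mapping, in first-seen font order. */
    static Map<String, Set<Integer>> summarize(String log) {
        Map<String, Set<Integer>> byFont = new LinkedHashMap<>();
        Matcher m = WARNING.matcher(log);
        while (m.find()) {
            int cid = Integer.parseInt(m.group(1));
            byFont.computeIfAbsent(m.group(2), font -> new TreeSet<>()).add(cid);
        }
        return byFont;
    }
}
```

Truncated lines, such as the last warning in the excerpt, simply do not match the pattern and are skipped.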

unicode parsing pdf-parsing pdfbox apache-tika

exe*_*nza

2020-08-11

6 votes
0 answers
4960 views

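Can I reduce pdfbox memory usage in my code, or should I extend the Java heap space?

My PDF file is 20 MB and has four pages.
I have 1 GB of free RAM when my program crashes.
I tried adding setupTempFileOnly to the load method, with no change.
Memory used is 217 MB in the first loop and 220 MB in the second loop.
I get two jpg files, but my program crashes in the third loop, on page 3.
I want to export all pages as jpg with the highest jpg quality (low jpg compression).
I use the latest Java version and pdfbox 2.0.20.

My PDF is not very large, only four pages. Can I optimize the code to use less memory? Is there something I can clean up after each loop?

My code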

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.FileImageOutputStream;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

import net.coobird.thumbnailator.Thumbnails;

public class PdfToImage {
       public static void main (String args[]) throws IOException {
           File file …

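The question's code is cut off, so the following is only a minimal sketch of the "highest jpg quality" part, using the javax.imageio classes that the imports above already list. The class `JpegQualityWriter` and its method are hypothetical names. The sketch assumes the page has already been rendered to an RGB `BufferedImage` without an alpha channel, and it does not address the heap usage itself.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.FileImageOutputStream;

// Hypothetical helper (assumption): not taken from the truncated program above.
final class JpegQualityWriter {

    /** Writes the image as JPEG; quality ranges from 0.0 (smallest file) to 1.0 (least compression). */
    static void writeJpeg(BufferedImage image, File target, float quality) throws IOException {
        // FileImageOutputStream does not truncate an existing file, so remove any old one first.
        Files.deleteIfExists(target.toPath());

        // Take the first JPEG writer registered with ImageIO.
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        JPEGImageWriteParam params = new JPEGImageWriteParam(null);
        params.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        params.setCompressionQuality(quality);

        try (FileImageOutputStream out = new FileImageOutputStream(target)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), params);
        } finally {
            // Release the writer's native resources once the page has been written.
            writer.dispose();
        }
    }
}
```

Calling `JpegQualityWriter.writeJpeg(pageImage, new File("page-1.jpg"), 1.0f)` once per page, and dropping the reference to `pageImage` before rendering the next page, keeps at most one rendered page reachable at a time. Whether that is enough for a given heap size depends on the render resolution.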

java heap-memory pdfbox

Xtr*_*eme

2020-09-04

6 votes
0 answers
2554 views

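Is it possible to redact PDF regions by position using PDFBox?

Context

Currently, I have a solution that loops through the PDF and draws black rectangles in it.

So I already have a list of PDRectangle objects representing the correct regions I need to fill/cover on the PDF, hiding all the text I want.

Problem

Problem 1: the text beneath the black rectangles can easily be copied, searched, or extracted by other tools.

I solved this by flattening my PDF (converting it to an image, making it a single-layer document, so the black rectangles can no longer be bypassed). This is the same solution as described here: Disable pdf-text search with pdfBox

This is not real redaction; it is more of a workaround. Which brings me to

Problem 2:

My final PDF becomes an image document, and I lose all PDF properties, including search, copy… and it is a much slower process. I want to keep all PDF properties while making the redacted regions unreadable by any means.

What I want to accomplish

That being said, I would like to know whether it is possible, and how, to do an actual redaction with PDFBox: blacking out the rectangular regions, since I already have all the positions I need, while keeping the PDF properties and not allowing the redacted regions to be read.

Note: I am aware of the problems PDFBox had with the old ReplaceText function, but here I have the positions I need to make sure I blank out exactly the regions I need.

Also, I am open to suggestions for other free libraries.

Technical specs:

PDFBox 2.0.21
Java 11.0.6+10, AdoptOpenJDK
macOS Catalina 10.15.4, 16 GB, x86_64

My code

This is how I draw the black rectangles: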

private void draw(PDPage page, PDRectangle hitPdRectangle) throws IOException {

    PDPageContentStream content = new PDPageContentStream(pdDocument, page,
        PDPageContentStream.AppendMode.APPEND, false, false);
    content.setNonStrokingColor(0f);
    
    content.addRect(hitPdRectangle.getLowerLeftX(), 
        hitPdRectangle.getLowerLeftY()  -0.5f, 
        hitPdRectangle.getUpperRightX() - hitPdRectangle.getLowerLeftX(), 
        hitPdRectangle.getUpperRightY() - hitPdRectangle.getLowerLeftY());
    
    content.fill(); …


java pdf pdfbox

Tha*_*ias

2020-11-18

6 votes
1 answer
1128 views

PDFBox signing - saveIncremental vs. saveIncrementalForExternalSigning

I am signing a PDF file, but I have some concerns. From the pdfbox examples, I see two ways to sign a PDF. The first one is:

document.saveIncremental(output);


The second way:

ExternalSigningSupport externalSigning = doc.saveIncrementalForExternalSigning(fos);
// invoke external signature service
byte[] cmsSignature = sign(externalSigning.getContent());

if (isLateExternalSigning()) {
    // this saves the file with a 0 signature
    externalSigning.setSignature(new byte[0]);
    // remember the offset (add 1 because of "<")
    int offset = signature.getByteRange()[1] + 1;
    // now write the signature at the correct offset without any PDFBox methods
    RandomAccessFile raf = new RandomAccessFile(signedFile, "rw");
    raf.seek(offset);
    raf.write(Hex.getBytes(cmsSignature));
    raf.close();
} else {
    // set signature bytes …

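The late-signing branch above writes the hex-encoded signature straight into the saved file, one byte past the "<" that opens the signature placeholder. The following is a minimal JDK-only sketch of that patching step, assuming the file already contains a hex placeholder that is at least as long as the encoded signature. `HexPlaceholderPatcher` is a hypothetical name, and the hex encoding is done by hand here instead of using the `Hex` helper from the snippet.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (assumption): illustrates the seek-and-overwrite step only.
final class HexPlaceholderPatcher {

    /** Encodes the bytes as upper-case hexadecimal ASCII, two characters per byte. */
    static byte[] toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Overwrites the file in place, starting at offset, with the hex form of the signature. */
    static void patch(File file, long offset, byte[] signature) throws IOException {
        byte[] hex = toHex(signature);
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            // Refuse to write past the end of the file; that would change its length.
            if (offset < 0 || offset + hex.length > raf.length()) {
                throw new IOException("signature does not fit at offset " + offset);
            }
            raf.seek(offset);
            raf.write(hex);
        }
    }
}
```

Because the write replaces bytes without changing the file length, the byte range that was signed stays valid, which is why the placeholder has to be reserved with enough room beforehand.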

java digital-signature pdfbox

SoT*_*SoT

2020-12-22

6 votes
1 answer
1558 views

PDDocument.load(file) is not a method (PDFBox)

I want to make a simple program that gets the text content from a pdf file with Java. Here is the code:

    PDFTextStripper ts = new PDFTextStripper();
    File file = new File("C:\\Meeting IDs.pdf");
    PDDocument doc1 = PDDocument.load(file);
    String allText = ts.getText(doc1);
    String gradeText = allText.substring(allText.indexOf("GRADE 10B"), allText.indexOf("GRADE 10C"));
    System.out.println("Meeting ID for English: "
            + gradeText.substring(gradeText.indexOf("English") + 7, gradeText.indexOf("English") + 20));


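This is only part of the code, but it is the problematic part. The error is: The method load(File) is undefined for the type PDDocument

I learned to use PDFBox from JavaTPoint. I followed the instructions to install the PDFBox library and added it to the build path. My PDFBox version is 3.0.0. I also searched the source files and their methods, but could not find a load method there.

Thank you in advance.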

java eclipse pdf pdfbox

Oja*_*kar

6 votes
1 answer
30k views

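Differences between ETSI.CAdES.detached and adbe.pkcs7.detached PDF signatures

I have a correct, working, LTV-enabled adbe.pkcs7.detached PDF signature implementation, built following the ISO 32000-1:2008 and RFC 5652 guidelines. Now I am also trying to support ETSI.CAdES.detached signatures as described in ETSI EN 319 142-1. As far as I understand so far, the main differences are the /SubFilter value, the DSS structure, the ESS attributes, and the document-time-stamp. Are all four of these changes necessary to conform to the standard?

If so, does the final PDF document have the same long-term capabilities as an adbe.pkcs7.detached document?

The ETSI documents mention that the document time-stamp and DSS must be re-applied before expiry to keep the signature valid. Why does this not happen with adbe.pkcs7.detached documents, and how can it be avoided?

How exactly are the ESS attributes in the SignedData structure constructed? Are there any other changes in it?

The code is implemented in Java using PDFBox and BouncyCastle. Are these libraries also capable of producing ETSI signatures?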

pdf bouncycastle digital-signature pdfbox

F.C*_*.C.

6 votes
1 answer
1606 views

Tag statistics

pdfbox ×10

java ×7

pdf ×5

digital-signature ×2

apache ×1

apache-tika ×1

bouncycastle ×1

eclipse ×1

grails ×1

heap-memory ×1

out-of-memory ×1

parsing ×1

pdf-generation ×1

pdf-parsing ×1

unicode ×1
