Tag: pdfbox

How do I merge two PDF files into one in Java?

I want to merge many PDF files into one using PDFBox. This is what I have done:

PDDocument document = new PDDocument();
for (String pdfFile: pdfFiles) {
    PDDocument part = PDDocument.load(pdfFile);
    List<PDPage> list = part.getDocumentCatalog().getAllPages();
    for (PDPage page: list) {
        document.addPage(page);
    }
    part.close();
}
document.save("merged.pdf");
document.close();

where pdfFiles is an ArrayList<String> containing all the PDF files.

When I run the above, I always get:

org.apache.pdfbox.exceptions.COSVisitorException: Bad file descriptor

Am I doing something wrong? Is there another way to do this?

java pdf pdfbox

Lip*_*pis

2012-10-04

65 votes, 5 answers, 110k views

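Parsing PDF files (especially tables) with PDFBox

I need to parse PDF files that contain tabular data. I am using PDFBox to extract the file's text so that I can parse the result (a String) later. The problem is that text extraction does not work the way I expected for tabular data. For example, I have a file containing a table like this (7 columns: the first two always have data, only one of the Complexity columns has data, and only one of the Financing columns has data):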

+----------------------------------------------------------------+
| AIH | Value | Complexity                     | Financing       |
|     |       | Medium | High | Not applicable | MAC/Other | FAE |
+----------------------------------------------------------------+
| xyz | 12.43 | 12.34  |      |                | 12.34     |     |
+----------------------------------------------------------------+
| abc | 1.56  |        | 1.56 |                |           | 1.56|
+----------------------------------------------------------------+

Then I use PDFBox:

PDDocument document = PDDocument.load(pathToFile);
PDFTextStripper s = new PDFTextStripper();
String content = s.getText(document);

The two data rows are extracted as follows:

xyz 12.43 12.4312.43
abc 1.56 1.561.56

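There is no space between the last two numbers, but that is not the biggest problem. The problem is that I don't know what the last two numbers mean: Medium, High, Not applicable? MAC/Other, FAE? I have no relationship between the numbers and their columns.

I am not required to use the PDFBox library, so a solution that uses another library is fine. What I want is to be able to parse the file and know what each parsed number means.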
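
One possible way to recover the column for each number, sketched here under assumptions, is to keep the horizontal position of every extracted token and compare it with the left edge of each column. The sketch is plain Java and does not call PDFBox; the `ColumnMapper` class, its methods, and the edge values are hypothetical. With PDFBox the x-coordinates might be taken from `TextPosition.getXDirAdj()`, and the column edges would have to be measured from the header row of the actual document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps a token's x-coordinate to a table column. */
final class ColumnMapper {
    private final String[] names;     // column names, left to right
    private final double[] leftEdges; // x where each column starts, ascending

    ColumnMapper(String[] names, double[] leftEdges) {
        if (names.length == 0 || names.length != leftEdges.length) {
            throw new IllegalArgumentException("one left edge per column is required");
        }
        this.names = names.clone();
        this.leftEdges = leftEdges.clone();
    }

    /**
     * Returns the right-most column whose left edge is <= x.
     * Tokens left of the first edge fall into the first column.
     */
    String columnAt(double x) {
        String result = names[0];
        for (int i = 0; i < names.length; i++) {
            if (x >= leftEdges[i]) {
                result = names[i];
            }
        }
        return result;
    }

    /** Builds a column-name to value map for one row of (x, text) tokens. */
    Map<String, String> mapRow(double[] xs, String[] texts) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < texts.length; i++) {
            row.put(columnAt(xs[i]), texts[i]);
        }
        return row;
    }
}
```

For the `abc` row of the table above, tokens positioned under AIH, Value, High and FAE would end up under those four keys, and the empty columns would simply be absent from the map.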

java pdf parsing tabular pdfbox

Mat*_*ira

2017-04-27

63 votes, 7 answers, 90k views

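Convert PDF to SVG

I want to convert a PDF to SVG. Please suggest some libraries/executables that can do this efficiently. I have written my own Java program using the Apache PDFBox and Batik libraries -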

PDDocument document = PDDocument.load( pdfFile );
DOMImplementation domImpl =
    GenericDOMImplementation.getDOMImplementation();

// Create an instance of org.w3c.dom.Document.
String svgNS = "http://www.w3.org/2000/svg";
Document svgDocument = domImpl.createDocument(svgNS, "svg", null);
SVGGeneratorContext ctx = SVGGeneratorContext.createDefault(svgDocument);
ctx.setEmbeddedFontsOn(true);

// Ask the test to render into the SVG Graphics2D implementation.

    for(int i = 0 ; i < document.getNumberOfPages() ; i++){
        String svgFName = svgDir+"page"+i+".svg";
        (new File(svgFName)).createNewFile();
        // Create an instance of the SVG Generator.
        SVGGraphics2D svgGenerator = new SVGGraphics2D(ctx,false);
        Printable page  = document.getPrintable(i);
        page.print(svgGenerator, document.getPageFormat(i), …

pdf svg batik pdfbox

use*_*541

lucky-day

50 votes, 3 answers, 50k views

Apache PDFBox: convert PDF to image

Can someone give me an example of how to use Apache PDFBox to convert a PDF into separate images (one per page of the PDF)? Thanks in advance.

pdfbox

use*_*568

2016-03-21

49 votes, 2 answers, 50k views

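PDF: find out whether text is underlined or a table cell

I have been playing around with PDFBox and the PDFTextStripperByArea class.

I am able to find out whether the text is bold or italic, but I cannot get the underline information.

As far as I understand, underlining in PDF is done by drawing lines. So in theory I should be able to get some information about the lines drawn near the text. From that information I could work out whether there is an underline or a table.

Here is my code so far: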

List<TextPosition> textPos = charactersByArticle.get(index);

for (TextPosition t : textPos)
{               
    if (t.getFont().getFontDescriptor() != null)
    {                           
        if (t.getFont().getFontDescriptor().getFontWeight() > BOLD_WEIGHT ||
            t.getFont().getFontDescriptor().isForceBold())
        {
            isBold = true;
        }

        if (t.getFont().getFontDescriptor().isItalic())
        {
            isItalic = true;
        }
    }
}

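I tried playing with the PDGraphicsState object, which is handled in the processEncodedText method of the PDFStreamEngine class, but found no line information there.

Any suggestions on where this information can be retrieved from?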
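
The idea that an underline is just a line drawn near the text can be sketched as a geometry check, assuming the drawn line segments have already been collected. Collecting them from the page's path-drawing operators is not shown. The `UnderlineCheck` class and its tolerance values are hypothetical and would need tuning against real documents. Some producers also draw underlines as thin filled rectangles, which this sketch does not cover.

```java
/**
 * Hypothetical geometry check. All coordinates are assumed to be in one
 * consistent page space in which y grows upward.
 */
final class UnderlineCheck {
    /**
     * Returns true when a horizontal segment sits just below the text baseline,
     * covers most of the text width, and is not much wider than the text.
     */
    static boolean isUnderline(double textX, double baselineY, double textWidth, double fontSize,
                               double lineX1, double lineX2, double lineY) {
        double left = Math.min(lineX1, lineX2);
        double right = Math.max(lineX1, lineX2);

        // Distance of the line below the baseline; negative means it is above.
        double gap = baselineY - lineY;
        boolean closeBelow = gap >= 0 && gap <= 0.25 * fontSize; // assumed tolerance

        // Horizontal overlap between the segment and the text run.
        double overlap = Math.min(right, textX + textWidth) - Math.max(left, textX);
        boolean coversText = overlap >= 0.8 * textWidth; // assumed coverage

        // A segment far wider than the text is more likely a table border.
        boolean notMuchWider = (right - left) <= 1.5 * textWidth; // assumed ratio

        return closeBelow && coversText && notMuchWider;
    }
}
```

A segment that passes the first two checks but fails the third is a candidate for a table rule rather than an underline, which matches the distinction the question asks about.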

java pdf pdfbox

Dre*_*ejc

2012-12-19

42 votes, 2 answers, 5,094 views

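How to center text using PDFBox

My question is very simple: how can I center text in a PDF using PDFBox?

I don't know the string in advance, so I can't find the middle by trial and error. The strings do not always have the same width.

I need either:

A method that can center the text, something like addCenteredString(myString)
A method that gives me the width of the string in pixels. I could then compute the center, since I know the dimensions of the PDF.

Any help is welcome!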
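
The arithmetic behind the second option can be sketched as follows. PDFBox's `PDFont.getStringWidth(String)` reports widths in 1/1000 text-space units, so the value has to be scaled by the font size, and the result is in PDF user-space units (points) rather than pixels. The `CenterText` class and its method names are hypothetical, and the page width is assumed to come from the page's media box.

```java
/** Hypothetical helper for horizontal centering; all names are illustrative. */
final class CenterText {
    /** Converts a width in 1/1000 text-space units into user-space units (points). */
    static double widthInPoints(double widthInThousandths, double fontSize) {
        return widthInThousandths / 1000.0 * fontSize;
    }

    /** Returns the x-coordinate where text of the given width must start to be centered. */
    static double centeredX(double pageWidth, double textWidthInPoints) {
        return (pageWidth - textWidthInPoints) / 2.0;
    }
}
```

The returned x-coordinate would then be used as the horizontal offset when positioning the text before drawing the string.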

java text-alignment pdfbox

Ste*_*roz

2015-12-15

36 votes, 2 answers, 20k views

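How to generate multiple lines in a PDF using Apache PDFBox

I am using PDFBox to generate a PDF file in Java. The problem is that when I add long text content to the document, it is not displayed properly. Only part of it is shown, and all of it on a single line.

I want the text to span multiple lines.

My code is as follows: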

PDPageContentStream pdfContent=new PDPageContentStream(pdfDocument, pdfPage, true, true);

pdfContent.beginText();
pdfContent.setFont(pdfFont, 11);
pdfContent.moveTextPositionByAmount(30,750);            
pdfContent.drawString("I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox");
pdfContent.endText();

My output:

(image: my output file)
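
Since a single `drawString` call does not break lines by itself, one common approach is to split the text into lines that fit a maximum width and draw each line separately. The sketch below shows only the splitting step as a greedy word wrap in plain Java. The `LineWrapper` class is hypothetical, and the `widthOf` function is an assumption standing in for a real font measurement such as the font's string width scaled by the font size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical greedy word wrapper; the width measure is supplied by the caller. */
final class LineWrapper {
    /**
     * Splits text on whitespace and packs words into lines whose measured
     * width does not exceed maxWidth. A single word wider than maxWidth
     * is kept on a line of its own rather than being broken.
     */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for blank input
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && widthOf.applyAsDouble(candidate) > maxWidth) {
                lines.add(current.toString()); // current line is full, start a new one
                current = new StringBuilder(word);
            } else {
                current = new StringBuilder(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Each returned line could then be drawn with its own `drawString` call, moving the text position down by the line height (for example `moveTextPositionByAmount(0, -leading)`, where `leading` is a value you choose) between calls.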

java pdf-generation pdfbox

Ron*_*pel

2015-08-24

34 votes, 3 answers, 40k views

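Cannot add an image to a PDF using PDFBox

I am writing a Java application that uses the pdfbox library to create a PDF from scratch.
I need to place a JPG image in a page.

I am using this code: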

PDDocument document = new PDDocument();
PDPage page = new PDPage(PDPage.PAGE_SIZE_A4);
document.addPage(page); 
PDPageContentStream contentStream = new PDPageContentStream(document, page);

/* ... */ 
/* code to add some text to the page */
/* ... */

InputStream in = new FileInputStream(new File("c:/myimg.jpg"));
PDJpeg img = new PDJpeg(document, in);
contentStream.drawImage(img, 100, 700);
contentStream.close();
document.save("c:/mydoc.pdf");

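When I run the code it terminates successfully, but if I open the generated PDF file with Acrobat Reader, the page is completely white and the image is not placed in it.
The text, on the other hand, is placed correctly in the page.

Any hints on how to get my image into the PDF?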

java pdf pdfbox

Dav*_*ano

2011-12-22

28 votes, 2 answers, 30k views

Extract images from a PDF using pdfbox

I am trying to extract images from a PDF using pdfbox. The sample PDF is here

But I only get blank images.

The code I am trying:

public static void main(String[] args) {
   PDFImageExtract obj = new PDFImageExtract();
    try {
        obj.read_pdf();
    } catch (IOException ex) {
        System.out.println("" + ex);
    }

}

 void read_pdf() throws IOException {
    PDDocument document = null; 
    try {
        document = PDDocument.load("C:\\Users\\Pradyut\\Documents\\MCS-034.pdf");
    } catch (IOException ex) {
        System.out.println("" + ex);
    }
    List pages = document.getDocumentCatalog().getAllPages();
    Iterator iter = pages.iterator(); 
    int i =1;
    String name = null;

    while (iter.hasNext()) {
        PDPage page = (PDPage) iter.next();
        PDResources resources = page.getResources();
        Map …

java pdf image pdfbox

Pra*_*rya

2012-01-03

28 votes, 4 answers, 40k views