Reading a specific page from a PDF document using PDFBox

Question

How can I read a specific page (given its page number) from a PDF document using PDFBox?

Answer 1

This should work:

PDPage firstPage = (PDPage)doc.getAllPages().get( 0 );

As shown in the Bookmark section of the tutorial.

Update 2015, version 2.0.0 SNAPSHOT

It seems this was removed and then put back (?). `getPage` is in the 2.0.0 javadoc. To use it:

PDDocument document = PDDocument.load(new File(filename));
// Page indexes are zero-based, so 0 is the first page
PDPage page = document.getPage(0);

The `getAllPages` method has been renamed to `getPages`:

PDPage page = (PDPage)doc.getPages().get( 0 );

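For reference, the 2.0-style lines above can be put together into a complete program. This is only a sketch, assuming PDFBox 2.x is on the classpath (later major versions load documents differently). The class name `ReadPageExample` and the helper `pageSize` are hypothetical names chosen for illustration, and printing the page's media box is just one example of "reading" a page.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

// Hypothetical example class, assuming the PDFBox 2.x API.
public class ReadPageExample {

    // Returns the media box (page size in points) of the page at a zero-based index.
    public static PDRectangle pageSize(PDDocument document, int pageIndex) {
        PDPage page = document.getPage(pageIndex);
        return page.getMediaBox();
    }

    public static void main(String[] args) throws IOException {
        File input = new File(args[0]);            // path to an existing PDF
        int pageIndex = Integer.parseInt(args[1]); // zero-based page index

        // try-with-resources closes the document even if reading fails
        try (PDDocument document = PDDocument.load(input)) {
            PDRectangle box = pageSize(document, pageIndex);
            System.out.println(box.getWidth() + " x " + box.getHeight() + " pt");
        }
    }
}
```

With the PDFBox jar on the classpath, this could be run as `java ReadPageExample input.pdf 0` to print the size of the first page.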

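@missingfaktor `doc` is a [PDDocumentCatalog](http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/PDDocumentCatalog.html) object (4 upvotes)
What type is `doc` here? The `PDDocument` class doesn't seem to have a `getAllPages` method. (3 upvotes)
For PDFBox 1.8.10, the `PDDocument` type doesn't seem to have a `getAllPages()` method. Unfortunately, the link no longer works. (2 upvotes)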

Answer 2

Ray*_*ink 18

// Uses the PDFBox library, available from http://pdfbox.apache.org/
// Writes a specific page of a PDF document to a new PDF file

// Reads in the source PDF document
PDDocument pdDoc = PDDocument.load(file);

// Creates the new PDF document
PDDocument document = new PDDocument();

// Adds page "i" (a zero-based index into the page list), then saves the new document
try {
    document.addPage((PDPage) pdDoc.getDocumentCatalog().getAllPages().get(i));
    document.save("file path" + "new document title" + ".pdf");
} catch (Exception e) {
    // Report the failure instead of silently swallowing it
    e.printStackTrace();
} finally {
    // Close the source only after saving, because the new document shares its page
    document.close();
    pdDoc.close();
}

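In the snippet above, `i` is passed straight to `get(i)`, which takes a zero-based index, so page 1 as a reader counts it is index 0. A minimal sketch of that conversion with a bounds check follows. The class `PageNumbers` and the method `toIndex` are hypothetical names for illustration and are not part of PDFBox.

```java
// Hypothetical helper, not part of PDFBox.
public class PageNumbers {

    // Converts a 1-based page number into the 0-based index used by get(i).
    public static int toIndex(int pageNumber, int pageCount) {
        if (pageNumber < 1 || pageNumber > pageCount) {
            throw new IllegalArgumentException(
                "Page " + pageNumber + " is outside 1.." + pageCount);
        }
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        System.out.println(toIndex(1, 3)); // first page of a 3-page document
        System.out.println(toIndex(3, 3)); // last page of a 3-page document
    }
}
```

The page count would come from the document itself, so an out-of-range page number fails with a clear message before any page lookup is attempted.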
