Extract PDF text by coordinates

Question

I would like to know whether there is a PDF library for Microsoft .NET that can extract text from a region specified by coordinates.

For example (pseudocode):

PdfReader reader = new PdfReader();
reader.Load("file.pdf");

// Top, bottom, left, right in pixels or any other unit
string wholeText = reader.GetText(100, 150, 20, 50);

I tried PDFBox for .NET (the one that runs on top of IKVM) without luck; it seems to be very outdated and undocumented.

Maybe someone has a good sample using PDFBox, iTextSharp, or any other open-source library and can give me a hint.

Thanks in advance.

Answer 1

Mat*_*zer 7

OK, thanks for your efforts.

I got it working with Apache's PDFBox compiled via IKVM. Here is the final code:

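using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;   // PDFTextStripperByArea (PDFBox 1.x package layout)

PDDocument doc = PDDocument.load(@"c:\invoice.pdf");

// Register a named region; java.awt.Rectangle takes (x, y, width, height).
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion("testRegion", new java.awt.Rectangle(0, 10, 100, 100));

// Extract the text of every registered region from the first page.
stripper.extractRegions((PDPage)doc.getDocumentCatalog().getAllPages().get(0));

string text = stripper.getTextForRegion("testRegion");
doc.close();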

It works like a charm.
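
One thing to watch: java.awt.Rectangle takes x, y, width, height, while the pseudocode in the question passes top, bottom, left, right edges. Below is a minimal sketch of the conversion, assuming the y coordinate is measured from the top of the page and all values use the same unit. RegionHelper and FromEdges are hypothetical names chosen for illustration; they are not part of PDFBox or IKVM.

```csharp
using System;

public static class RegionHelper
{
    // Converts edge coordinates (top, bottom, left, right) into the
    // (x, y, width, height) form that java.awt.Rectangle expects.
    // Assumption: y grows downward from the top of the page.
    public static (int X, int Y, int Width, int Height) FromEdges(
        int top, int bottom, int left, int right)
    {
        if (bottom < top || right < left)
        {
            throw new ArgumentException(
                "bottom must be >= top and right must be >= left");
        }

        // x is the left edge, y is the top edge; sizes are edge differences.
        return (left, top, right - left, bottom - top);
    }
}
```

With the values from the question, RegionHelper.FromEdges(100, 150, 20, 50) gives x = 20, y = 100, width = 30, height = 50, which can then be passed to new java.awt.Rectangle(r.X, r.Y, r.Width, r.Height) in the addRegion call.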

Anyway, thank you. I hope my own answer helps someone else. If you need more details, leave a comment here and I will update this answer.
