I am using iText to extract some text from a specific location in a PDF file. To do this, I am using LocationTextExtractionStrategy:
public static void main(String[] args) throws Exception {
    PdfReader pdfReader = new PdfReader("location_text_extraction_test.pdf");
    Rectangle rectangle = new Rectangle(38, 0, 516, 516);
    RenderFilter[] filter = {new RegionTextRenderFilter(rectangle)};
    TextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
    String text = PdfTextExtractor.getTextFromPage(pdfReader, 1, strategy);
    System.out.println(text);
    pdfReader.close();
}
The problem is that the text is extracted in the wrong order.
It should be extracted as:
Part Description Quantity Unit Price Total For Line Extended Price Landing Fee 1.00 407.84 $ USD 407.84 407.84 $
But it is extracted as:
Total For Line Extended Price Part Description Quantity Unit Price 1.00 407.84 $ USD …
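One possible explanation, offered only as a hypothesis since the PDF itself is not available here: a location-based strategy orders text chunks by the position of their baseline first and by horizontal position second. If some header cells sit on a slightly higher baseline than the others, they are treated as a separate line and come out first. The sketch below is a simplified model of that ordering, not iText's actual code. The `ReadingOrderSketch` class, the `Chunk` type and all coordinates are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReadingOrderSketch {

    /** A text chunk with hypothetical baseline coordinates, in points. */
    static final class Chunk {
        final String text;
        final float x; // horizontal start of the chunk
        final int y;   // baseline height, rounded to a whole unit

        Chunk(String text, float x, int y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Joins chunks in "location" order: highest baseline first,
     * then left to right within the same baseline.
     */
    static String extract(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingInt((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));
        StringBuilder sb = new StringBuilder();
        for (Chunk c : sorted) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the last two header cells are drawn one unit higher.
        List<Chunk> header = Arrays.asList(
                new Chunk("Part Description", 40, 500),
                new Chunk("Quantity", 200, 500),
                new Chunk("Unit Price", 280, 500),
                new Chunk("Total For Line", 360, 501),
                new Chunk("Extended Price", 450, 501));
        // prints: Total For Line Extended Price Part Description Quantity Unit Price
        System.out.println(extract(header));
    }
}
```

If this is what happens in the real file, dumping the baseline coordinates of each chunk (for example from a custom render listener) would show whether the header cells really differ vertically.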
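I have some code that takes 3 different PDF byte arrays and merges them. This code works great. The issue some people are having is that each PDF is treated as a full page when printed, even if it holds only about 4 inches of content, which leaves 7 inches of vertical white space. The middle document is then added, and it may or may not have vertical white space at its end. Finally, the footer also ends up on a page of its own.
Here is the code: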
byte[] Bytes = rv.LocalReport.Render("PDF", null, out MimeType, out Encoding, out Extension, out StreamIDs, out Warnings);
List<byte[]> MergeSets = // This is filled prior to this code
// Append any other pages to this primary letter
if (MergeSets.Count > 0) {
    MemoryStream ms = new MemoryStream();
    Document document = new Document();
    PdfCopy copy = new PdfCopy(document, ms);
    document.Open();
    PdfImportedPage page;
    PdfReader reader = new PdfReader(Bytes); // read the generated primary Letter
    int pages = reader.NumberOfPages;
    for (int i = …