Ama*_*bra 2 .net c# itextsharp
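I am using iTextSharp to extract data from a PDF within a specific rectangle.
Restricting the extraction by height works fine, but restricting it by width does not: it returns the whole line instead of only the words inside the rectangle.
The code I am using is as follows: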
// Open the PDF and get the size of the current page in PDF units.
PdfReader reader = new PdfReader(Home.currentInstance.Get_PDF_URL());
iTextSharp.text.Rectangle pageRectangle = reader.GetPageSize(currentPage);

// Scale the on-screen selection from canvas coordinates to page coordinates.
// The canvas y axis points down and the PDF y axis points up, so y is flipped.
float selection_x = ((float)(selectionRectangle.RenderTransform.Value.OffsetX) / (float)canvas.Width) * pageRectangle.Width;
float selection_y = pageRectangle.Height - (((float)(selectionRectangle.RenderTransform.Value.OffsetY) / (float)canvas.Height) * pageRectangle.Height);
float selection_height = ((float)(selectionRectangle.Height) / (float)canvas.Height) * pageRectangle.Height;
float selection_width = ((float)(selectionRectangle.Width) / (float)canvas.Width) * pageRectangle.Width;
// Move y from the top edge of the selection to its bottom edge.
selection_y -= selection_height;

// Extract only the text that the region filter accepts for the selection.
RectangleJ rect = new RectangleJ(selection_x, selection_y, selection_width, selection_height);
RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
    new LocationTextExtractionStrategy(), filter);
String pageText = PdfTextExtractor.GetTextFromPage(reader, currentPage, strategy);
Any help would be greatly appreciated.
Thanks in advance.
In the end, I was able to solve this problem.
I created the following class:
using iTextSharp.text.pdf.parser;

// Wraps another strategy and feeds it text one character at a time,
// so that a region filter is applied per character instead of per chunk.
public class LimitedTextStrategy : ITextExtractionStrategy
{
    public readonly ITextExtractionStrategy textextractionstrategy;

    public LimitedTextStrategy(ITextExtractionStrategy strategy)
    {
        this.textextractionstrategy = strategy;
    }

    public void RenderText(TextRenderInfo renderInfo)
    {
        // Split the chunk into per-character render infos before forwarding.
        foreach (TextRenderInfo info in renderInfo.GetCharacterRenderInfos())
        {
            this.textextractionstrategy.RenderText(info);
        }
    }

    // Everything else is delegated unchanged to the wrapped strategy.
    public string GetResultantText()
    {
        return this.textextractionstrategy.GetResultantText();
    }

    public void BeginTextBlock()
    {
        this.textextractionstrategy.BeginTextBlock();
    }

    public void EndTextBlock()
    {
        this.textextractionstrategy.EndTextBlock();
    }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        this.textextractionstrategy.RenderImage(renderInfo);
    }
}
Then I changed the extraction line to:
String pageText = PdfTextExtractor.GetTextFromPage(reader, currentPage, new LimitedTextStrategy(strategy));
Now it works fine. I hope this helps someone else too.
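For anyone wondering why splitting into characters makes a difference: as far as I understand it, the region filter in iTextSharp 5 accepts or rejects a whole text chunk depending on whether the chunk's baseline intersects the rectangle. A chunk is often an entire line, so a narrow rectangle that touches the line lets the whole line through. The sketch below illustrates that idea only. It does not call iTextSharp; `Chunk`, `Overlaps`, `SplitIntoCharacters` and `Extract` are hypothetical names, the test is reduced to one dimension (x only), and every glyph is assumed to have the same width.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a text chunk: its text and the x-extent of its baseline.
public sealed class Chunk
{
    public string Text { get; }
    public float StartX { get; }
    public float EndX { get; }

    public Chunk(string text, float startX, float endX)
    {
        Text = text;
        StartX = startX;
        EndX = endX;
    }
}

public static class RegionFilterDemo
{
    // Simplified region test: accept a chunk if its baseline overlaps [left, right] at all.
    public static bool Overlaps(Chunk chunk, float left, float right)
    {
        return chunk.StartX <= right && chunk.EndX >= left;
    }

    // Split a chunk into one chunk per character (assumption: equal glyph widths).
    public static IEnumerable<Chunk> SplitIntoCharacters(Chunk chunk)
    {
        float glyphWidth = (chunk.EndX - chunk.StartX) / chunk.Text.Length;
        for (int i = 0; i < chunk.Text.Length; i++)
        {
            yield return new Chunk(
                chunk.Text[i].ToString(),
                chunk.StartX + i * glyphWidth,
                chunk.StartX + (i + 1) * glyphWidth);
        }
    }

    // Concatenate the text of every chunk that passes the region test.
    public static string Extract(IEnumerable<Chunk> chunks, float left, float right)
    {
        return string.Concat(chunks.Where(c => Overlaps(c, left, right)).Select(c => c.Text));
    }
}
```

Take a chunk "Hello World" whose baseline runs from x = 0 to x = 110 and a selection from x = 61 to x = 109. Passing the chunk as a whole to `Extract` returns the entire line, while passing `SplitIntoCharacters(chunk)` returns only "World". That is the same effect `LimitedTextStrategy` gets by forwarding the result of `GetCharacterRenderInfos()` to the filtered strategy.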