Xan*_*der 4 c# pdf extract itext carriage-return
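I need to run some analysis on data extracted from PDF documents.
Using iTextSharp, I call the PdfTextExtractor.GetTextFromPage method to extract the content of a PDF document, and it returns the text to me as one long line.
Is there a way to get the text line by line so that I can store the lines in an array? That way I could analyze the data line by line, which would be more flexible.
Here is the code I am using: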
string urlFileName1 = "pdf_link";
PdfReader reader = new PdfReader(urlFileName1);
string text = string.Empty;
for (int page = 1; page <= reader.NumberOfPages; page++)
{
    text += PdfTextExtractor.GetTextFromPage(reader, page);
}
reader.Close();
candidate3.Text = text.ToString();
public void ExtractTextFromPdf(string path)
{
    using (PdfReader reader = new PdfReader(path))
    {
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // Create a fresh strategy for each page: a reused LocationTextExtractionStrategy
            // keeps the text of earlier pages and would return it again.
            ITextExtractionStrategy strategy = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
            string page = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
            // The strategy separates lines with '\n', so splitting gives one entry per line.
            string[] lines = page.Split('\n');
            foreach (string line in lines)
            {
                MessageBox.Show(line);
            }
        }
    }
}
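If the goal is to keep the lines in an array rather than show them one at a time, the splitting step can be pulled out into a small helper. The following is only a sketch: the names PdfLineSplitter and SplitIntoLines are made up for illustration and are not part of iTextSharp, and it assumes the page text has already been extracted into a string. It also accepts "\r\n" and "\r" in case the text carries carriage returns, and it drops empty lines.

```csharp
using System;

public static class PdfLineSplitter
{
    // Hypothetical helper (not an iTextSharp API): splits already-extracted
    // page text into an array of non-empty lines.
    public static string[] SplitIntoLines(string pageText)
    {
        if (string.IsNullOrEmpty(pageText))
        {
            return new string[0];
        }

        // Accept Windows, Unix, and bare carriage-return line endings.
        string[] separators = { "\r\n", "\n", "\r" };
        return pageText.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Inside the page loop above, the result of GetTextFromPage could be passed to SplitIntoLines and the returned array appended to a List<string> to collect the lines of the whole document.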
小智 -2
Try:
string page = PdfTextExtractor.GetTextFromPage(reader, 2);
string[] s1 = page.Split('\n');