Reading a PDF line by line

Bry*_*yan 5 c# pdf itext

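How can I read a PDF file line by line using iText 5 for .NET? I searched the internet, but I only found ways to read a PDF file's content page by page.

Please see the code below.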

public string ReadPdfFile(object Filename)
{

    string strText = string.Empty;
    try
    {
        PdfReader reader = new PdfReader((string)Filename);

        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();

            String s = PdfTextExtractor.GetTextFromPage(reader, page, its);

            s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
            strText = strText + s;

        }
        reader.Close();
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    return strText;
}

Jon*_*han 5

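Try this: use LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy. It will add newline characters to the returned text. You can then call strText.Split('\n') to split the text into a string[] and work with it line by line.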
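A minimal sketch of that approach, assuming iTextSharp 5.x is referenced. The class name PdfLineReader and the helpers SplitIntoLines and ReadLines are hypothetical names introduced here for illustration; only PdfReader, PdfTextExtractor and LocationTextExtractionStrategy come from the library. The splitter also accepts "\r\n" and drops empty entries, which goes slightly beyond a plain Split('\n').

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp.
public static class PdfLineReader
{
    // Splits extracted page text into lines, accepting "\r\n" or "\n"
    // as separators and skipping empty entries.
    public static string[] SplitIntoLines(string pageText)
    {
        return pageText.Split(new[] { "\r\n", "\n" },
                              StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns every non-empty text line in the PDF, page by page.
    public static List<string> ReadLines(string path)
    {
        var lines = new List<string>();
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh strategy per page: the strategy object accumulates
                // the text chunks of whatever page it is used to render.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
                lines.AddRange(SplitIntoLines(pageText));
            }
        }
        finally
        {
            // Release the file handle even if extraction throws.
            reader.Close();
        }
        return lines;
    }
}
```

It could be called as foreach (string line in PdfLineReader.ReadLines("input.pdf")) { ... }, where "input.pdf" is a placeholder path. Note that the "lines" here are rows of text grouped by their position on the page, so in multi-column layouts they may not match the logical reading order.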