Reading a PDF file with iText 5 for .NET

Question

I am using C# with iTextSharp to read the contents of a PDF. I read the content with the code below, but it seems to read the text page by page.

        // Requires: using System; using System.Text; using System.Windows.Forms;
        //           using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
        public string ReadPdfFile(object Filename)
        {
            StringBuilder text = new StringBuilder();
            try
            {
                PdfReader reader = new PdfReader((string)Filename);
                try
                {
                    for (int page = 1; page <= reader.NumberOfPages; page++)
                    {
                        // Use a fresh extraction strategy for each page.
                        ITextExtractionStrategy its = new SimpleTextExtractionStrategy();

                        // GetTextFromPage already returns a .NET string, so no
                        // re-encoding is needed (round-tripping through
                        // Encoding.Default can corrupt non-ANSI characters).
                        text.Append(PdfTextExtractor.GetTextFromPage(reader, page, its));
                    }
                }
                finally
                {
                    // Release the file even if extraction throws.
                    reader.Close();
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
            return text.ToString();
        }

Can anyone help me write code that reads the PDF content line by line?

Answer 1

Jon*_*han 14

Try this: use LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy. It adds newline characters to the returned text. You can then call strText.Split('\n') to split the text into a string[] and consume it line by line.
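As a rough illustration of that suggestion, here is a minimal sketch, assuming the iTextSharp 5 package is referenced. The class name PdfLineReader and the helper ReadPdfLines are made up for this example and are not part of iTextSharp. The sketch splits each page separately so that the last line of one page is not joined to the first line of the next.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfLineReader
{
    // Hypothetical helper (not an iTextSharp API): returns the text of
    // every page in the PDF, split into individual lines.
    public static List<string> ReadPdfLines(string fileName)
    {
        List<string> lines = new List<string>();
        PdfReader reader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy for each page.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

                // Split per page so lines from adjacent pages stay separate.
                lines.AddRange(pageText.Split('\n'));
            }
        }
        finally
        {
            // Release the file even if extraction throws.
            reader.Close();
        }
        return lines;
    }

    public static void Main(string[] args)
    {
        // args[0] is assumed to be the path to a PDF file.
        foreach (string line in ReadPdfLines(args[0]))
        {
            Console.WriteLine(line);
        }
    }
}
```

How well the extracted lines match the visual lines depends on how the PDF positions its text, so the result may need some cleanup for multi-column or heavily formatted documents.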
