Tag: itextsharp

String HTML = Session["xpdf"].ToString();
string filename = "\\xpdf\\xpdf____" + Request.QueryString["id"] + ".pdf";
string filepath = HttpContext.Current.Server.MapPath("\\xpdf\\xpdf____" + Request.QueryString["id"] + ".pdf");
Document document = new Document(PageSize.A4);
PdfWriter.GetInstance(document, new FileStream(filepath, FileMode.Create));
document.Open();
HTMLWorker hw = new HTMLWorker(document);
hw.Parse(new StringReader(HTML));
document.Close();
ShowPdf(filename, filepath);
PdfAction action = new PdfAction(PdfAction.PRINTDIALOG);


And consider that my HTML code looks like this:

<div>
   <table style="border:solid 1px #ccc; color:#000;">
      <tr>
          <td style="width:100px;color:#cc0000;"></td>
          <td style="width:10px">:</td>
          <td style="width:200px"></td>
      </tr>
   </table>
</div>


c# asp.net itextsharp

Kad*_*dir

4 votes · 1 answer · 10k views

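Getting document properties from a PDF in iTextSharp

I'm trying to get some information out of a PDF file. I tried PdfSharp, which has properties for the information I need, but it cannot open iref streams, so I had to abandon it.

Instead, I'm trying iTextSharp. So far I've managed to get some basic information, such as the title, author and subject, from the Info array.

However, I'm now after more information and cannot find where (or whether) iTextSharp exposes it. The information I'm after is highlighted in the image below:

(Image: the information I need)

I can't work out where this information is stored. Any and all help would be appreciated.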
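The image is not available here, so it is unclear which fields were highlighted. As a rough sketch only, assuming iTextSharp 5.x and assuming the missing values are the kind a viewer shows in its document-properties dialog, the following prints the Info dictionary alongside a few values that `PdfReader` exposes directly (PDF version, page count, page size) and the raw XMP packet. The class and method names `PdfProperties.Dump` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfProperties
{
    // Hypothetical helper: prints whatever document-level properties the reader exposes.
    public static void Dump(string fileName)
    {
        PdfReader reader = new PdfReader(fileName);
        try
        {
            // Entries of the Info dictionary (Title, Author, Producer, CreationDate, ...).
            foreach (KeyValuePair<string, string> entry in reader.Info)
            {
                Console.WriteLine("{0} = {1}", entry.Key, entry.Value);
            }

            // PdfVersion is the minor version digit as a char, e.g. '4' for PDF 1.4.
            Console.WriteLine("PDF version: 1.{0}", reader.PdfVersion);
            Console.WriteLine("Pages: {0}", reader.NumberOfPages);

            // Page size in points (1/72 inch), taken from the first page.
            Rectangle size = reader.GetPageSize(1);
            Console.WriteLine("Page 1 size: {0} x {1} pt", size.Width, size.Height);

            // The XMP metadata stream, if the file has one, is returned as raw bytes.
            byte[] xmp = reader.Metadata;
            if (xmp != null)
            {
                Console.WriteLine(Encoding.UTF8.GetString(xmp));
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

If the value being sought is not in any of these, it may be something the viewer computes itself rather than something stored in the file.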

c# pdf itextsharp

Tom*_*ech

4 votes · 1 answer · 4507 views

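iTextSharp: reading from a specific location

I'm having a problem using iTextSharp to read data from a PDF file. What I want is to read only a specific part of a PDF page (I only want to retrieve address information that sits at a constant position). I've seen iTextSharp used to read all pages like this: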

        StringBuilder text = new StringBuilder();

        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);

            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
                text.Append(currentText);
            }
            pdfReader.Close();
        }
        return text.ToString();


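But how can I restrict it to a specific location? I'm open to using anything, even OCR, because some files may be images in the future (that isn't needed yet). This project is just for me, so there is no commercial use.

Thanks!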
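One possible way to limit extraction to a fixed area, sketched here under the assumption of iTextSharp 5.x, is to wrap a text extraction strategy in a `FilteredTextRenderListener` with a `RegionTextRenderFilter`. The helper name `ReadRegion` and the rectangle coordinates in the usage note are hypothetical; the real coordinates would have to be measured on the actual page, in PDF points with the origin at the bottom-left corner.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class RegionReader
{
    // Hypothetical helper: returns only the text whose position falls inside `region`.
    public static string ReadRegion(string fileName, int page, Rectangle region)
    {
        PdfReader reader = new PdfReader(fileName);
        try
        {
            // Text chunks outside the rectangle are dropped before they reach the strategy.
            RenderFilter[] filters = { new RegionTextRenderFilter(region) };
            ITextExtractionStrategy strategy =
                new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filters);

            return PdfTextExtractor.GetTextFromPage(reader, page, strategy);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A call might look like `RegionReader.ReadRegion(fileName, 1, new Rectangle(50, 650, 300, 750))`, where the four numbers are lower-left x, lower-left y, upper-right x and upper-right y. This only works for real text; scanned pages would still need OCR.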

c# ocr itextsharp

Rob*_* J.

4 votes · 1 answer · 8278 views

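Extracting text from a PDF between two separators using iTextSharp

I have a PDF of more than 1,500 pages with some "random" text, from which I have to extract some text... I can recognize the blocks like this: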

bla bla bla bla bla 
...
...
...
-------------------------- (separator blue image)
XXX: TEXT TEXT TEXT
TEXT TEXT TEXT TEXT
...
-------------------------- (separator blue image)
bla bla bla bla
...
...
-------------------------- (separator blue image)
XXX: TEXT2 TEXT2 TEXT2
TEXT2 TEXT2 TEXT TEXT2
...
-------------------------- (separator blue image)


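I need to extract all of the text between the separators (every block). 'XXX' appears at the start of every block, but I have no way to detect where a block ends. Is it possible to use the image separators in the parser? How?

Is there any other possible approach?

Edit, more information: there is no background, and the text is selectable and can be copied.

Sample PDF: 1

See page 320 of the sample.

Thanks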
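As a sketch of how the separators might be located, assuming iTextSharp 5.x and assuming the blue separators really are image XObjects (if they are drawn as vector lines, this listener will not see them), a custom `IRenderListener` can record the vertical position of every image on a page. The names `ImagePositionListener` and `SeparatorFinder` are hypothetical.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical listener: ignores text and records the y coordinate of each image.
public class ImagePositionListener : IRenderListener
{
    public readonly List<float> ImageYs = new List<float>();

    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        // The image's transformation matrix holds its translation;
        // index I32 is the y offset of the image's lower-left corner.
        Matrix ctm = renderInfo.GetImageCTM();
        ImageYs.Add(ctm[Matrix.I32]);
    }
}

public static class SeparatorFinder
{
    // Returns image y positions for one page, ordered from top of page to bottom.
    public static List<float> FindImageYs(PdfReader reader, int page)
    {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        ImagePositionListener listener = new ImagePositionListener();
        parser.ProcessContent(page, listener);

        listener.ImageYs.Sort();
        listener.ImageYs.Reverse();
        return listener.ImageYs;
    }
}
```

Each consecutive pair of y values then bounds a candidate block, and the text inside that band could be pulled out with a `RegionTextRenderFilter` spanning the page width. Blocks that continue across a page break would need extra handling.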

c# pdf itextsharp

Pau*_*aul

2015-08-01

4 votes · 1 answer · 1523 views

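Table nested in a PdfPCell in iTextSharp

I want to give a table rounded borders, but after some research I found that this can't be done; a cell, however, can be given rounded borders.

So I did something like this: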

PdfPCell cell = new PdfPCell()
{
     CellEvent = rr, // rr is RoundRectangle object
     Border = PdfPCell.NO_BORDER,
     Padding = 4,
     Phrase = new Phrase("test")
};
table.AddCell(cell);
document.Add(table);


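Now I can give a cell a border, so what I'd like to do is put a complete nested table inside this PdfPCell, so that I indirectly get the border on that table...

Can you help? If you don't understand my approach or the question, I'll explain it more clearly in the comments...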
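A minimal sketch of the nesting idea, assuming iTextSharp 5.x: `PdfPCell` has a constructor that takes a `PdfPTable`, so the inner table can be wrapped in a single cell of a one-column outer table, and the cell event is attached to that wrapper cell. The `RoundRectangle` class below is a hypothetical stand-in for the question's `rr` object, and the corner radius, inset and table contents are illustrative only.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical stand-in for the question's RoundRectangle cell event.
public class RoundRectangle : IPdfPCellEvent
{
    public void CellLayout(PdfPCell cell, Rectangle rect, PdfContentByte[] canvas)
    {
        // Draw a rounded rectangle slightly inside the cell's bounding box.
        PdfContentByte cb = canvas[PdfPTable.LINECANVAS];
        cb.RoundRectangle(rect.Left + 1.5f, rect.Bottom + 1.5f,
                          rect.Width - 3f, rect.Height - 3f, 4f);
        cb.Stroke();
    }
}

public static class NestedTableDemo
{
    public static void Write(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create))
        {
            Document document = new Document(PageSize.A4);
            PdfWriter.GetInstance(document, fs);
            document.Open();

            // The table that should appear to have a rounded border.
            PdfPTable inner = new PdfPTable(2);
            inner.DefaultCell.Border = Rectangle.NO_BORDER;
            inner.AddCell("Name");
            inner.AddCell("Value");

            // A one-cell outer table whose only cell holds the inner table.
            PdfPTable outer = new PdfPTable(1);
            PdfPCell wrapper = new PdfPCell(inner)
            {
                CellEvent = new RoundRectangle(),
                Border = Rectangle.NO_BORDER,
                Padding = 4
            };
            outer.AddCell(wrapper);

            document.Add(outer);
            document.Close();
        }
    }
}
```

The padding keeps the inner table's content clear of the rounded corners; without it the nested table fills the wrapper cell edge to edge.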

c# itextsharp

ank*_*rma

4 votes · 1 answer · 4431 views