Rob*_*cks 11 c# pdf command-line bookmarks
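I am using a PDF converter to access the graphical data in a PDF. Everything works fine, but I don't get the list of bookmarks. Is there a command-line application or C# component that can read PDF bookmarks? I found the iText and SharpPDF libraries and am currently looking through them. Have you done something like this?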
小智 12
Try the following code:
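// Requires: using System.Collections.Generic; using System.Windows.Forms; using iTextSharp.text.pdf;
PdfReader pdfReader = new PdfReader(filename);
// GetBookmark returns null when the document has no bookmarks.
IList<Dictionary<string, object>> bookmarks = SimpleBookmark.GetBookmark(pdfReader);
if (bookmarks != null)
{
    foreach (Dictionary<string, object> bookmark in bookmarks)
    {
        // Each entry stores its label under the "Title" key.
        MessageBox.Show(bookmark["Title"].ToString());
        if (bookmark.Count > 3)
        {
            // Show how many properties this entry carries.
            MessageBox.Show(bookmark.Count.ToString());
        }
    }
}
pdfReader.Close();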
Note: don't forget to add the iTextSharp DLL to your project.
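Because bookmarks are stored in a tree structure (https://en.wikipedia.org/wiki/Tree_(data_struct)), I used some recursion here to collect every bookmark together with its children.
iTextSharp solved this for me.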
dotnet add package iTextSharp
Use the following code to collect all bookmarks:
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;

namespace PdfManipulation
{
    class Program
    {
        static void Main(string[] args)
        {
            StringBuilder bookmarks = ExtractAllBookmarks("myPdfFile.pdf");
            // Print the collected titles, one per line.
            Console.Write(bookmarks);
        }

        private static StringBuilder ExtractAllBookmarks(string pdf)
        {
            StringBuilder sb = new StringBuilder();
            PdfReader reader = new PdfReader(pdf);
            try
            {
                // GetBookmark returns null when the PDF has no bookmarks.
                IList<Dictionary<string, object>> bookmarksTree = SimpleBookmark.GetBookmark(reader);
                if (bookmarksTree != null)
                {
                    foreach (var node in bookmarksTree)
                    {
                        sb.AppendLine(PercorreBookmarks(node).ToString());
                    }
                }
            }
            finally
            {
                reader.Close();
            }
            return RemoveAllBlankLines(sb);
        }

        // The nested AppendLine calls leave empty lines behind; strip them.
        private static StringBuilder RemoveAllBlankLines(StringBuilder sb)
        {
            return new StringBuilder().Append(Regex.Replace(sb.ToString(), @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline));
        }

        // Recursively collects the title of a bookmark and of all its descendants.
        private static StringBuilder PercorreBookmarks(Dictionary<string, object> bookmark)
        {
            StringBuilder sb = new StringBuilder();
            // Check for null before reading any key.
            if (bookmark == null)
            {
                return sb;
            }
            sb.AppendLine(bookmark["Title"].ToString());
            if (bookmark.ContainsKey("Kids"))
            {
                // Child bookmarks are stored under the "Kids" key.
                IList<Dictionary<string, object>> children = (IList<Dictionary<string, object>>) bookmark["Kids"];
                foreach (var bm in children)
                {
                    sb.AppendLine(PercorreBookmarks(bm).ToString());
                }
            }
            return sb;
        }
    }
}