在将Html转换为Pdf时显示Unicode字符

NIl*_*nke 8 c# unicode itext

我正在使用itextsharp dll将HTML转换为PDF.

HTML有一些Unicode字符,如α,β...当我尝试将HTML转换为PDF时,Unicode字符不会显示在PDF中.

我的功能:

Document doc = new Document(PageSize.LETTER);

using (FileStream fs = new FileStream(Path.Combine("Test.pdf"), FileMode.Create, FileAccess.Write, FileShare.Read))
{
    PdfWriter.GetInstance(doc, fs);

    doc.Open();
    doc.NewPage();

    string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts),
                                      "ARIALUNI.TTF");

    BaseFont bf = BaseFont.CreateFont(arialuniTff, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

    Font fontNormal = new Font(bf, 12, Font.NORMAL);

    List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()),
                                                 new StyleSheet());
    Paragraph p = new Paragraph {Font = fontNormal};

    foreach (var element in list)
    {
        p.Add(element);
        doc.Add(p);
    }

    doc.Close();
}
Run Code Online (Sandbox Code Playgroud)

Gre*_*vec 16

您也可以使用新的XMLWorkerHelper(来自库itextsharp.xmlworker),但是您需要覆盖默认的FontFactory实现.

void GeneratePdfFromHtml()
{
  const string outputFilename = @".\Files\report.pdf";
  const string inputFilename = @".\Files\report.html";

  using (var input = new FileStream(inputFilename, FileMode.Open))
  using (var output = new FileStream(outputFilename, FileMode.Create))
  {
    CreatePdf(input, output);
  }
}

void CreatePdf(Stream htmlInput, Stream pdfOutput)
{
  using (var document = new Document(PageSize.A4, 30, 30, 30, 30))
  {
    var writer = PdfWriter.GetInstance(document, pdfOutput);
    var worker = XMLWorkerHelper.GetInstance();

    document.Open();
    worker.ParseXHtml(writer, document, htmlInput, null, Encoding.UTF8, new UnicodeFontFactory());

    document.Close();
  }    
}

public class UnicodeFontFactory : FontFactoryImp
{
    private static readonly string FontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts),
      "arialuni.ttf");

    private readonly BaseFont _baseFont;

    public UnicodeFontFactory()
    {
      _baseFont = BaseFont.CreateFont(FontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

    }

    public override Font GetFont(string fontname, string encoding, bool embedded, float size, int style, BaseColor color,
      bool cached)
    {
      return new Font(_baseFont, size, style, color);
    }
}
Run Code Online (Sandbox Code Playgroud)


Chr*_*aas 12

处理Unicode字符和iTextSharp时,您需要注意几件事.你已经做过的第一个,那就是得到一个支持你角色的字体.第二件事是你想用iTextSharp实际注册字体,以便它知道它.

//Path to our font
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
//Register the font with iTextSharp
iTextSharp.text.FontFactory.Register(arialuniTff);
Run Code Online (Sandbox Code Playgroud)

现在我们有了一个字体,我们需要创建一个StyleSheet对象来告诉iTextSharp何时以及如何使用它.

//Create a new stylesheet
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
//Set the default body font to our registered font's internal name
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS");
Run Code Online (Sandbox Code Playgroud)

您还需要做的一个非HTML部分是设置一个特殊encoding参数.此编码特定于iTextSharp,在您的情况下,您希望它Identity-H.如果未设置此值,则默认为Cp1252(WINANSI).

//Set the default encoding to support Unicode characters
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H);
Run Code Online (Sandbox Code Playgroud)

最后,我们需要将样式表传递给ParseToList方法:

//Parse our HTML using the stylesheet created above
List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), ST);
Run Code Online (Sandbox Code Playgroud)

把这一切放在一起,从开放到结束,你有:

doc.Open();

//Sample HTML
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.Append(@"<p>This is a test: <strong>?,?</strong></p>");

//Path to our font
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
//Register the font with iTextSharp
iTextSharp.text.FontFactory.Register(arialuniTff);

//Create a new stylesheet
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
//Set the default body font to our registered font's internal name
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS");
//Set the default encoding to support Unicode characters
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H);

//Parse our HTML using the stylesheet created above
List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), ST);

//Loop through each element, don't bother wrapping in P tags
foreach (var element in list) {
    doc.Add(element);
}

doc.Close();
Run Code Online (Sandbox Code Playgroud)

编辑

在评论中,您将显示指定覆盖字体的HTML.iTextSharp不会使系统占用字体,其HTML解析器不使用字体回退技术.必须手动注册HTML/CSS中指定的任何字体.

string lucidaTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "l_10646.ttf");
iTextSharp.text.FontFactory.Register(lucidaTff);
Run Code Online (Sandbox Code Playgroud)