How can I extract FlateDecoded images (e.g. PNG) from a PDF document with PDFSharp?
I found this comment in a PDFSharp sample:
// TODO: You can put the code here that converts from PDF internal image format to a
// Windows bitmap
// and use GDI+ to save it in PNG format.
// [...]
// Take a look at the file
// PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.
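Does anyone have a solution for this problem?
Thanks for your replies.
EDIT: Because I can't answer my own question within 8 hours, I'm adding this here:
Thanks for the quick reply.
I added some code to the method "ExportAsPngImage", but I didn't get the result I wanted. It just extracts a few more images (PNG), and they have the wrong colors and are distorted.
Here is my current code: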
    PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
    byte[] decodedBytes = flate.Decode(bytes);
    System.Drawing.Imaging.PixelFormat pixelFormat;
    switch (bitsPerComponent)
    {
        case 1:
            pixelFormat = PixelFormat.Format1bppIndexed;
            break;
        case 8:
            pixelFormat = PixelFormat.Format8bppIndexed;
            break;
        case 24:
            pixelFormat = PixelFormat.Format24bppRgb;
            break;
        default:
            throw new Exception("Unknown pixel format " + bitsPerComponent);
    }
    Bitmap bmp = new Bitmap(width, height, pixelFormat);
    var bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);
    int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
    for (int i = 0; i < height; i++)
    {
        int offset = i * length;
        int scanOffset = i * bmpData.Stride;
        Marshal.Copy(decodedBytes, offset, new IntPtr(bmpData.Scan0.ToInt32() + scanOffset), length);
    }
    bmp.UnlockBits(bmpData);
    using (FileStream fs = new FileStream(@"C:\Export\PdfSharp\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
    {
        bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
    }
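Is this the right approach, or should I go about it another way? Thanks a lot!
I know this answer might come a few years late, but maybe it will help someone else.
In my case the distortion occurred because image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent) did not seem to return the correct value. As Vive la déraison pointed out under your question, you end up with BGR byte order when you use Marshal.Copy. So reversing the bytes and rotating the bitmap after the Marshal.Copy does the job.
The resulting code looks like this: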
    using System;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Linq;                      // for Reverse().ToArray()
    using System.Runtime.InteropServices;   // for Marshal.Copy
    using PdfSharp.Pdf;
    using PdfSharp.Pdf.Advanced;

    private static void ExportAsPngImage(PdfDictionary image, ref int count)
    {
        int width = image.Elements.GetInteger(PdfImage.Keys.Width);
        int height = image.Elements.GetInteger(PdfImage.Keys.Height);

        var canUnfilter = image.Stream.TryUnfilter();
        byte[] decodedBytes;

        if (canUnfilter)
        {
            decodedBytes = image.Stream.Value;
        }
        else
        {
            PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
            decodedBytes = flate.Decode(image.Stream.Value);
        }

        // Guess the bit depth from the size of the decoded data.
        // Despite its name, this ends up holding bits per pixel.
        int bitsPerComponent = 0;
        while (decodedBytes.Length - ((width * height) * bitsPerComponent / 8) != 0)
        {
            bitsPerComponent++;
        }

        System.Drawing.Imaging.PixelFormat pixelFormat;
        switch (bitsPerComponent)
        {
            case 1:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
                break;
            case 16:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format16bppArgb1555;
                break;
            case 24:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
                break;
            case 32:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppArgb;
                break;
            case 64:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format64bppArgb;
                break;
            default:
                throw new Exception("Unknown pixel format " + bitsPerComponent);
        }

        // Reverse all bytes; together with the 180 degree rotation below
        // this changes the channel order within each pixel.
        decodedBytes = decodedBytes.Reverse().ToArray();

        Bitmap bmp = new Bitmap(width, height, pixelFormat);
        BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
        int length = (int)Math.Ceiling(width * (bitsPerComponent / 8.0));
        for (int i = 0; i < height; i++)
        {
            int offset = i * length;
            int scanOffset = i * bmpData.Stride;
            // IntPtr.Add avoids the overflow that Scan0.ToInt32() can cause in a 64-bit process.
            Marshal.Copy(decodedBytes, offset, IntPtr.Add(bmpData.Scan0, scanOffset), length);
        }
        bmp.UnlockBits(bmpData);
        bmp.RotateFlip(RotateFlipType.Rotate180FlipNone);
        bmp.Save(String.Format("exported_Images\\Image{0}.png", count++), System.Drawing.Imaging.ImageFormat.Png);
    }

The code may need some optimization, but in my case it did export the FlateDecoded images correctly.
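For tightly packed 24 bpp data, reversing the whole byte array and then rotating the bitmap by 180 degrees should, as far as I can tell, come down to swapping the first and third byte of every pixel. A minimal sketch of doing that swap directly is shown here. `PixelOrder` and `SwapRedAndBlue` are made-up names for illustration, not part of PDFsharp, and the helper assumes exactly 3 bytes per pixel with no row padding.

```csharp
using System;

public static class PixelOrder
{
    // Hypothetical helper: swaps the first and third byte of every
    // 3-byte pixel, turning R,G,B into B,G,R (or back again).
    // Assumes tightly packed 24 bpp data without row padding.
    public static byte[] SwapRedAndBlue(byte[] rgb)
    {
        if (rgb == null)
            throw new ArgumentNullException(nameof(rgb));
        if (rgb.Length % 3 != 0)
            throw new ArgumentException("Expected 3 bytes per pixel.", nameof(rgb));

        // Work on a copy so the caller's buffer is left untouched.
        byte[] bgr = (byte[])rgb.Clone();
        for (int i = 0; i < bgr.Length; i += 3)
        {
            byte first = bgr[i];
            bgr[i] = bgr[i + 2];
            bgr[i + 2] = first;
        }
        return bgr;
    }
}
```

With a helper like this, the swapped buffer could be copied row by row with Marshal.Copy as above, without the Reverse() and RotateFlip calls. It does not apply to 1 bpp or 8 bpp indexed images, which have no per-pixel channel order.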
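To get a Windows BMP, you just have to create a bitmap header and then copy the image data into the bitmap. PDF images are byte aligned (every new row starts at a byte boundary), while Windows BMPs are DWORD aligned (every new row starts at a DWORD boundary; for historical reasons a DWORD is 4 bytes). All the information you need for the bitmap header can be found in the filter parameters or can be calculated.
The color palette is another FlateEncoded object in the PDF. You copy that into the BMP as well.
This has to be done for several formats (1 bit per pixel, 8 bpp, 24 bpp, 32 bpp).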
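The alignment difference described above can be sketched as follows. This is only an illustration under stated assumptions: `BmpRows` and its methods are hypothetical names, not PDFsharp or GDI+ APIs, and the sketch covers only the row padding, not the bitmap header or the palette. It also assumes that bits per pixel is the PDF /BitsPerComponent value multiplied by the number of color components (for example 3 for DeviceRGB).

```csharp
using System;

public static class BmpRows
{
    // Bits per pixel = bits per component * number of color components
    // (1 for DeviceGray or an indexed palette, 3 for DeviceRGB).
    public static int BitsPerPixel(int bitsPerComponent, int components)
        => bitsPerComponent * components;

    // PDF rows are byte aligned: round the row's bit count up to whole bytes.
    public static int PdfRowBytes(int width, int bitsPerPixel)
        => (width * bitsPerPixel + 7) / 8;

    // BMP rows are DWORD aligned: round the row's bit count up to a
    // multiple of 32 bits, expressed in bytes.
    public static int BmpStride(int width, int bitsPerPixel)
        => (width * bitsPerPixel + 31) / 32 * 4;

    // Copies byte-aligned rows into a new buffer with DWORD-aligned rows.
    // The padding bytes at the end of each row stay zero.
    public static byte[] PadRows(byte[] pdfData, int width, int height, int bitsPerPixel)
    {
        int srcRow = PdfRowBytes(width, bitsPerPixel);
        int dstRow = BmpStride(width, bitsPerPixel);
        if (pdfData.Length < srcRow * height)
            throw new ArgumentException("Not enough image data.", nameof(pdfData));

        byte[] padded = new byte[dstRow * height];
        for (int y = 0; y < height; y++)
            Buffer.BlockCopy(pdfData, y * srcRow, padded, y * dstRow, srcRow);
        return padded;
    }
}
```

For example, a 3-pixel-wide 24 bpp image has 9 bytes per row in the PDF stream but needs a 12-byte stride in a BMP. A real BMP file additionally stores its rows bottom-up (unless the header height is negative) and expects B,G,R channel order for 24 bpp data; this sketch handles neither.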