How to extract FlateDecoded images from a PDF with PDFSharp

der*_*urg 6 c# pdfsharp

How can I extract images that are FlateDecoded (for example, PNG) from a PDF document using PDFSharp?

I found this comment in a PDFSharp sample:

// TODO: You can put the code here that converts vom PDF internal image format to a
// Windows bitmap
// and use GDI+ to save it in PNG format.
// [...]
// Take a look at the file
// PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.

Does anyone have a solution to this problem?

Thanks for your replies.

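Edit: Because I cannot answer my own question within 8 hours, I am adding this here:

Thanks for the quick reply.

I added some code to the method "ExportAsPngImage", but I did not get the desired result. It just extracts a few more images (PNG), and they have the wrong colors and are distorted.

Here is my actual code: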

PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
        byte[] decodedBytes = flate.Decode(bytes);

        System.Drawing.Imaging.PixelFormat pixelFormat;

        switch (bitsPerComponent)
        {
            case 1:
                pixelFormat = PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = PixelFormat.Format8bppIndexed;
                break;
            case 24:
                pixelFormat = PixelFormat.Format24bppRgb;
                break;
            default:
                throw new Exception("Unknown pixel format " + bitsPerComponent);
        }

        Bitmap bmp = new Bitmap(width, height, pixelFormat);
        var bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);
        int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
        for (int i = 0; i < height; i++)
        {
            int offset = i * length;
            int scanOffset = i * bmpData.Stride;
            Marshal.Copy(decodedBytes, offset, new IntPtr(bmpData.Scan0.ToInt32() + scanOffset), length);
        }
        bmp.UnlockBits(bmpData);
        using (FileStream fs = new FileStream(@"C:\Export\PdfSharp\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
        {
            bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
        }

Is this the right approach, or should I go a different way? Thanks a lot!

New*_*rus 6

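I know this answer may be a few years late, but maybe it will help someone else.

In my case the distortion happened because image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent) does not seem to return the correct value. As Vive la déraison pointed out under your question, using Marshal.Copy leaves the color bytes in the wrong order (GDI+ stores 24 bpp pixels as BGR, not RGB). So reversing the bytes and rotating the bitmap after the Marshal.Copy does the job.

The resulting code looks like this: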

// Requires: using System; using System.Drawing; using System.Drawing.Imaging; using System.Linq;
//           using System.Runtime.InteropServices; using PdfSharp.Pdf; using PdfSharp.Pdf.Advanced;
private static void ExportAsPngImage(PdfDictionary image, ref int count)
{
    int width = image.Elements.GetInteger(PdfImage.Keys.Width);
    int height = image.Elements.GetInteger(PdfImage.Keys.Height);

    // Let PDFSharp remove the stream filter if it can; otherwise decode manually.
    var canUnfilter = image.Stream.TryUnfilter();
    byte[] decodedBytes;

    if (canUnfilter)
    {
        decodedBytes = image.Stream.Value;
    }
    else
    {
        PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
        decodedBytes = flate.Decode(image.Stream.Value);
    }

    // Infer the bits per pixel from the size of the decoded data
    // (assumes the rows carry no padding).
    int bitsPerComponent = 0;
    while (decodedBytes.Length - ((width * height) * bitsPerComponent / 8) != 0)
    {
        bitsPerComponent++;
    }

    System.Drawing.Imaging.PixelFormat pixelFormat;
    switch (bitsPerComponent)
    {
        case 1:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
            break;
        case 8:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
            break;
        case 16:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format16bppArgb1555;
            break;
        case 24:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
            break;
        case 32:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppArgb;
            break;
        case 64:
            pixelFormat = System.Drawing.Imaging.PixelFormat.Format64bppArgb;
            break;
        default:
            throw new Exception("Unknown pixel format " + bitsPerComponent);
    }

    // Reversing the whole buffer swaps the channel order, but it also
    // reverses the pixel order; the 180 degree rotation below undoes that.
    decodedBytes = decodedBytes.Reverse().ToArray();

    Bitmap bmp = new Bitmap(width, height, pixelFormat);
    BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
    int length = (int)Math.Ceiling(width * (bitsPerComponent / 8.0));
    for (int i = 0; i < height; i++)
    {
        int offset = i * length;
        int scanOffset = i * bmpData.Stride;
        // IntPtr.Add avoids the overflow that Scan0.ToInt32() can cause in a 64-bit process.
        Marshal.Copy(decodedBytes, offset, IntPtr.Add(bmpData.Scan0, scanOffset), length);
    }
    bmp.UnlockBits(bmpData);
    bmp.RotateFlip(RotateFlipType.Rotate180FlipNone);
    bmp.Save(String.Format("exported_Images\\Image{0}.png", count++), System.Drawing.Imaging.ImageFormat.Png);
}

The code may need some optimization, but in my case it did export the FlateDecoded images correctly.
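For the 24 bpp case, the Reverse() plus RotateFlip combination amounts to swapping the red and blue byte of every pixel, so a direct swap may be a cheaper alternative. This is only a sketch: the class and method names are made up for illustration, and it assumes 8-bit RGB data with three bytes per pixel and no row padding.

```csharp
using System;

public static class PixelOrder
{
    // Hypothetical helper, not part of PDFSharp: swaps the first and third
    // byte of every 3-byte pixel in place, turning RGB into BGR (or back).
    public static void SwapRedAndBlue(byte[] pixels)
    {
        if (pixels == null) throw new ArgumentNullException(nameof(pixels));
        if (pixels.Length % 3 != 0)
            throw new ArgumentException("Expected 3 bytes per pixel.", nameof(pixels));

        for (int i = 0; i < pixels.Length; i += 3)
        {
            byte red = pixels[i];
            pixels[i] = pixels[i + 2];   // blue moves to the front
            pixels[i + 2] = red;         // red moves to the back
        }
    }
}
```

Calling PixelOrder.SwapRedAndBlue(decodedBytes) before the row copy should make both the Reverse() call and the final RotateFlip unnecessary for 24 bpp images; other pixel formats need their own handling.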



Je *_*not 1

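To get a Windows BMP, you only need to create the bitmap header and then copy the image data into the bitmap. PDF images are byte-aligned (every new row starts on a byte boundary), whereas Windows BMPs are DWORD-aligned (every new row starts on a DWORD boundary; for historical reasons a DWORD is 4 bytes). All the information needed for the bitmap header can be found in the filter parameters or can be computed.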
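To make the header and alignment part concrete, here is a rough sketch that builds a 24 bpp BMP file in memory. The class and method names are invented for this example, and it assumes the decoded pixel data is already in BGR order, top row first, with byte-aligned rows as described above.

```csharp
using System;
using System.IO;

public static class BmpWriter
{
    // PDF rows are padded only up to the next whole byte.
    public static int PdfRowLength(int width, int bitsPerPixel) => (width * bitsPerPixel + 7) / 8;

    // BMP rows are padded up to the next multiple of 4 bytes (one DWORD).
    public static int BmpStride(int width, int bitsPerPixel) => ((width * bitsPerPixel + 31) / 32) * 4;

    // Hypothetical helper: wraps byte-aligned BGR rows in a BMP file.
    public static byte[] Build24BppBmp(byte[] bgrRows, int width, int height)
    {
        const int bitsPerPixel = 24;
        const int headerSize = 14 + 40; // BITMAPFILEHEADER + BITMAPINFOHEADER
        int rowLength = PdfRowLength(width, bitsPerPixel);
        int stride = BmpStride(width, bitsPerPixel);
        if (bgrRows.Length < rowLength * height)
            throw new ArgumentException("Not enough pixel data.", nameof(bgrRows));
        int imageSize = stride * height;

        using (var ms = new MemoryStream(headerSize + imageSize))
        using (var w = new BinaryWriter(ms)) // BinaryWriter writes little-endian
        {
            // BITMAPFILEHEADER
            w.Write((byte)'B');
            w.Write((byte)'M');
            w.Write(headerSize + imageSize); // total file size
            w.Write((short)0);               // reserved
            w.Write((short)0);               // reserved
            w.Write(headerSize);             // offset of the pixel data

            // BITMAPINFOHEADER
            w.Write(40);                     // size of this header
            w.Write(width);
            w.Write(height);                 // positive height: rows stored bottom-up
            w.Write((short)1);               // planes
            w.Write((short)bitsPerPixel);
            w.Write(0);                      // BI_RGB, no compression
            w.Write(imageSize);
            w.Write(2835);                   // horizontal resolution, about 72 dpi
            w.Write(2835);                   // vertical resolution, about 72 dpi
            w.Write(0);                      // colors used
            w.Write(0);                      // important colors

            // Copy the rows bottom-up and pad each one to the DWORD boundary.
            var padding = new byte[stride - rowLength];
            for (int y = height - 1; y >= 0; y--)
            {
                w.Write(bgrRows, y * rowLength, rowLength);
                w.Write(padding);
            }
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

The result can be written out with File.WriteAllBytes. Indexed formats would additionally need a color table between the info header and the pixel data.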

The color palette is another FlateEncoded object in the PDF. You can copy it into the BMP as well.
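Copying the palette is not quite byte for byte, because a BMP color table uses 4-byte entries in blue, green, red, reserved order. A minimal sketch of the conversion follows, assuming the palette has already been decoded and holds plain RGB triples (an Indexed color space on top of DeviceRGB); the names are hypothetical.

```csharp
using System;

public static class PaletteConverter
{
    // Hypothetical helper: turns a decoded PDF lookup table made of RGB
    // triples into BMP color table entries (blue, green, red, reserved).
    public static byte[] RgbTableToBmpPalette(byte[] rgbTable)
    {
        if (rgbTable == null) throw new ArgumentNullException(nameof(rgbTable));
        if (rgbTable.Length % 3 != 0)
            throw new ArgumentException("Expected 3 bytes per palette entry.", nameof(rgbTable));

        int colors = rgbTable.Length / 3;
        var quads = new byte[colors * 4];
        for (int i = 0; i < colors; i++)
        {
            quads[i * 4] = rgbTable[i * 3 + 2];     // blue
            quads[i * 4 + 1] = rgbTable[i * 3 + 1]; // green
            quads[i * 4 + 2] = rgbTable[i * 3];     // red
            // quads[i * 4 + 3] is the reserved byte and stays 0
        }
        return quads;
    }
}
```

In a BMP file the converted table sits directly after the info header and before the pixel data.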

This has to be done for several formats (1 bit per pixel, 8 bpp, 24 bpp, 32 bpp).
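Which of these formats applies depends on the color space as well as on BitsPerComponent, since that value counts bits per color component rather than per pixel. The sketch below shows one way to derive the bits per pixel; the helper is hypothetical and simplified, because it takes the color space as a plain name and only covers the basic device and Indexed spaces, whereas a real PDF may store the color space as an array.

```csharp
using System;

public static class ImageFormatInfo
{
    // Hypothetical helper: bits per pixel = components per pixel * bits per component.
    public static int BitsPerPixel(string colorSpace, int bitsPerComponent)
    {
        if (colorSpace == null) throw new ArgumentNullException(nameof(colorSpace));

        int components;
        switch (colorSpace.TrimStart('/')) // accept "/DeviceRGB" and "DeviceRGB"
        {
            case "DeviceGray":
            case "Indexed":      // one palette index per pixel
                components = 1;
                break;
            case "DeviceRGB":
                components = 3;
                break;
            case "DeviceCMYK":   // 4 components, not ARGB: needs color conversion
                components = 4;
                break;
            default:
                throw new NotSupportedException("Unsupported color space: " + colorSpace);
        }
        return components * bitsPerComponent;
    }
}
```

With 8 bits per component this gives 8 bpp for DeviceGray and Indexed, 24 bpp for DeviceRGB, and 32 bpp for DeviceCMYK.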