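I'm currently trying to recompress a PDF that has already been created. I'm looking for a way to recompress the images in the document in order to reduce the file size.
I've been trying to do this with the DataLogics PDE and iTextSharp libraries, but I can't find a way to recompress an item's stream.
I've thought about looping over the XObjects and getting the images, then either dropping the DPI to 96 or using a C# implementation of libjpeg to change the image quality, but getting the result back into the PDF stream always seems to end in memory corruption or some other problem.
Any samples would be appreciated.
Thanks
iText and iTextSharp have some methods for replacing indirect objects. Specifically, there's PdfReader.KillIndirect(), which does what it says, and PdfWriter.AddDirectImageSimple(iTextSharp.text.Image, PRIndirectReference), which you can then use to replace what you killed.
In pseudo C# code you'd do: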
var oldImage = PdfReader.GetPdfObject();
var newImage = YourImageCompressionFunction(oldImage);
PdfReader.KillIndirect(oldImage);
yourPdfWriter.AddDirectImageSimple(newImage, (PRIndirectReference)oldImage);
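Converting the raw bytes into a .Net image can be tricky. I'll leave that up to you, or you can search this site; Mark has a good description of it in one of his answers. Also, technically PDFs don't have a concept of DPI; that mostly matters for printers. See the linked answer for more on that.
Using the method above, your compression algorithm can actually do two things: physically shrink the image and apply JPEG compression. When you physically shrink the image and add it back, it occupies the same amount of space on the page as the original but uses fewer pixels. That gives you what you're thinking of as a DPI reduction. The JPEG compression speaks for itself.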
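To make the "same space, fewer pixels" point concrete, here is a minimal sketch of the arithmetic. It assumes the default PDF user space of 72 points per inch; the class and method names (ImageDpi, EffectiveDpi) are made up for illustration and are not part of iTextSharp.

```csharp
using System;

public static class ImageDpi {
    // Assumption: default PDF user space, where 1 point = 1/72 inch.
    public const double PointsPerInch = 72.0;

    // Effective resolution of an image along one axis:
    // pixels in the image divided by the inches it covers on the page.
    public static double EffectiveDpi(int pixels, double placedSizeInPoints) {
        if (placedSizeInPoints <= 0) {
            throw new ArgumentOutOfRangeException("placedSizeInPoints");
        }
        double inches = placedSizeInPoints / PointsPerInch;
        return pixels / inches;
    }
}
```

For example, a 3000-pixel-wide image placed 720 points (10 inches) wide works out to 300 DPI. Shrunk to 90% (2700 pixels) and put back in the same spot, it works out to 270 DPI, even though nothing about its placement on the page changed.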
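Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It takes an existing JPEG named "LargeImage.jpg" on your desktop and creates a new PDF from it. It then opens the PDF, extracts the image, physically shrinks it to 90% of its original size, applies 85% JPEG compression and writes it back to the PDF. See the comments in the code for more explanation. The code needs a lot more null/error checking. Also look for the comments marked NOTE, which flag places you'll need to expand to handle other situations.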
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Drawing.Drawing2D;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1 {
    public partial class Form1 : Form {
        public Form1() {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e) {
            //Our working folder
            string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
            //Large image to add to sample PDF
            string largeImage = Path.Combine(workingFolder, "LargeImage.jpg");
            //Name of large PDF to create
            string largePDF = Path.Combine(workingFolder, "Large.pdf");
            //Name of compressed PDF to create
            string smallPDF = Path.Combine(workingFolder, "Small.pdf");

            //Create a sample PDF containing our large image, for demo purposes only, nothing special here
            using (FileStream fs = new FileStream(largePDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
                using (Document doc = new Document()) {
                    using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
                        doc.Open();
                        iTextSharp.text.Image importImage = iTextSharp.text.Image.GetInstance(largeImage);
                        doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, importImage.Width, importImage.Height));
                        doc.SetMargins(0, 0, 0, 0);
                        doc.NewPage();
                        doc.Add(importImage);
                        doc.Close();
                    }
                }
            }

            //Now we're going to open the above PDF and compress things

            //Bind a reader to our large PDF
            PdfReader reader = new PdfReader(largePDF);
            //Create our output PDF
            using (FileStream fs = new FileStream(smallPDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
                //Bind a stamper to the file and our reader
                using (PdfStamper stamper = new PdfStamper(reader, fs)) {
                    //NOTE: This code only deals with page 1, you'd want to loop more for your code
                    //Get page 1
                    PdfDictionary page = reader.GetPageN(1);
                    //Get the xobject structure
                    PdfDictionary resources = (PdfDictionary)PdfReader.GetPdfObject(page.Get(PdfName.RESOURCES));
                    PdfDictionary xobject = (PdfDictionary)PdfReader.GetPdfObject(resources.Get(PdfName.XOBJECT));
                    if (xobject != null) {
                        PdfObject obj;
                        //Loop through each key
                        foreach (PdfName name in xobject.Keys) {
                            obj = xobject.Get(name);
                            if (obj.IsIndirect()) {
                                //Get the current key as a PDF object
                                PdfDictionary imgObject = (PdfDictionary)PdfReader.GetPdfObject(obj);
                                //See if it's an image (comparing from the constant side avoids a null reference if the key is missing)
                                if (PdfName.IMAGE.Equals(imgObject.Get(PdfName.SUBTYPE))) {
                                    //NOTE: There's a bunch of different types of filters, I'm only handling the simplest one here which is basically raw JPG, you'll have to research others
                                    if (PdfName.DCTDECODE.Equals(imgObject.Get(PdfName.FILTER))) {
                                        //Get the raw bytes of the current image
                                        byte[] oldBytes = PdfReader.GetStreamBytesRaw((PRStream)imgObject);
                                        //Will hold bytes of the compressed image later
                                        byte[] newBytes;
                                        //Wrap a stream around our original image
                                        using (MemoryStream sourceMS = new MemoryStream(oldBytes)) {
                                            //Convert the bytes into a .Net image
                                            using (System.Drawing.Image oldImage = Bitmap.FromStream(sourceMS)) {
                                                //Shrink the image to 90% of the original
                                                using (System.Drawing.Image newImage = ShrinkImage(oldImage, 0.9f)) {
                                                    //Convert the image to bytes using JPG at 85%
                                                    newBytes = ConvertImageToBytes(newImage, 85);
                                                }
                                            }
                                        }
                                        //Create a new iTextSharp image from our bytes
                                        iTextSharp.text.Image compressedImage = iTextSharp.text.Image.GetInstance(newBytes);
                                        //Kill off the old image
                                        PdfReader.KillIndirect(obj);
                                        //Add our image in its place
                                        stamper.Writer.AddDirectImageSimple(compressedImage, (PRIndirectReference)obj);
                                    }
                                }
                            }
                        }
                    }
                }
            }
            //Release the reader once the stamper has written the output
            reader.Close();

            this.Close();
        }

        //Standard image save code from MSDN, returns a byte array
        private static byte[] ConvertImageToBytes(System.Drawing.Image image, long compressionLevel) {
            //Clamp the JPEG quality to the valid 0-100 range
            if (compressionLevel < 0) {
                compressionLevel = 0;
            } else if (compressionLevel > 100) {
                compressionLevel = 100;
            }
            ImageCodecInfo jpgEncoder = GetEncoder(ImageFormat.Jpeg);
            System.Drawing.Imaging.Encoder myEncoder = System.Drawing.Imaging.Encoder.Quality;
            EncoderParameters myEncoderParameters = new EncoderParameters(1);
            EncoderParameter myEncoderParameter = new EncoderParameter(myEncoder, compressionLevel);
            myEncoderParameters.Param[0] = myEncoderParameter;
            using (MemoryStream ms = new MemoryStream()) {
                image.Save(ms, jpgEncoder, myEncoderParameters);
                return ms.ToArray();
            }
        }

        //Standard code from MSDN, adjusted to search the encoders since we are saving
        private static ImageCodecInfo GetEncoder(ImageFormat format) {
            ImageCodecInfo[] codecs = ImageCodecInfo.GetImageEncoders();
            foreach (ImageCodecInfo codec in codecs) {
                if (codec.FormatID == format.Guid) {
                    return codec;
                }
            }
            return null;
        }

        //Standard high quality thumbnail generation from http://weblogs.asp.net/gunnarpeipman/archive/2009/04/02/resizing-images-without-loss-of-quality.aspx
        private static System.Drawing.Image ShrinkImage(System.Drawing.Image sourceImage, float scaleFactor) {
            //Never let a dimension round down to zero, which Bitmap rejects
            int newWidth = Math.Max(1, Convert.ToInt32(sourceImage.Width * scaleFactor));
            int newHeight = Math.Max(1, Convert.ToInt32(sourceImage.Height * scaleFactor));

            var thumbnailBitmap = new Bitmap(newWidth, newHeight);
            using (Graphics g = Graphics.FromImage(thumbnailBitmap)) {
                g.CompositingQuality = CompositingQuality.HighQuality;
                g.SmoothingMode = SmoothingMode.HighQuality;
                g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                System.Drawing.Rectangle imageRectangle = new System.Drawing.Rectangle(0, 0, newWidth, newHeight);
                g.DrawImage(sourceImage, imageRectangle);
            }
            return thumbnailBitmap;
        }
    }
}
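I don't know iTextSharp, but if anything changes you have to rewrite the PDF file, because it contains an xref table (an index) holding the exact file position of every object. That means that if even a single byte is added or removed, the PDF becomes corrupt.
Your best bet for recompressing images is JBIG2 if they are B&W, or JPEG2000 otherwise. The Jasper library will happily encode JPEG2000 codestreams for placement in a PDF file at whatever quality you want.
If it were me, I'd do it all from code without a PDF library. Just find all the images (anything between stream and endstream that follows an occurrence of JPXDecode (JPEG2000), JBIG2Decode (JBIG2) or DCTDecode (JPEG)), pull each one out, re-encode it with Jasper, then stick it back in and update the xref table.
To update the xref table, find the position of each object (each one starts with something like 00001 0 obj) and just write the new positions into the xref table. It's not as much work as it sounds. You can get all the offsets with a single regular expression (I'm not a C# programmer, but in PHP it would be that simple).
Then, finally, update the value of the startxref tag in the trailer with the offset of the beginning of the xref table (the position where xref appears in the file).
Otherwise you'll end up decoding the whole PDF and rewriting all of it, which will be slow, and you may lose something along the way.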
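The offset-gathering step described above can be sketched in C# roughly as follows. This is only an illustration under stated assumptions: the class and method names (XrefSketch, FindObjectOffsets, FormatEntry) are invented here, it assumes a classic uncompressed xref table rather than an xref stream, and a plain regular expression can be fooled by binary stream data that happens to contain text like "12 0 obj".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class XrefSketch {
    // Scans the raw file for "N G obj" headers and returns
    // object number -> byte offset of that header.
    public static SortedDictionary<int, long> FindObjectOffsets(byte[] pdf) {
        // Map every byte to exactly one char so that a char index
        // in the string equals the byte offset in the file.
        string text = new string(Array.ConvertAll(pdf, b => (char)b));
        var offsets = new SortedDictionary<int, long>();
        foreach (Match m in Regex.Matches(text, @"(\d+)[ ]+(\d+)[ ]+obj\b")) {
            // If an object number appears more than once, the last one wins.
            offsets[int.Parse(m.Groups[1].Value)] = m.Index;
        }
        return offsets;
    }

    // Formats one in-use xref entry: 10-digit offset, 5-digit generation,
    // the letter n and a two-byte line ending, 20 bytes in total.
    public static string FormatEntry(long offset, int generation) {
        return string.Format("{0:D10} {1:D5} n \n", offset, generation);
    }
}
```

After swapping an image stream for a shorter or longer one, you would rerun FindObjectOffsets on the modified bytes, rewrite each xref entry with FormatEntry, and then fix up startxref. The /Length entry in the replaced stream's dictionary also has to match the new byte count.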
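The creator of iText has an example of how to find and replace images in an existing PDF. It's actually a short excerpt from his book. Since it's in Java, here's a simple C# replacement: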
// Requires: using System.IO; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
public void ReduceResolution(PdfReader reader, long quality) {
    // Look up the JPEG encoder by format ID rather than relying on its array index
    System.Drawing.Imaging.ImageCodecInfo codec = null;
    foreach (var c in System.Drawing.Imaging.ImageCodecInfo.GetImageEncoders()) {
        if (c.FormatID == System.Drawing.Imaging.ImageFormat.Jpeg.Guid) {
            codec = c;
            break;
        }
    }
    if (codec == null) { return; }

    int n = reader.XrefSize;
    for (int i = 0; i < n; i++) {
        PdfObject obj = reader.GetPdfObject(i);
        if (obj == null || !obj.IsStream()) { continue; }

        PdfDictionary dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
        PdfName subType = PdfReader.GetPdfObject(dict.Get(PdfName.SUBTYPE)) as PdfName;
        if (!PdfName.IMAGE.Equals(subType)) { continue; }

        PRStream stream = (PRStream)obj;
        try {
            PdfImageObject image = new PdfImageObject(stream);
            PdfName filter = (PdfName)image.Get(PdfName.FILTER);
            // Skip every image type this method does not try to handle
            if (
                PdfName.JBIG2DECODE.Equals(filter)
                || PdfName.JPXDECODE.Equals(filter)
                || PdfName.CCITTFAXDECODE.Equals(filter)
                || PdfName.FLATEDECODE.Equals(filter)
            ) { continue; }

            using (System.Drawing.Image img = image.GetDrawingImage()) {
                if (img == null) { continue; }
                int width = img.Width;
                int height = img.Height;

                using (System.Drawing.Bitmap dotnetImg = new System.Drawing.Bitmap(img))
                using (var eParams = new System.Drawing.Imaging.EncoderParameters(1))
                using (MemoryStream msImg = new MemoryStream()) {
                    // set parameters for image quality
                    eParams.Param[0] = new System.Drawing.Imaging.EncoderParameter(
                        System.Drawing.Imaging.Encoder.Quality, quality
                    );
                    dotnetImg.Save(msImg, codec, eParams);

                    // Store the JPEG bytes as-is; they are already compressed
                    stream.SetData(msImg.ToArray(), false, PRStream.NO_COMPRESSION);
                    stream.Put(PdfName.TYPE, PdfName.XOBJECT);
                    stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
                    stream.Put(PdfName.FILTER, PdfName.DCTDECODE);
                    stream.Put(PdfName.WIDTH, new PdfNumber(width));
                    stream.Put(PdfName.HEIGHT, new PdfNumber(height));
                    stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
                    stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
                }
            }
        }
        catch {
            // throw;
            // iText[Sharp] can't handle all image types...
        }
    }
    // may or may not help; called once after all images have been processed
    reader.RemoveUnusedObjects();
}
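You'll notice it only handles JPEG. The logic is inverted (rather than explicitly handling only DCTDECODE/JPEG) so that you can comment out some of the ignored image types and experiment with PdfImageObject in the code above. In particular, most FLATEDECODE images (.bmp, .png, .gif) are represented as PNG (you can confirm this in the DecodeImageBytes method of the PdfImageObject source code). As far as I know, .NET gives you no control over PNG compression: its built-in encoder can write PNG files but not at a chosen compression level. There are some references here and here that back this up. You could try a standalone PNG optimization executable, but you'd also have to figure out how to set PdfName.BITSPERCOMPONENT and PdfName.COLORSPACE in the PRStream.
For completeness, since your question specifically asks about PDF compression, here's how to compress a PDF with iTextSharp: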
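// outputStream is a placeholder for whatever writable Stream should receive the PDF
PdfStamper stamper = new PdfStamper(
    reader, outputStream, PdfWriter.VERSION_1_5
);
stamper.Writer.CompressionLevel = PdfStream.BEST_COMPRESSION;
// Re-set each page's content so it is rewritten at the new compression level
for (int i = 1; i <= reader.NumberOfPages; i++) {
    reader.SetPageContent(i, reader.GetPageContent(i));
}
stamper.SetFullCompression();
stamper.Close();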
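You can also try running the PDF through PdfSmartCopy to reduce the file size. It removes redundant resources but, like the RemoveUnusedObjects() call in ReduceResolution() above, it may or may not help. That depends on how the PDF was created.
IIRC iText[Sharp] doesn't handle JBIG2DECODE well, so @Alasdair's suggestion looks good, if you're willing to spend the time learning the Jasper library and taking the brute-force approach.
Good luck.
EDIT - 2012-08-17, in response to @Craig's comment:
To save the PDF after compressing the JPEGs with the ReduceResolution() method above:
a. Instantiate a PdfReader object: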
PdfReader reader = new PdfReader(pdf);
b. Pass the PdfReader to the ReduceResolution() method above.
c. Pass the altered PdfReader to a PdfStamper. Here's one way to do that using a MemoryStream:
// Save the altered PDF; you can then pass the byte array to a database, etc.
using (MemoryStream ms = new MemoryStream()) {
    // Disposing the stamper writes the altered PDF into the stream
    using (PdfStamper stamper = new PdfStamper(reader, ms)) {
    }
    return ms.ToArray();
}
Alternatively, you can use any other kind of Stream if you don't need to keep the PDF in memory. For example, use a FileStream and save directly to disk.