How can I speed up complex image processing?

Tags: php, image, converter, imagemagick, imagemagick-convert

Each user can upload 100 TIFF (black and white) images.

The process requires:

  1. Convert the TIFF to JPEG.

  2. Resize the image to xx.

  3. Crop the image to 200px.

  4. Add a text watermark.

Here is my PHP code:

move_uploaded_file($image_temp, $destination_folder.$image_name);

$image_name_only = strtolower($image_info["filename"]);

$name  = $destination_folder.$image_name_only.".jpg";
$thumb = $destination_folder."thumb_".$image_name_only.".jpg";

// note: paths containing spaces or shell metacharacters should be
// wrapped in escapeshellarg() before being interpolated into $exec
$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" '.$destination_folder.$image_name.' '.$name.' 2>&1';
exec($exec, $exec_output, $exec_retval);

$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" '.$name.' -resize 1024x '.$name;
exec($exec, $exec_output, $exec_retval);

$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" '.$name.' -thumbnail 200x200! '.$thumb;
exec($exec, $exec_output, $exec_retval);

$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" '.$name.' -background White label:12345 -append '.$name;
exec($exec, $exec_output, $exec_retval);

This code works, but the average processing time is about 1 second per image. So for 100 images it can take around 100 seconds.

How can I speed up the whole process (convert, resize, crop, watermark)?

Edit

I have a G8 server: RAM: 32 GB, CPU: Intel Xeon E5-2650 (4 processors).

Version: ImageMagick 6.9.0-3 Q16 x64

Features: OpenMP

convert logo: -resize 500% -bench 10 1.png

 Performance[1]: 10i 0.770ips 1.000e 28.735u 0:12.992
 Performance[2]: 10i 0.893ips 0.537e 26.848u 0:11.198
 Performance[3]: 10i 0.851ips 0.525e 27.285u 0:11.756
 Performance[4]: 10i 0.914ips 0.543e 26.489u 0:10.941
 Performance[5]: 10i 0.967ips 0.557e 25.803u 0:10.341
 Performance[6]: 10i 0.797ips 0.509e 27.737u 0:12.554
 Performance[7]: 10i 0.963ips 0.556e 25.912u 0:10.389
 Performance[8]: 10i 0.863ips 0.529e 26.707u 0:11.586

Resource limits:

Width: 100MP; Height: 100MP; Area: 17.16GP; Memory: 7.9908GiB; Map: 15.982GiB; Disk: unlimited; File: 1536; Thread: 8; Throttle: 0; Time: unlimited

Answer by Kur*_*fle (39 votes)

0. Two approaches

Basically, this challenge can be tackled in two different ways, or a combination of the two:

  1. Construct your commands as cleverly as possible.
  2. Trade speed-up gains for quality losses.

The next few sections discuss both approaches.

1. Check which ImageMagick you've got: 'Q8', 'Q16', 'Q32' or 'Q64'?

First, check for your exact ImageMagick version and run:

convert -version

In case your ImageMagick has a Q16 (or even Q32 or Q64, which is possible, but overkill!) in its version string: this means all of ImageMagick's internal functions treat images as having 16-bit (or 32- or 64-bit) channel depths. This gives you better quality in image processing. But it also requires twice the memory compared to Q8 -- so at the same time it means a performance degradation.

Therefore: you can test what performance gain you would achieve by switching to a Q8 build. (The Q is the symbol for the "quantum depth" an ImageMagick build supports.)

However, a possible performance gain from Q8 is paid for with a loss of quality. Just check what speedup you achieve with Q8 over Q16, and what quality loss you suffer, then decide whether you can live with the drawbacks...

In any case, Q16 uses twice as much RAM per image being processed as Q8, and Q32 again uses twice as much as Q16. This is independent of the actual bits per pixel seen in the input file. When saved, 16-bit image files also consume more disk space than 8-bit ones.

Since Q16 or Q32 requires more memory, you have to make sure you have enough of it, because exceeding your physical memory is very bad news: if processes with a larger Q swap to disk, performance plummets. A 1024 x 768 pixel image (width x height) requires the following amounts of virtual memory, depending on the quantum depth:

Quantum                   Virtual Memory
  Depth    (consumed by 1 image 1024x768)
-------    ------------------------------
      8         3,840 kiB  (=~  3.75 MiB)
     16         7,680 kiB  (=~  7.50 MiB)
     32        15,360 kiB  (=~ 15.00 MiB)
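These figures can be reproduced with simple shell arithmetic. The sketch below assumes roughly 5 channel bytes per pixel at Q8 (an assumption that matches the table, not an official constant), doubling at each quantum step:

```shell
# Rough estimate of ImageMagick's per-image memory footprint, assuming
# about 5 channel bytes per pixel at Q8 and doubling for Q16/Q32.
w=1024; h=768
for q in 8 16 32; do
  kib=$(( w * h * 5 * q / 8 / 1024 ))
  echo "Q$q: $kib kiB"
done
```

For your own image sizes, substitute `w` and `h` and compare the result against the `memory` figure from `identify -list resource`.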

Also keep in mind that some "optimized" processing pipelines (see below) need to keep several copies of an image in virtual memory! Once available RAM can no longer satisfy the demand for virtual memory, the system starts swapping and claims "memory" from disk. In that case all the clever command-pipeline optimizations are gone, of course, and start to work in reverse.

ImageMagick's birthday was in the era when CPUs could handle only a few bits at a time. That was decades ago. Since then CPU architecture has changed a lot. 16-bit operations used to take twice as long as 8-bit operations, or even longer. Then 16-bit processors arrived and 16-bit ops became standard. CPUs were optimised for 16-bit: suddenly some 8-bit operations could take even longer than their 16-bit equivalents.

Nowadays, 64-bit CPUs are common. So the Q8 vs. Q16 vs. Q32 argument may, in real terms, even be void. Who knows? I'm not aware of any serious benchmarking about this. It would be interesting if someone (with really deep know-how about CPUs and about benchmarking real-world programs) would run such a project one day.

Yes, I see that you are using Q16 on Windows. But I still wanted to mention this for completeness' sake... other users will read this question and its answers in the future.

Most likely, since your input TIFFs are black + white only, the image quality a Q8 build outputs will be good enough for your workflow. (I just don't know whether it will also be noticeably faster: that largely depends on the hardware resources you run it on...)

Also, if your installation sports HDRI (high dynamic range imaging) support, this may cost some speed too. Who knows? So building IM with the configure options --disable-hdri --quantum-depth 8 may or may not lead to a speedup. Nobody has tested this in a serious way... The only thing we know for sure is: these options reduce image quality. However, most people won't even notice that, unless they take a very close look and make direct image-by-image comparisons...

 

2. Check your ImageMagick's features

Next, check whether your ImageMagick installation comes with OpenCL and/or OpenMP support:

convert -list configure | grep FEATURES

If it does (like mine), you should see something like this:

FEATURES      DPC HDRI OpenCL OpenMP Modules

OpenCL (for Open Computing Language) utilizes ImageMagick's parallel computing features (if compiled in). This makes use of your computer's GPU in addition to the CPU for image processing operations.

OpenMP (for Multi-Processing) does something similar: it allows ImageMagick to execute in parallel on all the cores of your system. So if you have a quad-core system and resize an image, the resizing happens on 4 cores (or even 8 if you have hyperthreading).

The command

convert -version 

prints some basic info about supported features. If OpenCL/OpenMP are available, you will see one of them (or both) in the output.

If none of the two show up: look into getting the most recent version of ImageMagick that has OpenCL and/or OpenMP support compiled in.

If you build the package yourself from the sources, make sure OpenCL/OpenMP are used. Do this by including the appropriate parameters into your 'configure' step:

./configure  [...other options-]  --enable-openmp  --enable-opencl

ImageMagick's documentation about OpenMP and OpenCL is here:

  • Parallel Execution With OpenMP. Read it carefully, because OpenMP is not a silver bullet and does not work under all circumstances...
  • Parallel Execution With OpenCL. The same as above applies here. Additionally, not all ImageMagick operations are OpenCL-enabled. The link here has a list of those which are. -resize is one of them.

Hints and instructions to build ImageMagick from sources and configure the build, explaining various options, are here:

This page also includes a short discussion of the --with-quantum-depth configure option.

3. Benchmark your ImageMagick

You can now also use the builtin -bench option to make ImageMagick run a benchmark for your command. For example:

convert logo: -resize 500% -bench 10 logo.png

  [....]
  Performance[4]: 10i 1.489ips 1.000e 6.420u 0:06.510

Above command with -resize 500% tells ImageMagick to run the convert command and scale the built-in IM logo: image by 500% in each direction. The -bench 10 part tells it to run that same command 10 times in a loop and then print the performance results:

  • Since I have OpenMP enabled, I have 4 threads (Performance[4]:).
  • It reports that it ran 10 iterations (10i).
  • The speed was nearly 1.5 iterations per second (1.489ips).
  • Total user CPU time was 6.420 seconds.

If your result includes Performance[1]:, and only one line, then your system does not have OpenMP enabled. (You may be able to switch it on, if your build does support it: run convert -limit thread 2.)
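If you want to compare such figures across runs, the iterations-per-second value can be pulled out of a `-bench` report line with a little awk. A sketch, using the sample line quoted above:

```shell
# Extract the iterations-per-second value from one line of `-bench` output:
# it is the third whitespace-separated field, with the "ips" suffix stripped.
line='Performance[4]: 10i 1.489ips 1.000e 6.420u 0:06.510'
ips=$(echo "$line" | awk '{ sub(/ips$/, "", $3); print $3 }')
echo "$ips"   # prints 1.489
```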

4. Tweak your ImageMagick's resource limits

Find out how your system's ImageMagick is set up regarding resource limits. Use this command:

identify -list resource
  File       Area     Memory     Map       Disk    Thread         Time
  --------------------------------------------------------------------
   384    8.590GB       4GiB    8GiB  unlimited         4    unlimited

Above shows my current system's settings (not the defaults -- I did tweak them in the past). The numbers are the maximum amount of each resource ImageMagick will use. You can use each of the keywords in the column headers to pimp your system. For this, use convert -limit <resource> <number> to set it to a new limit.

Maybe your result looks more like this:

identify -list resource
  File       Area     Memory     Map       Disk    Thread         Time
  --------------------------------------------------------------------
   192    4.295GB       2GiB    4GiB  unlimited         1    unlimited
  • The files value defines the maximum number of concurrently opened files which ImageMagick can use.
  • The memory, map, area and disk resource limits are defined in bytes. To set them to different values you can use SI prefixes, e.g. 500MB.

When you do have OpenMP for ImageMagick on your system, you can run:

convert -limit thread 2

This enables 2 parallel threads as a first step. Then re-run the benchmark and see if it really makes a difference, and if so, how much. After that you could set the limit to 4 or even 8 and repeat the exercise...
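As a side note: if passing `-limit` through a PHP wrapper on every call is awkward, ImageMagick also honors resource limits set via environment variables (the variable names below are from ImageMagick's resources documentation; the chosen values are just examples):

```shell
# Cap ImageMagick resources for every convert/identify launched from this
# environment, instead of passing -limit on each command line.
export MAGICK_THREAD_LIMIT=2     # number of parallel (OpenMP) threads
export MAGICK_MEMORY_LIMIT=2GiB  # pixel cache heap before spilling to mmap/disk
export MAGICK_MAP_LIMIT=4GiB     # memory-mapped pixel cache ceiling
echo "threads=$MAGICK_THREAD_LIMIT memory=$MAGICK_MEMORY_LIMIT"
```

This is convenient for benchmarking, too: you can re-run the same `-bench` command under different `MAGICK_THREAD_LIMIT` values without editing it.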

5. Use Magick Pixel Cache (MPC) and/or Magick Persistent Registry (MPR)

Finally, you can experiment with a special internal format of ImageMagick's pixel cache. This format is called MPC (Magick Pixel Cache). It only exists in memory.

When MPC is created, the processed input image is kept in RAM as an uncompressed raster format. So basically, MPC is the native in-memory uncompressed file format of ImageMagick. It is simply a direct memory dump to disk. A read is a fast memory map from disk to memory as needed (similar to memory page swapping). But no image decoding is needed.

(More technical details: MPC as a format is not portable. It also isn't suitable as a long-term archive format. Its only suitability is as an intermediate format for high-performance image processing. It requires two files to support one image.)

If you still want to save this format to disk, be aware of this:

  • Image attributes are written to a file with the extension .mpc.
  • Image pixels are written to a file with the extension .cache.

Its main advantage is experienced when...

  1. ...processing very large images, or when
  2. ...applying several operations on one and the same image in "operation pipelines".

MPC was designed especially for workflow patterns which match the criteria "read many times, write once".

Some people say that for such operations the performance improves here, but I have no personal experience with it.

Convert your base picture to MPC first:

convert input.jpeg input.mpc

and only then run:

convert input.mpc [...your long-long-long list of crops and operations...]

Then see if this saves you significantly on time.

Most likely you can use this MPC format even "inline" (using the special mpc: notation, see below).

The MPR format (Magick Persistent Registry) does something similar. It reads the image into a named memory register. Your processing pipeline can read the image back from that register whenever it needs to access it again. The image persists in the register until the current command pipeline exits.

But I've never applied this technique to a real world problem, so I can't say how it works out in real life.

6. Construct a suitable IM processing pipeline to do all tasks in one go

As you describe your process, it is composed of 4 distinct steps:

  1. Convert a TIFF to a JPEG.
  2. Resize the JPEG image to xx (?? what value ??)
  3. Crop the JPEG to 200px.
  4. Add a text watermark.

Please tell if I understand correctly your intentions from reading your code snippets:

  • You have 1 input file, a TIFF.
  • You want 2 final output files:
    1. 1 thumbnail JPEG, sized 200x200 pixels;
    2. 1 labelled JPEG, with a width of 1024 pixels (height keeping aspect ratio of input TIFF);
    3. 1 (unlabelled) JPEG is only an intermediate file which you do not really want to keep.

Basically, each step uses its own command -- 4 different commands in total. This can be sped up considerably by using a single command pipeline which performs all the steps on its own.

Moreover, you do not seem to really need to keep the unlabelled JPEG as an end result -- yet one of your commands generates it as an intermediate temporary file and saves it to disk. We can try to skip this step altogether and achieve the final result without this extra write to disk.

There are different approaches possible to this change. I'll show you (and other readers) only one for now -- and only for the CLI, not for PHP. I'm not a PHP guy -- it's your own job to 'translate' my CLI method into appropriate PHP calls.

(But by all means: please test with my commands first, really using the CLI, to see if the effort is worth while translating the approach to PHP!)

But please first make sure that you really understand the architecture and structure of more complex ImageMagick command lines! For this, please refer to this other answer of mine:

Your 4 steps translate into the following individual ImageMagick commands:

convert image.tiff image.jpg

convert image.jpg -resize 1024x image-1024.jpg

convert image-1024.jpg -thumbnail 200x200 image-thumb.jpg

convert -background white image-1024.jpg label:12345 -append image-labelled.jpg

Now to transform this workflow into one single pipeline command... The following command does this. It should execute faster (regardless of what your results are when following my above steps 0.--4.):

convert image.tiff                                                             \
 -respect-parentheses                                                          \
 +write mpr:XY                                                                 \
  \( mpr:XY                                       +write image-1024.jpg \)     \
  \( mpr:XY -thumbnail 200x200                    +write image-thumb.jpg \)    \
  \( mpr:XY -background white label:12345 -append +write image-labelled.jpg \) \
  null:

Explanations:

  • -respect-parentheses : required to make the sub-commands executed inside the \( .... \) parentheses really independent from each other.
  • +write mpr:XY : used to write the input file to an MPR memory register. XY is just a label (you can use anything), needed to later re-call the same image.
  • +write image-1024.jpg : writes result of subcommand executed inside the first parentheses pair to disk.
  • +write image-thumb.jpg : writes result of subcommand executed inside the second parentheses pair to disk.
  • +write image-labelled.jpg : writes result of subcommand executed inside the third parentheses pair to disk.
  • null: : terminates the command pipeline. Required because we otherwise would end with the last subcommand's closing parenthesis.

7. Benchmarking 4 individual commands vs. the single pipeline

In order to get a rough feeling about my suggestion, I did run the commands below.

The first one runs the sequence of the 4 individual commands 100 times (and saves all resulting images under different file names).

time for i in $(seq -w 1 100); do
   convert image.tiff                                                          \
                                               image-indiv-run-${i}.jpg
   convert image-indiv-run-${i}.jpg -sample 1024x                              \
                                               image-1024-indiv-run-${i}.jpg
   convert image-1024-indiv-run-${i}.jpg -thumbnail 200x200                    \
                                               image-thumb-indiv-run-${i}.jpg
   convert -background white image-1024-indiv-run-${i}.jpg label:12345 -append \
                                               image-labelled-indiv-run-${i}.jpg
   echo "DONE: run indiv $i ..."
done

My result for 4 individual commands (repeated 100 times!) is this:

real  0m49.165s
user  0m39.004s
sys   0m6.661s

The second command times the single pipeline:

time for i in $(seq -w 1 100); do
    convert image.tiff                                        \
     -respect-parentheses                                     \
     +write mpr:XY                                            \
      \( mpr:XY -resize 1024x                                 \
                +write image-1024-pipel-run-${i}.jpg     \)   \
      \( mpr:XY -thumbnail 200x200                            \
                +write image-thumb-pipel-run-${i}.jpg    \)   \
      \( mpr:XY -resize 1024x                                 \
                -background white label:12345 -append         \
                +write image-labelled-pipel-run-${i}.jpg \)   \
     null:
   echo "DONE: run pipeline $i ..."
done

The result for single pipeline (repeated 100 times!) is this:

real   0m29.128s
user   0m28.450s
sys    0m2.897s

As you can see, the single pipeline is about 40% faster than the 4 individual commands!
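As a sanity check on that figure, the speedup can be computed from the two wall-clock ("real") times measured above; it comes out at ~41%, i.e. "about 40%":

```shell
# Speedup of the single pipeline over the 4 individual commands,
# computed from the two "real" times of the benchmark runs above.
awk -v indiv=49.165 -v pipe=29.128 \
    'BEGIN { printf "%.0f%% faster\n", (indiv - pipe) / indiv * 100 }'
```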

Now you can also invest in multi-CPU, much RAM, fast SSD hardware to speed things up even more :-)

But first translate this CLI approach into PHP code...


There are a few more things to be said about this topic. But my time runs out for now. I'll probably return to this answer in a few days and update it some more...


Update: I had to update this answer with new numbers for the benchmarking: initially I had forgotten to include the -resize 1024x operation (stupid me!) into the pipelined version. Having included it, the performance gain is still there, but not as big any more.


8. Use -clone 0 to copy image within memory

Here is another alternative to try instead of the mpr: approach with a named memory register as suggested above.

It uses (again within 'side processing inside parentheses') the -clone 0 operation. The way this works is this:

  1. convert reads the input TIFF from disk once and loads it into memory.
  2. Each -clone 0 operator makes a copy of the first loaded image (because it has index 0 in the current image stack).
  3. Each "within-parenthesis" sub-pipeline of the total command pipeline performs some operation on the clone.
  4. Each +write operation saves the respective result to disk.

So here is the command to benchmark this:

time for i in $(seq -w 1 100); do
    convert image.tiff                                         \
     -respect-parentheses                                      \
      \( -clone 0 -thumbnail 200x200                           \
                  +write image-thumb-pipel-run-${i}.jpg    \)  \
      \( -clone 0 -resize 1024x                                \
                  -background white label:12345 -append        \
                  +write image-labelled-pipel-run-${i}.jpg \)  \
     null:
   echo "DONE: run pipeline $i ..."
done

My result:

real   0m19.432s
user   0m18.214s
sys    0m1.897s

To my surprise, this is faster than the version which used mpr: !

9. Use -scale or -sample instead of -resize

This alternative will most likely speed up your resizing sub-operation. But it will likely lead to somewhat worse image quality (you'll have to check whether the difference is noticeable).

For some background info about the difference between -resize, -sample and -scale see the following answer:

I tried it too:

time for i in $(seq -w 1 100); do
    convert image.tiff                                         \
     -respect-parentheses                                      \
      \( -clone 0 -thumbnail 200x200                           \
                  +write image-thumb-pipel-run-${i}.jpg    \)  \
      \( -clone 0 -scale 1024x                                 \
                  -background white label:12345 -append        \
                  +write image-labelled-pipel-run-${i}.jpg \)  \
     null:
   echo "DONE: run pipeline $i ..."
done

My result:

real   0m16.551s
user   0m16.124s
sys    0m1.567s

This is the fastest result so far (I combined it with the +clone variant).
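Put differently, relative to the roughly 1 s/image the question reports for the original four-command approach, the per-image cost of this variant works out to:

```shell
# Per-image processing time of the fastest variant so far
# (16.551 s of real time for 100 images).
awk -v total=16.551 -v n=100 'BEGIN { printf "%.3f s per image\n", total / n }'
```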

Of course, this modification can also be applied to your initial method running 4 different commands.

10. Emulate the Q8 build by adding -depth 8 to the commands.

I did not actually run and measure this, but the complete command would be:

time for i in $(seq -w 1 100); do
    convert image.tiff                                            \
     -respect-parentheses                                         \
      \( -clone 0 -thumbnail 200x200 -depth 8                     \
                  +write d08-image-thumb-pipel-run-${i}.jpg    \) \
      \( -clone 0 -scale 1024x       -depth 8                     \
                  -background white label:12345 -append           \
                  +write d08-image-labelled-pipel-run-${i}.jpg \) \
     null:
   echo "DONE: run pipeline $i ..."
done

This modification is also applicable to your initial "I run 4 different commands"-method.

11. Combine it with GNU parallel, as suggested by Mark Setchell

This of course is only applicable and reasonable for you, if your overall work process allows for such parallelization.

For my little benchmark testing it is applicable. For your web service, it may be that you know of only one job at a time...

time for i in $(seq -w 1 100); do                                 \
    cat <<EOF
    convert image.tiff                                            \
      \( -clone 0 -scale  1024x         -depth 8                  \
                  -background white label:12345 -append           \
                  +write d08-image-labelled-pipel-run-${i}.jpg \) \
      \( -clone 0 -thumbnail 200x200  -depth 8                    \
                  +write d08-image-thumb-pipel-run-${i}.jpg   \)  \
       null:
    echo "DONE: run pipeline $i ..."
EOF
done | parallel --will-cite

Results:

real  0m6.806s
user  0m37.582s
sys   0m6.642s

The apparent contradiction between user and real time can be explained: the user time represents the sum of all time ticks which were clocked on 8 different CPU cores.

From the point of view of the user looking onto his watch, it was much faster: less than 10 seconds.
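The same arithmetic gives the effective degree of parallelism achieved by the GNU parallel run above:

```shell
# Effective parallelism: total CPU time ("user") divided by wall-clock
# time ("real"), taken from the GNU parallel benchmark above.
awk -v user=37.582 -v real=6.806 \
    'BEGIN { printf "%.1f cores busy on average\n", user / real }'
```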

12. Summary

Pick your own preferences -- combine different methods:

  1. Some speedup can be gained (with identical image quality as currently) by constructing a more clever command pipeline. Avoid running several separate commands (where each convert launches a new process and has to read its input from disk). Pack all image manipulations into one single process. Make use of the "parenthesized side processing". Make use of -clone or mpr: or mpc:, or even combine these.

  2. Some speedups can additionally be gained by trading image quality for performance. Some of your choices are:

    1. -depth 8 (has to be declared on the OP's system) vs. -depth 16 (the default on the OP's system)
    2. -resize 1024 vs. -sample 1024x vs. -scale 1024x
  3. Make use of GNU parallel if your workflow permits this.


Answer by Mar*_*ell (5 votes)

As always, @KurtPfeifle gives a very sound and explanatory answer, and everything he says is solid advice you would do well to listen to carefully and follow.

There is a little more that can be done though, and it exceeds what I could add as a comment, so I am putting it up as another answer, even though it is just an enhancement of Kurt's...

I don't know what size input images Kurt used, so I made one of 3000x2000 and compared my timings with his to see if they were comparable, since we have different hardware. The individual commands ran in 42 seconds on my machine and the pipelined command in 36 seconds, so I guess my image size and hardware are roughly similar to his.

Then I used GNU Parallel to run the jobs in parallel - I think you will benefit greatly from that on your Xeon. Here is what I did...

time for i in $(seq -w 1 100); do
    cat <<EOF
    convert image.tiff                                        \
     -respect-parentheses                                     \
     +write mpr:XY                                            \
      \( mpr:XY -resize 1024x                                 \
                +write image-1024-pipel-run-${i}.jpg     \)   \
      \( mpr:XY -thumbnail 200x200                            \
                +write image-thumb-pipel-run-${i}.jpg    \)   \
      \( mpr:XY -background white label:12345 -append         \
                +write image-labelled-pipel-run-${i}.jpg \)   \
     null:
   echo "DONE: run pipeline $i ..."
EOF
done | parallel

As you can see, all I did was echo the commands that need to be run to stdout and pipe them into GNU Parallel. Run like that, it takes just 10 seconds on my machine.

I also tried mimicking the functionality with ffmpeg, and came up with this, which looks very similar on my test images - your mileage may vary.

#!/bin/bash
for i in $(seq -w 1 100); do
    echo ffmpeg -y -loglevel panic -i image.tif ff-$i.jpg 
    echo ffmpeg -y -loglevel panic -i image.tif -vf scale=1024:682 ff-$i-1024.jpg
    echo ffmpeg -y -loglevel panic -i image.tif -vf scale=200:200 ff-$i-200.jpg
done | parallel

On my iMac, this runs in 7 seconds with a 3000x2000 image.tif input file.

My ffmpeg uses libturbo-jpeg, installed via homebrew.