TensorFlow Object Detection API: Faster R-CNN is very slow on CPU, about 1 frame per minute

bw4*_*4sz 3 object-detection tensorflow

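I'm using a locally trained model from the TensorFlow Object Detection API, starting from the faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017 checkpoint. I retrained it as a one-class model and exported it to a SavedModel: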

python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path ${PIPELINE_CONFIG_PATH} \
    --trained_checkpoint_prefix /Users/Ben/Dropbox/GoogleCloud/Detection/train/model.ckpt-186 \
    --output_directory /Users/Ben/Dropbox/GoogleCloud/Detection/SavedModel/

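I know there are shallower models available, but the Faster R-CNN run times other people report are more than 100x faster than what I'm seeing. Can anyone confirm Faster R-CNN run times on a CPU? I'm trying to work out whether there is a problem with my code or whether I should simply move to a smaller model.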

The code is taken from the Jupyter notebook with almost no changes. I'm running in a clean virtualenv with only the requirements installed.

detection_predict.py

import numpy as np
import tensorflow as tf
from PIL import Image
import glob
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
import os
import datetime

TEST_IMAGE_PATHS = glob.glob("/Users/Ben/Dropbox/GoogleCloud/Detection/images/validation/*.jpg")

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
NUM_CLASSES = 1

sess=tf.Session()
tf.saved_model.loader.load(sess,[tf.saved_model.tag_constants.SERVING], "/Users/ben/Dropbox/GoogleCloud/Detection/SavedModel/saved_model/")    

label_map = label_map_util.load_labelmap("label.pbtxt")
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    npdata=np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)   
    return npdata

# Define input and output tensors for sess.graph
image_tensor = sess.graph.get_tensor_by_name('image_tensor:0')

# Each box represents a part of the image where a particular object was detected.
detection_boxes = sess.graph.get_tensor_by_name('detection_boxes:0')

# Each score represents the level of confidence for each detected object.
# Score is shown on the result image, together with the class label.
detection_scores = sess.graph.get_tensor_by_name('detection_scores:0')
detection_classes = sess.graph.get_tensor_by_name('detection_classes:0')
num_detections = sess.graph.get_tensor_by_name('num_detections:0')
for image_path in TEST_IMAGE_PATHS:

    image = Image.open(image_path)

    #basewidth = 300
    #wpercent = (basewidth/float(image.size[0]))
    #hsize = int((float(image.size[1])*float(wpercent)))
    #image = image.resize((basewidth,hsize), Image.ANTIALIAS)

    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)

    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    before = datetime.datetime.now()    
    (boxes, scores, classes, num) = sess.run([detection_boxes, detection_scores, detection_classes, num_detections],feed_dict={image_tensor: image_np_expanded})
    print("Prediction took : " + str(datetime.datetime.now() - before))  

    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(image_np, np.squeeze(boxes), np.squeeze(classes).astype(np.int32), np.squeeze(scores), category_index, use_normalized_coordinates=True,line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    fn=os.path.basename(image_path)
    plt.imsave("/Users/Ben/Dropbox/GoogleCloud/Detection/validation/" + fn,image_np)

Output

(detection) Bens-MacBook-Pro:Detection ben$ python detection_predict.py 

Prediction took : 0:00:51.475269
Prediction took : 0:00:43.955962

Resizing the images makes no difference (the resize is commented out above). They aren't huge (1280 x 720).
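One possible explanation, offered as an assumption and not something confirmed in this post: the sample Faster R-CNN pipeline configs shipped with the Object Detection API use a `keep_aspect_ratio_resizer` (typically `min_dimension: 600`, `max_dimension: 1024`), so the exported graph rescales every input itself before the network runs. If this model's config kept those defaults, a 1280 x 720 frame and a 300-pixel-wide copy would end up at nearly the same internal size. The sketch below reimplements that rule in plain Python; `resized_shape` is a hypothetical helper, not part of the API.

```python
def resized_shape(height, width, min_dimension=600, max_dimension=1024):
    """Approximate the output size of a keep-aspect-ratio resizer.

    Assumption: the defaults mirror the sample Faster R-CNN configs
    (min_dimension=600, max_dimension=1024); check your own pipeline config.
    """
    # First try to scale the shorter side to min_dimension.
    scale = min_dimension / float(min(height, width))
    # If that pushes the longer side past max_dimension, cap on the longer side.
    if max(height, width) * scale > max_dimension:
        scale = max_dimension / float(max(height, width))
    return (int(round(height * scale)), int(round(width * scale)))


original = resized_shape(720, 1280)  # the 1280 x 720 frames from the question
shrunk = resized_shape(168, 300)     # after the commented-out resize to width 300
print(original, shrunk)
```

Under that assumption both inputs reach the network at a width of 1024, which would explain why shrinking the image beforehand does not change the prediction time.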

Is this expected?

System information


Latest TensorFlow version

Bens-MacBook-Pro:Detection ben$ python
Python 2.7.10 (default, Feb  7 2017, 00:08:15) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.3.0'

Edit #1

In case anyone is wondering, predicting from the frozen inference graph makes no difference.

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile("/Users/ben/Dropbox/GoogleCloud/Detection/SavedModel/frozen_inference_graph.pb", 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

(detection) Bens-MacBook-Pro:Detection ben$ python detection_predict.py 

Prediction took : 0:01:02.651046
Prediction took : 0:00:43.820992
Prediction took : 0:00:48.805432
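The first timing in each run is larger than the later ones (62 s versus 44 to 49 s here), which may reflect one-time setup cost on the first `sess.run` call. A minimal sketch of a timing helper that discards warm-up calls before averaging; `time_calls` and `fake_predict` are hypothetical names, and `fake_predict` stands in for whatever function wraps `sess.run`.

```python
import time


def time_calls(predict, inputs, warmup=1):
    """Return (mean_seconds, per_call_seconds), ignoring the first `warmup` calls."""
    timings = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        timings.append(time.perf_counter() - start)
    steady = timings[warmup:]  # drop the warm-up calls
    if not steady:
        raise ValueError("need more inputs than warm-up calls")
    return sum(steady) / len(steady), steady


# Stand-in for a function that wraps sess.run(...); replace with the real call.
def fake_predict(item):
    time.sleep(0.01)


mean_seconds, per_call = time_calls(fake_predict, range(4), warmup=1)
print("mean: %.3f s over %d calls" % (mean_seconds, len(per_call)))
```

Even with the first call excluded, the timings above stay above 40 seconds per image, so warm-up alone does not account for the slowness.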

cProfile isn't especially illuminating:

>>> stats.print_stats(20)
Thu Oct 19 14:55:47 2017    profiling_results

         40742812 function calls (38600273 primitive calls) in 173.800 seconds

   Ordered by: internal time
   List reduced from 4918 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3  138.345   46.115  138.345   46.115 {_pywrap_tensorflow_internal.TF_Run}
977635/702731    2.852    0.000    9.200    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:469(init)
        3    2.597    0.866    2.597    0.866 {matplotlib._png.write_png}
    10719    2.111    0.000    2.114    0.000 {numpy.core.multiarray.array}
   363351    1.378    0.000    3.216    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:424(MakeSubMessageDefault)
  1045442    1.342    0.000    1.342    0.000 {_weakref.proxy}
562666/310637    1.317    0.000    6.182    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1211(MergeFrom)
   931022    1.268    0.000    3.113    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:777(ListFields)
789671/269414    1.122    0.000    9.116    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1008(ByteSize)
  1045442    0.882    0.000    2.498    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1375(__init__)
3086143/3086140    0.662    0.000    0.756    0.000 {isinstance}
  1427511    0.656    0.000    0.782    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:762(_IsPresent)
   931092    0.649    0.000    0.879    0.000 {method 'sort' of 'list' objects}
1189105/899500    0.599    0.000    0.942    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1330(Modified)
        1    0.537    0.537    0.537    0.537 {_pywrap_tensorflow_internal.TF_ExtendGraph}
276877/45671    0.480    0.000    8.315    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/python_message.py:1050(InternalSerialize)
  2602117    0.480    0.000    0.480    0.000 {method 'items' of 'dict' objects}
   459805    0.474    0.000    1.336    0.000 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/google/protobuf/internal/containers.py:551(__getitem__)
        1    0.434    0.434   16.605   16.605 /Users/ben/Documents/DeepMeerkat/training/Detection/detection/lib/python2.7/site-packages/tensorflow/python/framework/importer.py:156(import_graph_def)
  1297794    0.367    0.000    0.367    0.000 {method 'write' of '_io.BytesIO' objects}
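Reading the numbers off the profile, most of the time is spent inside the native `TF_Run` call, which suggests the Python-side code is not the bottleneck. The arithmetic below uses only figures printed in the profile above.

```python
total_seconds = 173.800        # total profiled time
tf_run_seconds = 138.345       # tottime for TF_Run
tf_run_calls = 3               # ncalls for TF_Run
import_graph_seconds = 16.605  # cumtime for import_graph_def (one-time cost)

share_in_tf_run = tf_run_seconds / total_seconds
seconds_per_run = tf_run_seconds / tf_run_calls
import_share = import_graph_seconds / total_seconds

print("TF_Run share of total: %.1f%%" % (100 * share_in_tf_run))
print("seconds per TF_Run call: %.1f" % seconds_per_run)
print("graph import share of total: %.1f%%" % (100 * import_share))
```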

Edit #2

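After pushing on this for a while, I'm starting to suspect that the people reporting much faster times are not rigorously documenting their environment. Some GPU reference points, for those interested: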

https://github.com/tensorflow/models/issues/1715

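I'm leaving this question open in the hope that someone can report their CPU times for the largest model, but for now I'm going to assume my numbers are correct and move on to a shallower model. Perhaps this will help others decide which model to choose.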

bw4*_*4sz 7

Hopefully this will help other users choose a model. Here are the average times I measured on a 3.1 GHz CPU on OSX (more information above).

faster_rcnn_inception_resnet_v2_atrous_coco: 45 seconds/image

faster_rcnn_resnet101_coco: 16 seconds/image

rfcn_resnet101_coco: 7 seconds/image

ssd_inception_v2_coco: 0.3 seconds/image

ssd_mobilenet_v1_coco: 0.3 seconds/image
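To relate these numbers to the "1 frame per minute" in the title, seconds per image convert to frames per minute as 60 / seconds. A small sketch using only the timings listed above:

```python
# Average CPU timings from the list above, in seconds per image.
seconds_per_image = {
    "faster_rcnn_inception_resnet_v2_atrous_coco": 45,
    "faster_rcnn_resnet101_coco": 16,
    "rfcn_resnet101_coco": 7,  # the R-FCN entry, under its model zoo name
    "ssd_inception_v2_coco": 0.3,
    "ssd_mobilenet_v1_coco": 0.3,
}

slowest = max(seconds_per_image.values())
frames_per_minute = {name: 60.0 / s for name, s in seconds_per_image.items()}

for name, seconds in seconds_per_image.items():
    print("%-45s %7.1f frames/min (%5.1fx vs. slowest)"
          % (name, frames_per_minute[name], slowest / seconds))
```

On these figures the SSD models are about 150 times faster than the largest Faster R-CNN model on the same CPU.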

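  • My stack is almost the same as yours (2.8 GHz), and I trained on my own dataset of black-and-white 1024x768 images. With faster_rcnn_resnet101 I get about 16-20 seconds/image. I deployed the model on an AWS instance with a very high-end CPU and got that down to 5 seconds, but without a GPU it still doesn't come close to real time (2 upvotes)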