Evaluating a retrained Keras model in a loop leaks memory

Ale*_*tic 5 keras

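In my application I reuse an existing MobileNet trained on ImageNet and retrain only its output layer on the flowers dataset, which has just 5 classes. The retrained model is saved to disk. Afterwards the model is loaded and evaluated over several iterations, which eventually exhausts memory and crashes the whole application. After some diagnostics I realized that the leak comes from the Keras model.evaluate() method. The problem can be reproduced with this standalone example: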

import os
import resource
import keras
import numpy as np

if __name__ == '__main__':
    # ru_maxrss is the peak resident set size of the process (kilobytes on Linux).
    init_alloc = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    for it in range(4):
        # Random stand-in data: 64 RGB images of 224x224 and one-hot labels for 5 classes.
        x_valid = np.random.uniform(0, 1, (64, 224, 224, 3)).astype(np.float32)
        y_valid = keras.utils.to_categorical(np.random.uniform(0, 5, (64, )).astype(np.int32), 5)

        start_alloc = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

        # Reload the retrained model from disk on every iteration.
        model = keras.models.load_model(os.path.abspath(os.path.join('.', 'mobilenet_flowers.h5')),
                                        custom_objects={'relu6': keras.applications.mobilenet.relu6,
                                                        'DepthwiseConv2D': keras.applications.mobilenet.DepthwiseConv2D})

        # evaluate() returns [loss, metric] here; only the loss is used.
        loss, _ = model.evaluate(x_valid, y_valid, batch_size=64, verbose=0)

        # Attempt to release the backend graph and the model before measuring again.
        keras.backend.clear_session()
        del model

        end_alloc = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

        print('Iteration %d:' % it)
        print('  Memory alloc before evaluate() is %7d kilobytes'   % start_alloc)
        print('  Memory alloc after  evaluate() is %7d kilobytes'   % end_alloc)
        print('  Memory alloc loss for evaluate is %7d kilobytes\n' % (end_alloc - start_alloc))

    exit_alloc = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    print('Memory alloc before loop is %7d kilobytes' % init_alloc)
    print('Memory alloc after  loop is %7d kilobytes' % exit_alloc)
    print('Memory alloc difference  is %7d kilobytes' % (exit_alloc - init_alloc))

When I run the script, it prints the following:

Iteration 0:
  Memory alloc before evaluate() is  251864 kilobytes
  Memory alloc after  evaluate() is  901696 kilobytes
  Memory alloc loss for evaluate is  649832 kilobytes

Iteration 1:
  Memory alloc before evaluate() is  901696 kilobytes
  Memory alloc after  evaluate() is 1036780 kilobytes
  Memory alloc loss for evaluate is  135084 kilobytes

Iteration 2:
  Memory alloc before evaluate() is 1036780 kilobytes
  Memory alloc after  evaluate() is 1148692 kilobytes
  Memory alloc loss for evaluate is  111912 kilobytes

Iteration 3:
  Memory alloc before evaluate() is 1148692 kilobytes
  Memory alloc after  evaluate() is 1190804 kilobytes
  Memory alloc loss for evaluate is   42112 kilobytes

Memory alloc before loop is  138792 kilobytes
Memory alloc after  loop is 1190804 kilobytes
Memory alloc difference  is 1052012 kilobytes
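
A note on reading these figures: ru_maxrss reports the peak resident set size, a high-water mark that never decreases, so on its own it cannot show whether clear_session() and del model gave any memory back. The sketch below is one possible way to log the current resident set size next to the peak. It assumes Linux with a /proc filesystem, and the helper names peak_rss_kb and current_rss_kb are hypothetical, not part of Keras or of the script above.

```python
import resource


def peak_rss_kb():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def current_rss_kb(status_path='/proc/self/status'):
    """Current resident set size in kilobytes, read from the VmRSS field.

    Assumes a Linux-style status file; returns None if VmRSS is absent.
    """
    with open(status_path) as status:
        for line in status:
            if line.startswith('VmRSS:'):
                # The line looks like "VmRSS:     12345 kB".
                return int(line.split()[1])
    return None


if __name__ == '__main__':
    print('current before: %s kB, peak: %d kB' % (current_rss_kb(), peak_rss_kb()))
    block = bytearray(200 * 1024 * 1024)  # temporary ~200 MB allocation
    print('current during: %s kB, peak: %d kB' % (current_rss_kb(), peak_rss_kb()))
    del block
    print('current after:  %s kB, peak: %d kB' % (current_rss_kb(), peak_rss_kb()))
```

Calling current_rss_kb() in place of the ru_maxrss reads inside the loop would show whether resident memory keeps climbing from one iteration to the next, or whether only the peak is being carried forward.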

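Any suggestions as to what might be wrong? After browsing the forums I tried adding K.clear_session(), but as you can see in the code, it did not help. The model is temporarily stored at https://ufile.io/rgaxs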

Some additional information about my environment:

== cat /etc/issue ===============================================
Linux 4.10.0-38-generic #42~16.04.1-Ubuntu SMP Tue Oct 10 16:32:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
VERSION="16.04.3 LTS (Xenial Xerus)"
VERSION_ID="16.04"
VERSION_CODENAME=xenial

== are we in docker =============================================
No

== compiler =====================================================
c++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

== check pips ===================================================
numpy (1.12.1)
numpydoc (0.7.0)
protobuf (3.5.0)
tensorflow (1.4.0)
tensorflow-tensorboard (0.4.0rc3)

== check for virtualenv =========================================
False

== tensorflow import ============================================
tf.VERSION = 1.4.0
tf.GIT_VERSION = v1.4.0-rc1-11-g130a514
tf.COMPILER_VERSION = v1.4.0-rc1-11-g130a514
keras.VERSION = 2.0.9