tf.keras model.predict results in memory leak

Question

use*_*067 18 python keras tensorflow google-colaboratory

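I am working on Google Colab, using tf.keras with tensorflow version 2.3.0. I'm going crazy because I can't use the model I've trained to run predictions with model.predict: it runs out of CPU RAM. I've been able to reproduce the issue with a very minimal example.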

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input,Conv2D, Activation
from tensorflow.keras.models import Model

matrixSide = 512 #define a big enough matrix to give memory issues

inputL = Input([matrixSide,matrixSide,12]) #create a toy model
l1 = Conv2D(32,3,activation='relu',padding='same') (inputL) #120
l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
l1 = Conv2D(1,1,padding='same')(l1)
l1 = Activation('linear')(l1)
model = Model(inputs= inputL,outputs = l1)


#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range (60):
  print(i)
  outImm = model.predict(inImm)
# K.clear_session() #somebody suggested it...

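Basically, when working on GPU, it uses 3.0 GB of CPU RAM in the first 4 iterations, then it goes up to 7, then to 10, then it crashes because it has exhausted all the available RAM! When running on CPU it lasts for more iterations, and sometimes it even decreases the amount of RAM it's using from 9 GB back to 3 GB, but in the end it still crashes after 20 or so iterations.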
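The growth described above can be tracked per iteration instead of read off the Colab RAM gauge. What follows is only a minimal sketch using the standard-library `resource` module, assuming a Linux runtime such as Colab; `peak_rss_mb` and `run_and_track` are hypothetical helper names, not part of TensorFlow. It reports the peak resident set size, which never goes down, so it shows growth but not the temporary drops mentioned above.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_track(step_fn, n_iters):
    """Call step_fn(i) n_iters times and record peak RSS after each call."""
    history = []
    for i in range(n_iters):
        step_fn(i)
        history.append(peak_rss_mb())
    return history


# Assumed usage with the toy model above:
# history = run_and_track(lambda i: model.predict(inImm), 60)
# print(history)
```

A steadily rising list would point to memory that is not being released between calls.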

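A previous question (Keras predict loop memory leak using tf.data.Dataset but not with a numpy array) reported similar issues when using tf.data but not with numpy. Someone suggested on the GitHub issues for tensorflow 1.14 to call K.clear_session in each loop... but it doesn't help!

Any idea on how to fix this?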

Answer 1

小智 16

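I'm using a simple solution based on the keras docs:

For small amounts of inputs that fit in one batch, directly using __call__() is recommended for faster execution, e.g., model(x), or model(x, training=False)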

for filename in image_filenames:
  # read of data
  input = load_image(filename)

  # prediction
  output = model(input) # executes __call__() or call()

Using __call__() or model(input) avoids the memory leak inside the predict method, which creates a data generator with a single data item on every execution and does not release the memory.
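One caveat when applying this to the question's example: unlike predict, a direct call does not split its input into batches, so a large array has to be sliced by hand. The following is a rough sketch under that assumption; `predict_in_chunks` is a hypothetical helper, and `batch_size=8` is an arbitrary value, not a recommendation from the keras docs.

```python
def predict_in_chunks(model, data, batch_size=8):
    """Call model(chunk, training=False) on consecutive slices of data.

    `model` is any callable with the Keras call signature, and `data` is
    anything that supports len() and slicing (e.g. a numpy array).
    Returns a list with one output per chunk.
    """
    outputs = []
    for start in range(0, len(data), batch_size):
        chunk = data[start:start + batch_size]
        # Direct call instead of model.predict(chunk).
        outputs.append(model(chunk, training=False))
    return outputs


# Assumed usage with the toy model from the question:
# chunks = predict_in_chunks(model, inImm, batch_size=8)
# outImm = np.concatenate([c.numpy() for c in chunks], axis=0)
```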

Answer 2

use*_*067 6

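I've found a fix for the memory leak. While K.clear_session() doesn't do anything in my case, adding a garbage collection after each call with _ = gc.collect() actually does the trick! Now the memory actually used is constant and I can run as many predictions as I want.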
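As a minimal sketch of this fix, the loop can be wrapped in a small generator that forces a collection after every prediction. `predict_with_gc` is a hypothetical helper name; it yields results one at a time so that outputs are not accumulated in memory.

```python
import gc


def predict_with_gc(predict_fn, batches):
    """Yield predict_fn(batch) for each batch, then force a garbage collection.

    gc.collect() runs once the caller asks for the next result, i.e. after
    it has finished with the current one.
    """
    for batch in batches:
        yield predict_fn(batch)
        gc.collect()


# Assumed usage with the toy model from the question:
# for outImm in predict_with_gc(model.predict, [inImm] * 60):
#     ...  # use outImm here
```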

Answer 3

Cod*_*ace 6

This is my understanding after posting this as a bug to Tensorflow.

Change the code to:

in_imm = np.zeros((64,matrix_side,matrix_side,12))
for i in range (60):
  print(i)
  tensor = tf.convert_to_tensor(in_imm, dtype=tf.float32)
  out_imm = model.predict(tensor)

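Using tf.keras.Model.predict in a for loop with a numpy input creates a new graph on every iteration, because the numpy array is created with a different signature. Converting the numpy array to a tensor keeps the same signature and avoids creating new graphs.

Thanks, I'll try this! By the way, is there a clear explanation of what these "graphs" actually are and how they behave? If this is a graph problem, why doesn't K.clear_session() work (but gc.collect() does)? (3 upvotes)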
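When the same array is reused on every iteration, as in the question, the conversion can also be hoisted out of the loop so that predict always receives the very same tensor. This is a variation on the code above, not what the bug report prescribes, and whether it removes the leak may depend on the TensorFlow version. `to_float32_tensor` and `predict_repeatedly` are hypothetical helper names.

```python
def to_float32_tensor(array):
    """Convert a numpy array to a float32 tensor."""
    # Imported lazily so predict_repeatedly can be used with another converter.
    import tensorflow as tf
    return tf.convert_to_tensor(array, dtype=tf.float32)


def predict_repeatedly(predict_fn, array, n_iters, convert=to_float32_tensor):
    """Convert `array` once, then yield predict_fn(tensor) n_iters times."""
    tensor = convert(array)  # single conversion, reused by every call
    for _ in range(n_iters):
        yield predict_fn(tensor)


# Assumed usage with the toy model from the question:
# for out_imm in predict_repeatedly(model.predict, in_imm, 60):
#     ...  # use out_imm here
```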

Answer 4

Max*_* S. 5

I solved the problem by using K.clear_session(). First of all, you need to define a session before you can clear it. The purpose of this is explained here and here.

config= tf.ConfigProto(log_device_placement=True) 
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

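Note that using K.clear_session() inside the loop leads to an error after the first prediction. In my opinion, tf loses the connection to the model. For this reason, I create a new model in every run of the loop. This slows the code down for the first several runs, but it prevents RAM usage from accumulating.

The following code contains the suggested improvements: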

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input,Conv2D, Activation
from tensorflow.keras.models import Model

matrixSide = 512 #define a big enough matrix to give memory issues

config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

def create_model(matrixSide_v):
    inputL = Input([matrixSide_v,matrixSide_v,12]) #create a toy model
    l1 = Conv2D(32,3,activation='relu',padding='same') (inputL) #120
    l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
    l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
    l1 = Conv2D(1,1,padding='same')(l1)
    l1 = Activation('linear')(l1)
    c_model = Model(inputs= inputL,outputs = l1)
    return c_model

#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(64):
    print(i)
    model = create_model(matrixSide)
    outImm = model.predict(inImm)
    K.clear_session()

Archived:	5 years, 3 months ago
Views:	14249
Last activity:	2 years, 7 months ago