TensorFlow crashes with CUBLAS_STATUS_ALLOC_FAILED

Question

I am running a simple MNIST neural network program with tensorflow-gpu on Windows 10. When it tries to run, it hits a CUBLAS_STATUS_ALLOC_FAILED error. A Google search turned up nothing.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
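
As a rough sanity check on the log above, the float32 operands of the failing MatMul can be sized by hand. The sketch below only does arithmetic on the shapes printed in the "Blas SGEMM launch failed" line; the helper name `tensor_bytes` is made up for illustration.

```python
# Shapes copied from the "Blas SGEMM launch failed" line in the log.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape):
    """Return the size in bytes of a dense float32 tensor with this shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_FLOAT32


a_bytes = tensor_bytes((100, 784))    # left operand
b_bytes = tensor_bytes((784, 256))    # right operand
out_bytes = tensor_bytes((100, 256))  # result of a @ b
total_bytes = a_bytes + b_bytes + out_bytes

print("MatMul tensors need about %.2f MiB" % (total_bytes / 2**20))
```

The three tensors together come to a little over 1 MiB, far below the 3.31GiB reported as free. That suggests the operands themselves are not the problem, which is consistent with the earlier log line saying the cuBLAS handle could not be created.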

Answer 1

Sny*_*mpi 38

For TensorFlow 2.2, none of the other answers worked when I hit the CUBLAS_STATUS_ALLOC_FAILED problem. I found the solution at https://www.tensorflow.org/guide/gpu:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

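I ran this code before doing any further computation, and the same code that previously produced the CUBLAS error now works in the same session. The sample above is a specific example that sets memory growth across multiple physical GPUs, but it also solves the memory expansion problem.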

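In 2020, this is the only working solution I found. (5 upvotes)
This works for several of my applications. CUDA 11.1, cuDNN 8.0.5, GPU compute capability 8.6 (3080). (2 upvotes)
2021! This works! (2 upvotes)
Thanks. A related question: does the "set_memory_growth" flag need to be set on the GPU for every execution? (2 upvotes)
I use this code every time I start a script that uses the TensorFlow GPU. (2 upvotes)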
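
On the question of repeating the setting: the memory growth option is configured inside the running process, so it has to be applied again on each run, as the reply above describes. The TensorFlow GPU guide also documents an environment variable, TF_FORCE_GPU_ALLOW_GROWTH, as another way to turn the option on. The sketch below assumes a TensorFlow version that honors that variable; the helper name `request_gpu_memory_growth` is made up for illustration.

```python
import os


def request_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand via the environment."""
    # Set the variable before TensorFlow touches the GPU; doing it
    # ahead of the import is the simplest way to guarantee that.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]


growth_setting = request_gpu_memory_growth()

try:
    import tensorflow as tf  # imported only after the variable is set
except ImportError:
    tf = None  # TensorFlow is not installed; nothing else to do here
```

Setting the variable in the shell or in the system environment instead would avoid editing every script, at the cost of affecting all TensorFlow programs launched from that environment.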

Answer 2

Raf*_*jac 20

The location of the session config's "allow_growth" property now seems to be different. It is explained here: https://www.tensorflow.org/tutorials/using_gpu

So currently you have to set it like this:

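import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
session = tf.Session(config=config)  # pass any other Session arguments here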

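Does not work with tf 2.1: tf.__version__ is '2.1.0', and module 'tensorflow' has no attribute 'ConfigProto'. (3 upvotes)
session = tf.Session(config=config, ...) ^ SyntaxError: positional argument follows keyword argument. The solution does not work. (2 upvotes)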

Answer 3

小智 20

This code works for me with TensorFlow >= 2.0:

import tensorflow as tf
config = tf.compat.v1.ConfigProto(
    gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count={'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)
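
For code that does not go through the compat.v1 session, a similar cap can be sketched with the TF 2 device configuration API. This is an illustration rather than part of the answer: it assumes TensorFlow 2.4 or later, where `tf.config.set_logical_device_configuration` and `tf.config.LogicalDeviceConfiguration` are available. The helper names, and the choice to derive the limit from the 4.00GiB card in the question and the 0.8 fraction used above, are assumptions made for the example.

```python
def memory_limit_mib(total_gib, fraction):
    """Convert a fraction of total GPU memory into a whole number of MiB."""
    return int(total_gib * 1024 * fraction)


def cap_first_gpu(limit_mib):
    """Try to cap the first GPU at limit_mib; return True on success."""
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow is not installed
    try:
        gpus = tf.config.list_physical_devices("GPU")
        if not gpus:
            return False  # no GPU visible to TensorFlow
        # Must run before the GPU is initialized, otherwise TensorFlow
        # raises RuntimeError.
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mib)],
        )
    except (RuntimeError, ValueError, AttributeError) as err:
        print(err)
        return False
    return True


limit = memory_limit_mib(4.0, 0.8)  # 80% of a 4 GiB card, in MiB
capped = cap_first_gpu(limit)
```

The sketch applies the cap on its own rather than together with memory growth, since the GPU guide presents the two as separate options.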

Answer 4

Spa*_*ear 8

I found that this solution works:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

Archived:	9 years ago
Views:	14859
Last recorded:	6 years, 1 month ago