pen*_*ope (5) · tags: neural-network, deep-learning, caffe, conv-neural-network
I am trying to train a network in Caffe using a slightly modified SegNet-basic model.
I understand that the error Check failed: error == cudaSuccess (2 vs. 0) out of memory means my GPU has run out of memory. What confuses me, however, is this:
My "old" training attempt worked fine. The network initialized and ran, reporting:
Memory required for data: 1800929300 (this is computed per batch, so here it is 4x the sample size). The kernel size is 7x7 for each layer, with 64 filters per layer. To my surprise, my "new" network runs out of memory, and since I reduced the batch size I do not understand what is reserving the extra memory:
Memory required for data: 1175184180 (= 1x the sample size, since batch_size is 1). I counted each network's parameters with a script that sums up the per-layer dimensions.
Assuming each parameter needs 4 bytes of memory, data_memory + num_param * 4 still gives the old setup a higher memory requirement than the new one:
memory_old = 1806602004 = 1.68GB
memory_new = 1181659956 = 1.10GB
I have accepted that extra memory is probably needed somewhere, and that if I cannot find a GPU with more memory I will have to rethink my new setup and downsample the input. But I am really trying to understand where that extra memory is needed, and why my new setup runs out of memory.
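A minimal sketch of that arithmetic. Note that the parameter counts below are back-computed from the memory_old/memory_new totals quoted above, so they are my assumptions, not values printed by Caffe:

```python
# Rough memory estimate: Caffe's reported data memory plus 4 bytes per
# parameter. num_params values are back-computed from the totals above.
data_memory = {"old": 1800929300, "new": 1175184180}
num_params = {"old": 1418176, "new": 1618944}  # assumed, derived from the quoted totals

for name in ("old", "new"):
    total = data_memory[name] + 4 * num_params[name]
    print(f"memory_{name} = {total} = {total / 2**30:.2f}GB")
```

This reproduces memory_old = 1806602004 = 1.68GB and memory_new = 1181659956 = 1.10GB.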
EDIT: As requested, here are the layer dimensions of each network, together with the size of the data passing through them:
The "old" network:
Top shape: 4 4 384 512 (3145728)
('conv1', (64, 4, 7, 7)) --> 4 64 384 512 (50331648)
('conv1_bn', (1, 64, 1, 1)) --> 4 64 384 512 (50331648)
('conv2', (64, 64, 7, 7)) --> 4 64 192 256 (12582912)
('conv2_bn', (1, 64, 1, 1)) --> 4 64 192 256 (12582912)
('conv3', (64, 64, 7, 7)) --> 4 64 96 128 (3145728)
('conv3_bn', (1, 64, 1, 1)) --> 4 64 96 128 (3145728)
('conv4', (64, 64, 7, 7)) --> 4 64 48 64 (786432)
('conv4_bn', (1, 64, 1, 1)) --> 4 64 48 64 (786432)
('conv_decode4', (64, 64, 7, 7)) --> 4 64 48 64 (786432)
('conv_decode4_bn', (1, 64, 1, 1)) --> 4 64 48 64 (786432)
('conv_decode3', (64, 64, 7, 7)) --> 4 64 96 128 (3145728)
('conv_decode3_bn', (1, 64, 1, 1)) --> 4 64 96 128 (3145728)
('conv_decode2', (64, 64, 7, 7)) --> 4 64 192 256 (12582912)
('conv_decode2_bn', (1, 64, 1, 1)) --> 4 64 192 256 (12582912)
('conv_decode1', (64, 64, 7, 7)) --> 4 64 384 512 (50331648)
('conv_decode1_bn', (1, 64, 1, 1)) --> 4 64 384 512 (50331648)
('conv_classifier', (3, 64, 1, 1))
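The parameter count used in the arithmetic above can be reproduced from this listing. A sketch, summing the products of the weight shapes (bias terms not included):

```python
from math import prod

# Weight shapes of the "old" network, copied from the listing above.
old_layers = [
    ("conv1",           (64, 4, 7, 7)),
    ("conv1_bn",        (1, 64, 1, 1)),
    ("conv2",           (64, 64, 7, 7)),
    ("conv2_bn",        (1, 64, 1, 1)),
    ("conv3",           (64, 64, 7, 7)),
    ("conv3_bn",        (1, 64, 1, 1)),
    ("conv4",           (64, 64, 7, 7)),
    ("conv4_bn",        (1, 64, 1, 1)),
    ("conv_decode4",    (64, 64, 7, 7)),
    ("conv_decode4_bn", (1, 64, 1, 1)),
    ("conv_decode3",    (64, 64, 7, 7)),
    ("conv_decode3_bn", (1, 64, 1, 1)),
    ("conv_decode2",    (64, 64, 7, 7)),
    ("conv_decode2_bn", (1, 64, 1, 1)),
    ("conv_decode1",    (64, 64, 7, 7)),
    ("conv_decode1_bn", (1, 64, 1, 1)),
    ("conv_classifier", (3, 64, 1, 1)),
]

# One parameter per weight entry: multiply out each shape and sum.
num_params = sum(prod(shape) for _, shape in old_layers)
print(num_params)  # 1418176
```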
For the "new" network, the first few layers differ and the rest are exactly the same, except that the batch size is 1 instead of 4:
Top shape: 1 4 769 1025 (3152900)
('conv0', (64, 4, 7, 7)) --> 1 4 769 1025 (3152900)
('conv0_bn', (1, 64, 1, 1)) --> 1 64 769 1025 (50446400)
('conv1', (64, 4, 7, 7)) --> 1 64 384 512 (12582912)
('conv1_bn', (1, 64, 1, 1)) --> 1 64 384 512 (12582912)
('conv2', (64, 64, 7, 7)) --> 1 64 192 256 (3145728)
('conv2_bn', (1, 64, 1, 1)) --> 1 64 192 256 (3145728)
('conv3', (64, 64, 7, 7)) --> 1 64 96 128 (786432)
('conv3_bn', (1, 64, 1, 1)) --> 1 64 96 128 (786432)
('conv4', (64, 64, 7, 7)) --> 1 64 48 64 (196608)
('conv4_bn', (1, 64, 1, 1)) --> 1 64 48 64 (196608)
('conv_decode4', (64, 64, 7, 7)) --> 1 64 48 64 (196608)
('conv_decode4_bn', (1, 64, 1, 1)) --> 1 64 48 64 (196608)
('conv_decode3', (64, 64, 7, 7)) --> 1 64 96 128 (786432)
('conv_decode3_bn', (1, 64, 1, 1)) --> 1 64 96 128 (786432)
('conv_decode2', (64, 64, 7, 7)) --> 1 64 192 256 (3145728)
('conv_decode2_bn', (1, 64, 1, 1)) --> 1 64 192 256 (3145728)
('conv_decode1', (64, 64, 7, 7)) --> 1 64 384 512 (12582912)
('conv_decode1_bn', (1, 64, 1, 1)) --> 1 64 384 512 (12582912)
('conv_classifier', (3, 64, 1, 1))
This skips the pooling and upsampling layers. This is the train.prototxt of the "new" network. The old network has no conv0, conv0_bn, or pool0 layers, while its other layers are identical. The "old" network also has batch_size set to 4 instead of 1.
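For reference, the memory of any single blob in these listings can be read off directly as element count x 4 bytes for float32. A small helper, shown here on the largest blob in the "new" listing (1 x 64 x 769 x 1025):

```python
from math import prod

def blob_bytes(shape, dtype_size=4):
    """Bytes occupied by one float32 blob of the given N x C x H x W shape."""
    return prod(shape) * dtype_size

# Largest blob in the "new" listing: the full-resolution 64-channel output.
n = blob_bytes((1, 64, 769, 1025))
print(n, f"= {n / 2**20:.1f} MiB")  # 201785600 = 192.4 MiB
```

Note that this only counts the forward data; during training Caffe keeps a diff blob alongside each data blob, and cuDNN convolutions may allocate workspace on top, which may be part of why the reported Memory required for data underestimates actual GPU usage.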
EDIT2: As requested, some more information:
All inputs are 769x1025, so the input is always 4x769x1025. I get out of memory right after network initialization; not a single iteration ran.