How to reduce a fully connected ("InnerProduct") layer using truncated SVD

Sha*_*hai 6 machine-learning linear-algebra neural-network deep-learning caffe

In Girshick, R. Fast R-CNN (ICCV 2015), section "3.1 Truncated SVD for faster detection", the author proposes using the SVD trick to reduce the size and computation time of a fully connected layer.

Given a trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with truncated ones?

Sha*_*hai 7

Some linear algebra background
A singular value decomposition (SVD) is a decomposition of any matrix W into three matrices:

W = U S V*

Where U and V are ortho-normal matrices, and S is diagonal with elements of decreasing magnitude on the diagonal. One interesting property of SVD is that it allows W to be easily approximated with a lower-rank matrix: suppose you truncate S to have only its k leading elements (instead of all elements on the diagonal), then

W_app = U S_trunc V*

is a rank-k approximation of W.

Approximating a fully connected layer with SVD
Suppose we have a model deploy_full.prototxt with a fully connected layer:

# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...

Furthermore, we have trained_weights_full.caffemodel, the trained parameters of the deploy_full.prototxt model.

  1. Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers:

    layer {
      name: "fc_svd_U"
      type: "InnerProduct"
      bottom: "in" # same input
      top: "svd_interim"
      inner_product_param {
        num_output: 20  # approximate with k = 20 rank matrix
        bias_term: false
        # more params...
      }
      # some more...
    }
    # NO activation layer here!
    layer {
      name: "fc_svd_V"
      type: "InnerProduct"
      bottom: "svd_interim"
      top: "out"   # same output
      inner_product_param {
        num_output: 1000  # original number of outputs
        # more params...
      }
      # some more...
    }
    
  2. In Python, a little net surgery:

    import caffe
    import numpy as np
    
    orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
    svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
    # get the original weight matrix
    W = np.array( orig_net.params['fc_orig'][0].data )
    # SVD decomposition; note that np.linalg.svd returns V already
    # transposed, i.e. W = U @ diag(s) @ Vt
    k = 20  # same as num_output of fc_svd_U
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # assign weights to the svd net:
    # the first layer ("fc_svd_U") projects the input to k dims with S_k Vt_k (k-by-N)
    svd_net.params['fc_svd_U'][0].data[...] = np.dot(np.diag(s[:k]), Vt[:k, :])
    # the second layer ("fc_svd_V") maps back to the original outputs with U_k (num_output-by-k)
    svd_net.params['fc_svd_V'][0].data[...] = U[:, :k]
    svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias
    # save the new weights
    svd_net.save('trained_weights_svd.caffemodel')
    

Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel approximating the original net with far fewer multiplications and weights.
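To see where the savings come from (a numpy-only sketch with illustrative dimensions, independent of caffe), the two small layers reproduce the rank-k approximation of the original layer exactly, while storing k(N+M) weights instead of NM:

```python
import numpy as np

rng = np.random.RandomState(0)
N, M, k = 4096, 1000, 20  # input dim, output dim, truncation rank (illustrative values)
W = rng.randn(M, N)       # original InnerProduct weight matrix (M x N)
b = rng.randn(M)          # original bias
x = rng.randn(N)          # an input vector

U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_first = np.diag(s[:k]) @ Vt[:k, :]  # first small layer, k x N, no bias
W_second = U[:, :k]                   # second small layer, M x k, keeps the bias

# the two-layer forward pass equals applying the rank-k approximation of W
y_two_layers = W_second @ (W_first @ x) + b
y_rank_k = (W_second @ W_first) @ x + b
print(np.allclose(y_two_layers, y_rank_k))  # True

# weights stored (and multiply-adds per input): k*(N+M) vs N*M
print(k * (N + M), 'vs', N * M)  # 101920 vs 4096000
```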

  • Nice solution! (2)