blu*_*sky 6 python math quantization tensorflow pytorch
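The paper "Natural Language Processing with Small Feed-Forward Networks" (https://arxiv.org/pdf/1708.00214.pdf) states:
I have implemented quantization in Python following the equations above: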
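import math

b = 128
embedding_matrix = [[20000,3000,1000],[1999999,20000,1999999], [20000,3000,1000]]
# One scale per row: largest entry divided by (b - 1), rounded to 3 decimals
scaled = [ abs(round( (1 / (b - 1) * max(e)) , 3)) for e in embedding_matrix]
print(scaled)
i = 0
quantized = []
for e in embedding_matrix :
    for v in e :
        # Store (original value, quantized value) pairs
        quantized.append((v , math.floor(.5 + ( (v / scaled[i]) + b) )))
    i = i + 1  # move on to the next row's scale
quantized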
Running this code sets quantized to:
[(20000, 255),
(3000, 147),
(1000, 134),
(1999999, 255),
(20000, 129),
(1999999, 255),
(20000, 255),
(3000, 147),
(1000, 134)]
How do I dequantize back to the original values from before quantization?
The documentation at https://www.tensorflow.org/api_docs/python/tf/quantization/dequantize describes:
tf.quantization.dequantize(
input, min_range, max_range, mode='MIN_COMBINED', name=None, axis=None,
narrow_range=False, dtype=tf.dtypes.float32
)
[min_range, max_range] are scalar floats that specify the range for the output. The 'mode' attribute controls exactly which calculations are used to convert the float values to their quantized equivalents.
And the PyTorch documentation: https://pytorch.org/docs/stable/quantization.html
It seems quantization is implemented differently there than in my implementation above?
What they do in the paper is roughly this:
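import numpy as np

b = 128
embedding_matrix = np.array([[20000,3000,1000,1000],[1999999,20000,1999999,1999999], [20000,3000,1000,1000]])
# One scale per row (per embedding): s_i = max_j |e_ij| / (b - 1)
scales = (np.abs(embedding_matrix).max(axis=1) / (b-1)).reshape(-1, 1)
# Adding 0.5 before truncating to uint8 rounds to the nearest integer
quantized = (embedding_matrix / scales + b + 0.5).astype(np.uint8)
# Cast to a signed type first: uint8 arithmetic would wrap around for codes below b
dequantized = (quantized.astype(np.int32) - b) * scales
print(quantized)
print(dequantized)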
Output:
[[255 147 134 134]
[255 129 255 255]
[255 147 134 134]]
[[2.00000000e+04 2.99212598e+03 9.44881890e+02 9.44881890e+02]
[1.99999900e+06 1.57480236e+04 1.99999900e+06 1.99999900e+06]
[2.00000000e+04 2.99212598e+03 9.44881890e+02 9.44881890e+02]]
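In short, they just have q_ij = round(e_ij / s_i + b), so once all you have is the quantized value q_ij, the best approximation you can make is to say q_ij = dequantized_ij / s_i + b, which gives dequantized_ij = (q_ij - b) * s_i. The rounding step is lossy, so this recovers an approximation of the original value, not the exact value.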
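To make the size of that approximation concrete, here is a minimal pure-Python sketch of the same round trip on the matrix from the question. The helper names quantize_row and dequantize_row are made up for illustration, and the sketch assumes no row is all zeros (otherwise the scale would be 0). As long as nothing is clipped, the reconstruction error should stay within half a scale step for each row:

```python
import math


def quantize_row(row, b=128):
    """Quantize one embedding row: q = floor(e / s + b + 0.5), s = max|e| / (b - 1)."""
    # Assumption: the row contains at least one non-zero value.
    scale = max(abs(v) for v in row) / (b - 1)
    codes = [math.floor(v / scale + b + 0.5) for v in row]
    return codes, scale


def dequantize_row(codes, scale, b=128):
    """Invert the mapping: e is approximately (q - b) * s."""
    return [(q - b) * scale for q in codes]


embedding_matrix = [[20000, 3000, 1000],
                    [1999999, 20000, 1999999],
                    [20000, 3000, 1000]]

for row in embedding_matrix:
    codes, scale = quantize_row(row)
    restored = dequantize_row(codes, scale)
    # Largest absolute reconstruction error in this row
    worst = max(abs(v - r) for v, r in zip(row, restored))
    print(codes, [round(r, 2) for r in restored], worst <= scale / 2)
```

Note how the 20000 in the second row comes back as roughly 15748: that row's scale is dominated by the 1999999 entries, so small values in the same row lose most of their precision.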
As for PyTorch: similar functionality is available, for example torch.quantize_per_channel. The following code does almost the same thing:
import torch
t = torch.tensor(embedding_matrix, dtype=torch.float32)
zero_point = torch.tensor([b]).repeat(t.shape[0], 1).reshape(-1)
quantized_tensor = torch.quantize_per_channel(t, t.abs().max(axis=1)[0] / (b-1), zero_point, 0, torch.quint8)
print(quantized_tensor)
print(quantized_tensor.int_repr())
Output:
tensor([[2.0000e+04, 2.9921e+03, 9.4488e+02, 9.4488e+02],
[2.0000e+06, 1.5748e+04, 2.0000e+06, 2.0000e+06],
[2.0000e+04, 2.9921e+03, 9.4488e+02, 9.4488e+02]], size=(3, 4),
dtype=torch.quint8, quantization_scheme=torch.per_channel_affine,
scale=tensor([ 157.4803, 15748.0234, 157.4803], dtype=torch.float64),
zero_point=tensor([128, 128, 128]), axis=0)
tensor([[255, 147, 134, 134],
[255, 129, 255, 255],
[255, 147, 134, 134]], dtype=torch.uint8)
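If you quantize per channel like this in PyTorch, you can only apply .dequantize() to the full tensor, not to a slice, which is not great for embeddings, but you can do it manually using int_repr, q_per_channel_zero_points, and q_per_channel_scales.
Does this answer your question?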
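A rough sketch of what that manual dequantization could look like, assuming the tensor was quantized along axis 0 as above. dequantize_rows is a hypothetical helper name, not a PyTorch function; it only combines the three accessor methods mentioned, and the matrix is rebuilt here so the snippet runs on its own:

```python
import torch

b = 128
t = torch.tensor([[20000., 3000., 1000., 1000.],
                  [1999999., 20000., 1999999., 1999999.],
                  [20000., 3000., 1000., 1000.]])

# Per-row scales and a constant zero point, as in the snippet above
scales = t.abs().amax(dim=1) / (b - 1)
zero_points = torch.full((t.shape[0],), b, dtype=torch.int64)
q = torch.quantize_per_channel(t, scales, zero_points, 0, torch.quint8)


def dequantize_rows(q, rows):
    """Dequantize only the selected rows of a tensor quantized per channel on axis 0.

    Hypothetical helper: applies (q_ij - zero_point_i) * scale_i by hand.
    """
    ints = q.int_repr()[rows].to(torch.float64)               # raw uint8 codes
    s = q.q_per_channel_scales()[rows].reshape(-1, 1)          # one scale per row
    z = q.q_per_channel_zero_points()[rows].reshape(-1, 1)     # one zero point per row
    return (ints - z) * s


# Look up a single embedding without dequantizing the whole matrix
print(dequantize_rows(q, [1]))
```

For an embedding lookup this means only the requested rows are converted back to floats, while the stored table stays in 8-bit form.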