Does representing data with fewer bits improve machine learning model training time?

pat*_*elR 2 performance machine-learning

In many notebooks on Kaggle, I have seen methods for reducing the memory usage of data, such as converting int64 columns to int32. If the data fits in memory, why would we reduce its memory footprint? Does it make machine learning models train faster on the data?
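The pattern in question looks roughly like this (the DataFrame and column name below are made up for illustration):

import numpy as np
import pandas as pd

# Hypothetical data; in the Kaggle notebooks this would be the
# competition data loaded from CSV.
df = pd.DataFrame({"user_id": np.arange(1_000_000, dtype=np.int64)})

print(df["user_id"].memory_usage())   # ~8 MB as int64
df["user_id"] = df["user_id"].astype(np.int32)
print(df["user_id"].memory_usage())   # ~4 MB as int32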

Dee*_*pak 5

Yes. In many models there are numerous computations involving feature vectors.
For example, in an MLP, each neuron computes a weighted sum of the feature representations to produce its output (sketched below).
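A minimal sketch of that weighted sum in numpy (the layer sizes are arbitrary, chosen only for illustration):

import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 784, 128                    # arbitrary layer sizes
x = rng.standard_normal(n_in)             # feature vector
W = rng.standard_normal((n_out, n_in))    # weight matrix
b = rng.standard_normal(n_out)            # biases

# Each neuron's output: a weighted sum of the inputs plus a bias,
# passed through a nonlinearity (ReLU here).
out = np.maximum(W @ x + b, 0.0)
print(out.shape)  # (128,)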

These computations are generally faster when the feature vector components are 32-bit rather than 64-bit: each value occupies half the memory, so more of the data fits in cache and more elements can move through memory and vector registers per cycle.

Let me illustrate with a simple example:

import timeit

# Note: each snippet is timed in full, so the numpy import and the
# array construction are included in the measurement.
mult64 = """
import numpy as np
arr64 = np.int64([3, 4, 5])
arr64 * arr64
"""

mult32 = """
import numpy as np
arr32 = np.int32([3, 4, 5])
arr32 * arr32
"""

# Average the total time over 100 runs.
mult64_time = timeit.timeit(mult64, number=100) / 100
mult32_time = timeit.timeit(mult32, number=100) / 100

print(mult64_time)
print(mult32_time)

This gives me the result below. As can be seen, the simple multiplication is much faster for int32 than for int64 on the CPU. I have generally found this transformation useful for saving training and prediction time. (One caveat: because the imports sit inside the timed snippets, the int64 figure also pays the one-time cost of importing numpy, which inflates the gap; see the refined benchmark after the results.)

0.00086965738
1.7849500000000074e-06
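For a fairer comparison, a variant can move the import and array construction into timeit's setup argument, so only the multiplication itself is timed, and use larger arrays so the arithmetic dominates Python call overhead. A rough sketch (the array size and seed are arbitrary choices of mine); on typical hardware int32 still comes out ahead, though by a much smaller factor than the figures above suggest:

import timeit

# Build the arrays once, outside the timed statement, so only the
# multiplication is measured. Size and seed are arbitrary.
setup = """
import numpy as np
rng = np.random.default_rng(0)
arr64 = rng.integers(0, 100, size=1_000_000, dtype=np.int64)
arr32 = arr64.astype(np.int32)
"""

mult64_time = timeit.timeit("arr64 * arr64", setup=setup, number=100) / 100
mult32_time = timeit.timeit("arr32 * arr32", setup=setup, number=100) / 100

print(f"int64: {mult64_time:.2e} s")
print(f"int32: {mult32_time:.2e} s")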