In many Kaggle notebooks I have seen techniques for reducing a dataset's memory usage, such as converting int64 columns to int32. If the data already fits in memory, why reduce its footprint? Does it make machine learning models train faster on the data?
Yes. Most models perform a large number of computations over the feature vectors. In an MLP, for example, each neuron computes a weighted sum of its input features to produce its output. These computations are generally faster when the feature vector components are 32-bit rather than 64-bit, since half as many bytes have to move through the memory hierarchy.
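To show the memory side of the question first, here is a minimal sketch (the column name `a` and the array size are my own choices) of the kind of downcast those Kaggle notebooks do, using pandas:

```python
import numpy as np
import pandas as pd

# A toy frame whose column is explicitly int64.
df = pd.DataFrame({"a": np.arange(1_000_000, dtype=np.int64)})
before = df.memory_usage(deep=True).sum()

# Downcast to int32: safe only if all values fit in the int32 range.
df["a"] = df["a"].astype(np.int32)
after = df.memory_usage(deep=True).sum()

print(before, after)  # the int32 column takes roughly half the bytes
```

The column's data goes from 8 MB to 4 MB; the same idea applies per column across a whole frame.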
Let me illustrate with a simple example:
import timeit

# NOTE: the timed statements below include the numpy import and the
# array construction, not just the multiplication itself.
mult64 = """
import numpy as np
arr64 = np.int64([3, 4, 5])
arr64 * arr64
"""

mult32 = """
import numpy as np
arr32 = np.int32([3, 4, 5])
arr32 * arr32
"""

mult64_time = timeit.timeit(mult64, number=100) / 100
mult32_time = timeit.timeit(mult32, number=100) / 100
print(mult64_time)
print(mult32_time)
This gives me the result below: the int32 multiplication appears far faster than the int64 one on CPU. Note that the first measurement also pays the one-time cost of importing NumPy, which exaggerates the gap here; even so, I have generally found this transformation useful for saving training and prediction time.
0.00086965738
1.7849500000000074e-06
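Since the snippet above also times the import and the array construction, the absolute numbers mostly reflect one-time overhead. A sketch that isolates the multiplication itself, by moving the setup into timeit's `setup` argument and using a larger array (the size and iteration count are my own, arbitrary choices):

```python
import timeit

# Put the import and the array construction in timeit's setup so that
# only the element-wise multiplication is measured.
setup32 = "import numpy as np; a = np.full(1_000_000, 3, dtype=np.int32)"
setup64 = "import numpy as np; a = np.full(1_000_000, 3, dtype=np.int64)"

t32 = timeit.timeit("a * a", setup=setup32, number=200) / 200
t64 = timeit.timeit("a * a", setup=setup64, number=200) / 200
print(f"int32: {t32:.2e} s, int64: {t64:.2e} s")
```

On most machines the int32 version still comes out ahead, mainly because half as many bytes move through caches and memory, but the ratio is far smaller than the raw numbers above suggest and depends on the hardware.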