Tags: cpu, performance, gpu, keras, tensorflow
I'm having trouble understanding why the GPU and the CPU run at similar speeds on small networks (with the CPU sometimes being faster), while the GPU is faster on larger networks. The code at the bottom of this question runs in 103.7 s on an i7-6700k, but in 29.5 s when using tensorflow-gpu.
However, when I train a network with 100 hidden neurons, instead of the 1000 in the example below, I get roughly 20 s when using the GPU and roughly 15 s when using the CPU.
I read in another Stack Overflow answer that CPU->GPU transfers take a long time; I assume this refers to loading the data batches onto the GPU (a quick way to probe this is sketched after the code below).
Can someone explain why this happens, and perhaps point to some change I could make in the code to maximize speed?
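For a like-for-like timing on one machine, without swapping the tensorflow and tensorflow-gpu packages, the GPU can be hidden via the CUDA_VISIBLE_DEVICES environment variable before TensorFlow initializes. A minimal sketch, assuming the TensorFlow backend and an NVIDIA GPU:

import os
# An empty CUDA_VISIBLE_DEVICES hides every GPU from TensorFlow, so all
# ops fall back to the CPU; it must be set before TensorFlow initializes.
os.environ['CUDA_VISIBLE_DEVICES'] = ''
import keras  # the TensorFlow backend is imported after the variable is set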
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.utils import np_utils
from sklearn.preprocessing import normalize
## Importing the MNIST dataset using Keras
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# flatten each 28x28 image into a 784-dimensional vector and L2-normalize each row
N, x, y = X_train.shape
X_train = normalize(np.reshape(X_train, (N, x * y)))
N, x, y = X_test.shape
X_test = normalize(np.reshape(X_test, (N, x * y)))
# one-hot encoding
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# four hidden ReLU layers (750 -> 150 -> 50 -> 50) with dropout, then a softmax over the 10 digits
model = Sequential()
model.add(Dense(750, input_dim=784))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(150))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='nadam', metrics=['accuracy'])
fit = model.fit(X_train, y_train, batch_size=128, epochs=10, verbose=0)
## Evaluate on the test set: score[0] is the loss from model.compile above, score[1] the accuracy
score = model.evaluate(X_test, y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
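A quick experiment bearing on the transfer-time hypothesis above: if per-batch host-to-device copies and kernel-launch overhead dominate for a model this small, increasing batch_size should narrow the CPU/GPU gap. A sketch reusing model and X_train from the listing above (the batch sizes are arbitrary illustrative choices):

import time
for bs in (128, 512, 2048):
    start = time.time()
    # one epoch per setting; larger batches mean fewer transfers
    # and kernel launches per epoch
    model.fit(X_train, y_train, batch_size=bs, epochs=1, verbose=0)
    print('batch_size=%d: %.1f s' % (bs, time.time() - start))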