Why is transferring data from the CPU to the GPU faster than from the GPU to the CPU?

avg*_*vgn 10 matlab gpu nvidia tesla

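I have noticed that transferring data to recent high-end GPUs is faster than gathering it back to the CPU. Below are the results of a benchmark function provided to me by MathWorks technical support, run on an older Nvidia K20 and on a recent Nvidia P100 with PCIe: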

Using a Tesla P100-PCIE-12GB GPU.
Achieved peak send speed of 11.042 GB/s
Achieved peak gather speed of 4.20609 GB/s

Using a Tesla K20m GPU.
Achieved peak send speed of 2.5269 GB/s
Achieved peak gather speed of 2.52399 GB/s

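I have attached the benchmark function below for reference. What is the reason for the asymmetry on the P100? Is it system dependent, or is it standard on recent high-end GPUs? Can the gather speed be increased?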

gpu = gpuDevice();
fprintf('Using a %s GPU.\n', gpu.Name)
sizeOfDouble = 8; % Each double-precision number needs 8 bytes of storage
sizes = power(2, 14:28);

sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii=1:numel(sizes)
    numElements = sizes(ii)/sizeOfDouble;
    hostData = randi([0 9], numElements, 1);
    gpuData = randi([0 9], numElements, 1, 'gpuArray');
    % Time sending to GPU
    sendFcn = @() gpuArray(hostData);
    sendTimes(ii) = gputimeit(sendFcn);
    % Time gathering back from GPU
    gatherFcn = @() gather(gpuData);
    gatherTimes(ii) = gputimeit(gatherFcn);
end
sendBandwidth = (sizes./sendTimes)/1e9;
[maxSendBandwidth,maxSendIdx] = max(sendBandwidth);
fprintf('Achieved peak send speed of %g GB/s\n',maxSendBandwidth)
gatherBandwidth = (sizes./gatherTimes)/1e9;
[maxGatherBandwidth,maxGatherIdx] = max(gatherBandwidth);
fprintf('Achieved peak gather speed of %g GB/s\n',maxGatherBandwidth)

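Edit: We now know that it is not system dependent (see the comments). I would still like to know the reason for the asymmetry, and whether it can be changed.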
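
To put numbers on the asymmetry, the bandwidth that the benchmark reports and the send-to-gather ratios implied by the figures quoted above can be written out as follows. The symbols $B$, $S$, and $t$ are labels introduced here for illustration only (they correspond to `sendBandwidth`/`gatherBandwidth`, `sizes`, and `sendTimes`/`gatherTimes` in the script), and the ratios are computed from the rounded console output, so treat them as approximate.

```latex
% B: bandwidth in GB/s, S: transfer size in bytes, t: measured time in seconds
B = \frac{S}{t \cdot 10^{9}}

% Send-to-gather ratio of the peak bandwidths reported above
\frac{B_{\text{send}}}{B_{\text{gather}}}\bigg|_{\text{P100}}
  = \frac{11.042}{4.20609} \approx 2.63,
\qquad
\frac{B_{\text{send}}}{B_{\text{gather}}}\bigg|_{\text{K20m}}
  = \frac{2.5269}{2.52399} \approx 1.00
```

In other words, the K20m is essentially symmetric, while the P100 sends roughly 2.6 times faster than it gathers.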

Dev*_*-iL 4

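This is a CW (community wiki) for anyone interested in posting benchmark results from their own machine. Contributors are encouraged to leave their details in case questions about their results come up in the future.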



System: Win10, 32GB DDR4-2400MHz RAM, i7 6700K. MATLAB: R2018a.

Using a GeForce GTX 660 GPU.
Achieved peak send speed of 7.04747 GB/s
Achieved peak gather speed of 3.11048 GB/s

Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring something that takes
longer. 

Contributor: Dev-iL



System: Win7, 32GB RAM, i7 4790K. MATLAB: R2018a.

Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring something that takes
longer. 

Contributor: Dev-iL