I am trying to install CUDA, but I get the message "No supported version of Visual Studio was found". I believe this is because I am using Visual Studio 2017 (Community), while CUDA currently only supports up to Visual Studio 2015. Unfortunately, Microsoft will not let me download older versions of Visual Studio without paying for a subscription.
Is there a way to work around the VS 2017 incompatibility, or can I simply not use CUDA?
I am applying transfer learning to a pre-trained network using the GPU version of Keras. I don't understand how to set the parameters max_queue_size, workers, and use_multiprocessing. If I change these parameters (primarily to speed up training), I am unsure whether all data is still seen once per epoch.
max_queue_size:
maximum size of the internal training queue which is used to "precache" samples from the generator
Question: Does this refer to how many batches are prepared on CPU? How …
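The mechanics behind these parameters can be sketched in plain Python: worker threads pull batches from the generator into a bounded queue (max_queue_size), and the training loop consumes from that queue so batch preparation overlaps GPU compute. This is a simplified, hypothetical model of what Keras's enqueuer does internally (batch_generator and prefetch_batches are made-up names), but it illustrates that every batch is still consumed exactly once per pass, regardless of queue size or worker count:

```python
import queue
import threading

def batch_generator(n_batches):
    # stand-in for a Keras generator / Sequence yielding batches
    for i in range(n_batches):
        yield [i] * 4  # a "batch" of 4 samples

def prefetch_batches(gen, max_queue_size=10, workers=2):
    """Simplified model of the enqueuer: worker threads move batches
    from the generator into a bounded queue; the consumer (the training
    loop) drains the queue."""
    q = queue.Queue(maxsize=max_queue_size)  # bounded, like max_queue_size
    lock = threading.Lock()                  # plain generators are not thread-safe
    DONE = object()

    def worker():
        while True:
            with lock:
                batch = next(gen, DONE)
            if batch is DONE:
                q.put(DONE)    # tell the consumer this worker is finished
                return
            q.put(batch)       # blocks while the queue is full

    for _ in range(workers):
        threading.Thread(target=worker, daemon=True).start()

    batches, finished = [], 0
    while finished < workers:
        item = q.get()
        if item is DONE:
            finished += 1
        else:
            batches.append(item)
    return batches
```

With use_multiprocessing=True, Keras swaps the threads for processes to sidestep the GIL; the bounded-queue semantics stay the same.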
In my training loop, I load a batch of data onto the CPU and then transfer it to the GPU:
import torch
import torch.utils as utils

device = torch.device("cuda")

train_loader = utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True)

for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)
This way of loading data is very time-consuming. Is there a way to load the data directly onto the GPU without the transfer step?
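When batches come out of a DataLoader they necessarily start on the CPU, so the copy itself cannot be eliminated, but with pinned host memory it can be overlapped with computation via non_blocking=True. A sketch under that assumption (the toy TensorDataset stands in for your train_dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# toy stand-in for train_dataset
train_dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                              torch.randint(0, 10, (1024,)))

# pin_memory=True places host batches in page-locked memory, which is
# what allows the copy below to run asynchronously
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True,
                          num_workers=4, pin_memory=True)

for inputs, labels in train_loader:
    # non_blocking=True overlaps the host-to-device copy with compute;
    # the copy cannot be skipped when the data originates on the CPU
    inputs = inputs.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward / backward pass ...
```

If the whole dataset fits in GPU memory, another option is to move it to the device once before training and index into it directly, skipping the per-batch transfer entirely.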
I have a Thrust device_vector. I want to convert it to a raw pointer so that I can pass it to a kernel. How can I do this?
thrust::device_vector<int> dv(10);
//CAST TO RAW
kernel<<<bl,tpb>>>(pass raw)
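A minimal sketch of the usual approach, using thrust::raw_pointer_cast; the kernel body and launch configuration here are illustrative placeholders:

```cpp
#include <thrust/device_vector.h>

// illustrative kernel: doubles each element
__global__ void kernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

int main() {
    thrust::device_vector<int> dv(10, 1);

    // raw_pointer_cast converts the vector's device storage into a plain int*
    int *raw = thrust::raw_pointer_cast(dv.data());

    kernel<<<1, 32>>>(raw, static_cast<int>(dv.size()));
    cudaDeviceSynchronize();
    return 0;
}
```

The older equivalent spelling `thrust::raw_pointer_cast(&dv[0])` also works; the resulting pointer stays valid only as long as the device_vector itself is alive.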
Update: this still happens in TensorFlow 1.7.0.
Update: I wrote a Colab notebook that reproduces this bug on Google's GPU hardware: https://drive.google.com/file/d/13V87kSTyyFVMM7NoJNk9QTsCYS7FRbyz/view?usp=sharing
Update: after wrongly blaming tf.gather in the first revision of this question, I have now narrowed it down to tf.reduce_sum in combination with a placeholder-fed shape:
tf.reduce_sum produces zeros for large tensors (on GPU only) whose shape depends on a placeholder.
Running the following code while feeding a large integer (> 700000 in my case) into the batch_size placeholder:
import tensorflow as tf
import numpy as np
graph = tf.Graph()
with graph.as_default():
    batch_size = tf.placeholder(tf.int32, shape=[])
    ones_with_placeholder = tf.ones([batch_size, 256, 4])
    sum_out = tf.reduce_sum(ones_with_placeholder, axis=2)
    min_sum_out = tf.reduce_min(sum_out)
sess = tf.Session(graph=graph)
sum_result,min_sum_result = sess.run([sum_out,min_sum_out],feed_dict={batch_size: 1000000})
print("Min value in sum_out processed on host with numpy:", np.min(sum_result))
print("Min value in sum_out tensor processed in graph with tf:", min_sum_result)
shows the following erroneous result:
Min value in sum_out processed on host with …

I have trained 3 models and am now running code that loads each of the 3 checkpoints in turn and runs predictions with them. I am using the GPU.
When the first model is loaded, it pre-allocates the entire GPU memory (which I want, to handle the first batches of data). But it does not release the memory when it finishes. When the second model is loaded, even using both tf.reset_default_graph() and with tf.Graph().as_default(), the GPU memory is still fully consumed by the first model, and the second model then runs out of memory.
Is there a way to solve this, other than using Python subprocesses or multiprocessing (the only solution I found by googling)?
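For reference, the workaround mentioned above can be sketched as follows: each checkpoint is evaluated in its own child process, and the GPU memory that process allocated is reclaimed by the driver when it exits. run_model and predict_all are hypothetical names, and the actual graph-building/restore code is elided:

```python
import multiprocessing as mp

def run_model(ckpt_path, result_queue):
    # In the real code this would build the graph, restore `ckpt_path`,
    # and run predictions. All GPU memory allocated by this process is
    # returned to the driver when the process exits.
    result_queue.put("predictions-from-" + ckpt_path)

def predict_all(checkpoints):
    results = []
    for ckpt in checkpoints:
        q = mp.Queue()
        p = mp.Process(target=run_model, args=(ckpt, q))
        p.start()
        results.append(q.get())  # fetch before join to avoid queue deadlock
        p.join()
    return results
```

As far as I know, TensorFlow does not return allocated GPU memory to the system within a single process, so process isolation (or limiting the allocation up front with allow_growth / per_process_gpu_memory_fraction) remains the standard approach.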
I am running an AWS EC2 g2.2xlarge instance with Ubuntu 14.04 LTS. I would like to monitor GPU utilization while training my TensorFlow models, but I get an error when trying to run 'nvidia-smi'.
ubuntu@ip-10-0-1-213:/etc/alternatives$ cd /usr/lib/nvidia-375/bin
ubuntu@ip-10-0-1-213:/usr/lib/nvidia-375/bin$ ls
nvidia-bug-report.sh nvidia-debugdump nvidia-xconfig
nvidia-cuda-mps-control nvidia-persistenced
nvidia-cuda-mps-server nvidia-smi
ubuntu@ip-10-0-1-213:/usr/lib/nvidia-375/bin$ ./nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
ubuntu@ip-10-0-1-213:/usr/lib/nvidia-375/bin$ dpkg -l | grep nvidia
ii nvidia-346 352.63-0ubuntu0.14.04.1 amd64 Transitional package for nvidia-346
ii nvidia-346-dev 346.46-0ubuntu1 amd64 NVIDIA binary Xorg driver development files
ii nvidia-346-uvm 346.96-0ubuntu0.0.1 amd64 Transitional package for nvidia-346
ii nvidia-352 375.26-0ubuntu1 amd64 Transitional package for …

I tried to run the code below, but it reports an error:
NvvmSupportError: libNVVM cannot be found. Do `conda install
cudatoolkit`: library nvvm not found
My development environment is: Ubuntu 17.04, Spyder/Python 3.5, installed via conda (numba and cudatoolkit). Nvidia GPUs (GTX 1070 and GTX 1060).
import numpy as np
from timeit import default_timer as timer
from numba import vectorize

@vectorize(["float32(float32, float32)"], target='cuda')
def VecADD(a, b):
    return a + b

n = 32000000
a = np.ones(n, dtype=np.float32)
b = np.ones(n, dtype=np.float32)
c = np.zeros(n, dtype=np.float32)

start = timer()
C = VecADD(a, b)
print(timer() - start)
Does anyone know how to solve this problem?
I can list the GPU devices using the following tensorflow code:
import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
The result is:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17897160860519880862, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9751861134541508701
physical_device_desc: "device: XLA_GPU device", name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 5368380567397471193
physical_device_desc: "device: XLA_CPU device", name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 21366299034
locality {
bus_id: 1
links {
link {
device_id: 1
type: "StreamExecutor"
strength: 1
}
}
}
incarnation: 7110958745101815531
physical_device_desc: "device: …