With CUDA, is it possible to manage memory the way a garbage collector would? For example, when cudaMalloc(...) returns an out-of-memory error, can I free previously allocated data and retry the allocation?

Once cudaMalloc(...) returns out-of-memory, subsequent CUDA calls seem to return out-of-memory as well. Even when I call cudaFree with a valid, previously allocated device pointer, cudaFree returns out-of-memory...

cudaDeviceReset() is not a good way to recover the state in my case.
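For what it's worth, the free-and-retry pattern itself is straightforward; the sketch below is a host-side analogue only, with `std::malloc` standing in for `cudaMalloc` and `cache` standing in for a hypothetical pool of allocations the application is willing to give up:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Host-side analogue of the free-and-retry idea: try to allocate, and on
// failure release previously cached blocks one at a time and retry.
// std::malloc stands in for cudaMalloc; `cache` is a hypothetical pool of
// allocations we are willing to sacrifice.
void *alloc_with_retry(std::size_t bytes, std::vector<void *> &cache)
{
    for (;;)
    {
        void *p = std::malloc(bytes);
        if (p != nullptr)
            return p;            // allocation succeeded
        if (cache.empty())
            return nullptr;      // nothing left to release: give up
        std::free(cache.back()); // release one cached block...
        cache.pop_back();        // ...and retry the allocation
    }
}
```

With the real CUDA API, the extra step is to clear the error state after each failed cudaMalloc (e.g. by calling cudaGetLastError()) before freeing and retrying; an uncleared error from a previous call is one plausible reason cudaFree itself appears to report out-of-memory, though whether recovery works in practice depends on the CUDA version.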
I have a struct of the following type:
typedef struct Edge
{
    int first;
    int second;
} Edge;
which I instantiate and copy into an array in my main function:
Edge h_edges[NUM_EDGES];
for (int i = 0; i < NUM_VERTICES; ++i)
{
    Edge* e = (Edge*)malloc(sizeof(Edge));
    e->first = (rand() % (NUM_VERTICES+1));
    e->second = (rand() % (NUM_VERTICES+1));
    memcpy(h_edges[i], e, sizeof(e));
}
I keep getting the following error:
src/main.cu(28): error: no suitable conversion function from "Edge" to "void *" exists
Line 28 is the line where the memcpy happens. Any help appreciated.
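A likely fix, sketched below rather than taken from an accepted answer: `h_edges[i]` is an Edge value, not a pointer, so memcpy needs its address; and `sizeof(e)` is the size of a pointer, not of the struct. The heap allocation can go entirely (note also that the original loop runs to NUM_VERTICES over an array sized NUM_EDGES, which looks like a separate bug). The constants here are small illustrative values, since the real ones are not shown:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical sizes; the question does not show the real values.
enum { NUM_EDGES = 8, NUM_VERTICES = 8 };

typedef struct Edge
{
    int first;
    int second;
} Edge;

// Corrected version of the loop from the question.
void fill_edges(Edge *edges, int n)
{
    for (int i = 0; i < n; ++i)
    {
        Edge e;  // a stack temporary; no malloc/free needed
        e.first = rand() % (NUM_VERTICES + 1);
        e.second = rand() % (NUM_VERTICES + 1);
        // &edges[i] is the Edge* that memcpy expects (edges[i] alone is an
        // Edge value, hence the "no conversion to void*" error), and
        // sizeof(Edge) copies the whole struct; sizeof(e) on the original
        // Edge* would have copied only a pointer's worth of bytes.
        memcpy(&edges[i], &e, sizeof(Edge));
    }
}
```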
Input:
BC
BD
BC
BC
BD
CD
Output:
BC 3
BD 2
CD 1
It works if I use type char as the key, but Thrust does not seem to support strings as keys.
#include <thrust/device_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <string>
int main(void)
{
    std::string data = "aaabbbbbcddeeeeeeeeeff";
    size_t N = data.size();

    thrust::device_vector<char> input(data.begin(), data.end());
    thrust::device_vector<char> output(N);
    thrust::device_vector<int> lengths(N);

    size_t num_runs =
        thrust::reduce_by_key(input.begin(), input.end(),
                              thrust::constant_iterator<int>(1),
                              output.begin(),
                              lengths.begin()
                              ).first - output.begin();
    return 0;
}
How can I implement this with Thrust?
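One workaround sketch (my own, not from the Thrust documentation): reduce_by_key works with any key type that compares cheaply for equality, so a fixed-length key like "BC" can be packed into a single int. On the device you would fill a thrust::device_vector<int> with the packed keys, thrust::sort them, and run the reduce_by_key call from the code above. The host-only C++ below demonstrates just the packing-and-counting idea (pack_key and count_keys are illustrative names):

```cpp
#include <map>
#include <string>
#include <vector>

// Pack a fixed two-character key such as "BC" into one int: integer keys
// compare cheaply, which is what reduce_by_key needs.
int pack_key(const std::string &key)
{
    return (key[0] << 8) | key[1];
}

// Count occurrences of each packed key. std::map plays the role that a
// sort followed by reduce_by_key would play on the device.
std::map<int, int> count_keys(const std::vector<std::string> &keys)
{
    std::map<int, int> counts;
    for (const std::string &k : keys)
        ++counts[pack_key(k)];
    return counts;
}
```

For the sample input BC BD BC BC BD CD, this yields the counts 3, 2, and 1 from the expected output. Longer fixed-length keys can be packed the same way into a wider integer (e.g. eight chars into a 64-bit key).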
I am installing Caffe. I am using Ubuntu 14.04.

I tried to install CUDA. The Caffe site says that I need to install the library and the latest standalone driver separately.

I downloaded the driver from there. I tried every product type, but I got the same error:
You do not appear to have an NVIDIA GPU supported by the 346.46
NVIDIA Linux graphics driver installed in this system. For further
details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in
the README available on the Linux driver download page at www.nvidia.com.
Then:
You appear to be running an X server; please exit X before
installing. For further details, please see the section INSTALLING
THE NVIDIA DRIVER in the README available on …

I am implementing a parallel reduction in CUDA.
The kernel waits on __syncthreads for all threads to finish their two reads from shared memory, and then writes the sum back to shared memory.

Should I use a __threadfence_block to make sure the writes to shared memory are visible to all threads in the next iteration, or use __syncthreads as given in the NVIDIA example?
I am trying to install TensorFlow with CUDA support. Here are my specs:

I installed TensorFlow via pip install - so I imagine your answer will be "install from source", but I want to make sure there is no quick fix.

The error is:
volcart@volcart-Precision-Tower-7910:~$ python
Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library …

I have installed tensorflow with CUDA 7.5 and cuDNN 5.0. My graphics card is an NVIDIA GeForce 820M with compute capability 2.1. However, I get this error:
Ignoring visible gpu device (device: 0, name: GeForce 820M, pci bus id: 0000:08:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.
Device mapping: no known devices.
Is there any way to run on a GPU with capability 2.1? I searched online and found that it is cuDNN that requires this capability, so would installing an earlier version of cuDNN let me use the GPU?
I am experimenting with word2vec using the code from https://github.com/chiphuyen/stanford-tensorflow-tutorials/blob/master/examples/04_word2vec_no_frills.py

However, it easily exhausts all of my GPU memory, and I don't know why.
with tf.name_scope('data'):
    center_words = tf.placeholder(tf.int32, shape=[BATCH_SIZE], name='center_words')
    target_words = tf.placeholder(tf.int32, shape=[BATCH_SIZE, 1], name='target_words')

with tf.name_scope("embedding_matrix"):
    embed_matrix = tf.Variable(tf.random_uniform([VOCAB_SIZE, EMBED_SIZE], -1.0, 1.0), name="embed_matrix")

with tf.name_scope("loss"):
    embed = tf.nn.embedding_lookup(embed_matrix, center_words, name="embed")
    nce_weight = tf.Variable(tf.truncated_normal([VOCAB_SIZE, EMBED_SIZE], stddev=1.0/(EMBED_SIZE ** 0.5)), name="nce_weight")
    nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]), name="nce_bias")
    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight, biases=nce_bias, labels=target_words, inputs=embed, num_sampled=NUM_SAMPLED, num_classes=VOCAB_SIZE), name="loss")

optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    total_loss = 0.0  # we use this to calculate the average loss in the last SKIP_STEP steps
    writer …

I ran this code, which I read on the Introduction to CUDA Python page:
import numpy as np
from timeit import default_timer as timer
from numbapro import vectorize

@vectorize(["float32(float32, float32)"], target='gpu')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000

    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)
    C = np.zeros(N, dtype=np.float32)

    start = timer()
    C = VectorAdd(A, B)
    vectoradd_timer = timer() - start

    print("C[:5] = " + str(C[:5]))
    print("C[-5:] = " + str(C[-5:]))
    print("VectorAdd took %f seconds" % vectoradd_timer)

if __name__ == '__main__':
    main()
I get the following error in the terminal:
dtn34@dtn34-ubuntu:~/Python$ …

I manually installed CUDA v9.2 and the corresponding cuDNN in order to install tensorflow-gpu, but then I realized that tensorflow 1.8.0 requires CUDA 9.0, so I ran
pip install tensorflow-gpu
from the anaconda prompt (base environment), which automatically installed CUDA 9.0 and the corresponding cuDNN. I launched Spyder from the same command prompt. Here is my Python 3.6 code, in which I use keras and tensorflow to train on some 8000-odd images:
# Convolutional Neural Networks

# Part 1 - Building the CNN
# Not important

# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
    'dataset/training_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')

test_set = test_datagen.flow_from_directory(
    'dataset/test_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')

with tf.device("/gpu:0"):  # Notice THIS
    classifier.fit_generator(
        training_set,
        steps_per_epoch=8000,
        epochs=25,
        validation_data=test_set,
        validation_steps=2000)
Notice that just before fitting the dataset at the end, I put it inside
with tf.device("/gpu:0"):
I think this should make sure that it uses the GPU for training? I'm not sure, because putting the "…