With CUDA, is it possible to manage memory the way a garbage collector would? For example, when cudaMalloc(...) returns an out-of-memory error, can I free previously allocated data and retry the allocation?

Once cudaMalloc(...) returns out-of-memory, subsequent CUDA calls seem to return out-of-memory as well. Even when I call cudaFree with a valid, previously allocated device pointer, cudaFree returns out-of-memory...

cudaDeviceReset() is not a good way to recover the state in my case.
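For what it's worth, the free-and-retry pattern itself is straightforward; the sketch below is a host-side analogue only, with `std::malloc` standing in for `cudaMalloc` and `cache` standing in for a hypothetical pool of allocations the application is willing to give up:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Host-side analogue of the free-and-retry idea: try to allocate, and on
// failure release previously cached blocks one at a time and retry.
// std::malloc stands in for cudaMalloc; `cache` is a hypothetical pool of
// allocations we are willing to sacrifice.
void *alloc_with_retry(std::size_t bytes, std::vector<void *> &cache)
{
    for (;;)
    {
        void *p = std::malloc(bytes);
        if (p != nullptr)
            return p;            // allocation succeeded
        if (cache.empty())
            return nullptr;      // nothing left to release: give up
        std::free(cache.back()); // release one cached block...
        cache.pop_back();        // ...and retry the allocation
    }
}
```

With the real CUDA API, the extra step is to clear the error state after each failed cudaMalloc (e.g. by calling cudaGetLastError()) before freeing and retrying; an uncleared error from a previous call is one plausible reason cudaFree itself appears to report out-of-memory, though whether recovery works in practice depends on the CUDA version.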
I have a struct of the following type:
typedef struct Edge
{
    int first;
    int second;
} Edge;
which I instantiate and copy into an array in my main function:
Edge h_edges[NUM_EDGES];
for (int i = 0; i < NUM_VERTICES; ++i)
{
    Edge* e = (Edge*)malloc(sizeof(Edge));
    e->first = (rand() % (NUM_VERTICES+1));
    e->second = (rand() % (NUM_VERTICES+1));
    memcpy(h_edges[i], e, sizeof(e));
}
I keep getting the following error:
src/main.cu(28): error: no suitable conversion function from "Edge" to "void *" exists
Line 28 is the line where the memcpy happens. Any help appreciated.
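A likely fix, sketched below rather than taken from an accepted answer: `h_edges[i]` is an Edge value, not a pointer, so memcpy needs its address; and `sizeof(e)` is the size of a pointer, not of the struct. The heap allocation can go entirely (note also that the original loop runs to NUM_VERTICES over an array sized NUM_EDGES, which looks like a separate bug). The constants here are small illustrative values, since the real ones are not shown:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical sizes; the question does not show the real values.
enum { NUM_EDGES = 8, NUM_VERTICES = 8 };

typedef struct Edge
{
    int first;
    int second;
} Edge;

// Corrected version of the loop from the question.
void fill_edges(Edge *edges, int n)
{
    for (int i = 0; i < n; ++i)
    {
        Edge e;  // a stack temporary; no malloc/free needed
        e.first = rand() % (NUM_VERTICES + 1);
        e.second = rand() % (NUM_VERTICES + 1);
        // &edges[i] is the Edge* that memcpy expects (edges[i] alone is an
        // Edge value, hence the "no conversion to void*" error), and
        // sizeof(Edge) copies the whole struct; sizeof(e) on the original
        // Edge* would have copied only a pointer's worth of bytes.
        memcpy(&edges[i], &e, sizeof(Edge));
    }
}
```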
Input:
BC
BD
BC
BC
BD
CD
Output:
BC 3
BD 2
CD 1
It works if I use type char as the key, but Thrust does not seem to support strings as keys.
#include <thrust/device_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <string>
int main(void)
{
    std::string data = "aaabbbbbcddeeeeeeeeeff";
    size_t N = data.size();

    thrust::device_vector<char> input(data.begin(), data.end());
    thrust::device_vector<char> output(N);
    thrust::device_vector<int> lengths(N);

    size_t num_runs =
        thrust::reduce_by_key(input.begin(), input.end(),
                              thrust::constant_iterator<int>(1),
                              output.begin(),
                              lengths.begin()
                              ).first - output.begin();
    return 0;
}
How can I implement this with Thrust?
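One workaround sketch (my own, not from the Thrust documentation): reduce_by_key works with any key type that compares cheaply for equality, so a fixed-length key like "BC" can be packed into a single int. On the device you would fill a thrust::device_vector<int> with the packed keys, thrust::sort them, and run the reduce_by_key call from the code above. The host-only C++ below demonstrates just the packing-and-counting idea (pack_key and count_keys are illustrative names):

```cpp
#include <map>
#include <string>
#include <vector>

// Pack a fixed two-character key such as "BC" into one int: integer keys
// compare cheaply, which is what reduce_by_key needs.
int pack_key(const std::string &key)
{
    return (key[0] << 8) | key[1];
}

// Count occurrences of each packed key. std::map plays the role that a
// sort followed by reduce_by_key would play on the device.
std::map<int, int> count_keys(const std::vector<std::string> &keys)
{
    std::map<int, int> counts;
    for (const std::string &k : keys)
        ++counts[pack_key(k)];
    return counts;
}
```

For the sample input BC BD BC BC BD CD, this yields the counts 3, 2, and 1 from the expected output. Longer fixed-length keys can be packed the same way into a wider integer (e.g. eight chars into a 64-bit key).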
I am installing Caffe. I am using Ubuntu 14.04.

I tried to install CUDA. The Caffe site says that I need to install the library and the latest standalone driver separately.

I downloaded the driver from there. I tried every product type, but I got the same error:
You do not appear to have an NVIDIA GPU supported by the 346.46
NVIDIA Linux graphics driver installed in this system. For further
details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in
the README available on the Linux driver download page at www.nvidia.com.
Then:
You appear to be running an X server; please exit X before
installing. For further details, please see the section INSTALLING
THE NVIDIA DRIVER in the README available on …

I am implementing a parallel reduction in CUDA.
The kernel waits on __syncthreads for all threads to finish their two reads from shared memory, and then writes the sum back to shared memory.

Should I use a __threadfence_block to make sure the writes to shared memory are visible to all threads in the next iteration, or use __syncthreads as given in the NVIDIA example?
I am trying to install TensorFlow with CUDA support. Here are my specs:

I installed TensorFlow via pip install - so I imagine your answer will be "install from source", but I want to make sure there is no quick fix.

The error is:
volcart@volcart-Precision-Tower-7910:~$ python
Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library …

I have installed tensorflow with CUDA 7.5 and cuDNN 5.0. My graphics card is an NVIDIA GeForce 820M with compute capability 2.1. However, I get this error:
Ignoring visible gpu device (device: 0, name: GeForce 820M, pci bus id: 0000:08:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.
Device mapping: no known devices.
Is there any way to run on a GPU with capability 2.1? I searched online and found that it is cuDNN that requires this capability, so would installing an earlier version of cuDNN let me use the GPU?
I am experimenting with word2vec using the code from https://github.com/chiphuyen/stanford-tensorflow-tutorials/blob/master/examples/04_word2vec_no_frills.py

However, it easily exhausts all of my GPU memory, and I don't know why.
with tf.name_scope('data'):
    center_words = tf.placeholder(tf.int32, shape=[BATCH_SIZE], name='center_words')
    target_words = tf.placeholder(tf.int32, shape=[BATCH_SIZE, 1], name='target_words')

with tf.name_scope("embedding_matrix"):
    embed_matrix = tf.Variable(tf.random_uniform([VOCAB_SIZE, EMBED_SIZE], -1.0, 1.0), name="embed_matrix")

with tf.name_scope("loss"):
    embed = tf.nn.embedding_lookup(embed_matrix, center_words, name="embed")
    nce_weight = tf.Variable(tf.truncated_normal([VOCAB_SIZE, EMBED_SIZE], stddev=1.0/(EMBED_SIZE ** 0.5)), name="nce_weight")
    nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]), name="nce_bias")
    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight, biases=nce_bias, labels=target_words, inputs=embed, num_sampled=NUM_SAMPLED, num_classes=VOCAB_SIZE), name="loss")

optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    total_loss = 0.0  # we use this to calculate the average loss in the last SKIP_STEP steps
    writer …

I ran this code, which I read on the Introduction to CUDA Python page:
import numpy as np
from timeit import default_timer as timer
from numbapro import vectorize

@vectorize(["float32(float32, float32)"], target='gpu')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000

    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)
    C = np.zeros(N, dtype=np.float32)

    start = timer()
    C = VectorAdd(A, B)
    vectoradd_timer = timer() - start

    print("C[:5] = " + str(C[:5]))
    print("C[-5:] = " + str(C[-5:]))
    print("VectorAdd took %f seconds" % vectoradd_timer)

if __name__ == '__main__':
    main()
I get the following error in the terminal:
dtn34@dtn34-ubuntu:~/Python$ …

I manually installed CUDA v9.2 and the corresponding cuDNN in order to install tensorflow-gpu, but then I realized that tensorflow 1.8.0 requires CUDA 9.0, so I ran
pip install tensorflow-gpu
from the anaconda prompt (base environment), which automatically installed CUDA 9.0 and the corresponding cuDNN. I launched Spyder from the same command prompt. Here is my Python 3.6 code, in which I use keras and tensorflow to train on some 8000-odd images:
# Convolutional Neural Networks

# Part 1 - Building the CNN
# Not important

# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
    'dataset/training_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')

test_set = test_datagen.flow_from_directory(
    'dataset/test_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')

with tf.device("/gpu:0"):  # Notice THIS
    classifier.fit_generator(
        training_set,
        steps_per_epoch=8000,
        epochs=25,
        validation_data=test_set,
        validation_steps=2000)
Notice that just before fitting the dataset at the end, I put it inside
with tf.device("/gpu:0"):
I think this should make sure that it uses the GPU for training? I'm not sure, because putting the "…