Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API documentation contains functions like cudaGetLastError, cudaPeekAtLastError, and cudaGetErrorString, but what is the best way to put these together to reliably catch and report errors without requiring lots of extra code?
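One commonly suggested pattern is roughly the sketch below: wrap every runtime API call in a checking macro, and check kernel launches separately, since a launch itself returns no status. The gpuErrchk and gpuAssert names are only placeholders for illustration.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every runtime API call: gpuErrchk(cudaMalloc(&d_ptr, size));
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

// Kernel launches return no status, so check them separately:
//   kernel<<<grid, block>>>(args);
//   gpuErrchk(cudaPeekAtLastError());   // launch/configuration errors
//   gpuErrchk(cudaDeviceSynchronize()); // asynchronous execution errors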
I've been having trouble compiling some of the examples shipped with the CUDA SDK. I installed the developer driver (version 270.41.19), then the CUDA toolkit, and finally the SDK (both of the latter at version 4.0.17).
Initially it did not compile at all, giving:
error -- unsupported GNU version! gcc 4.5 and up are not supported!
I found the line responsible, line 81 of /usr/local/cuda/include/host_config.h, and changed it to:
//#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 4)
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 6)
After that I got a few more of the examples to compile, but it still stopped with:
In file included from /usr/include/c++/4.6/x86_64-linux-gnu/bits/gthr.h:162:0,
from /usr/include/c++/4.6/ext/atomicity.h:34,
from /usr/include/c++/4.6/bits/ios_base.h:41,
from /usr/include/c++/4.6/ios:43,
from /usr/include/c++/4.6/ostream:40,
from /usr/include/c++/4.6/iterator:64,
from /usr/local/cuda/include/thrust/iterator/iterator_categories.h:38,
from /usr/local/cuda/include/thrust/device_ptr.h:26,
from /usr/local/cuda/include/thrust/device_malloc_allocator.h:27,
from /usr/local/cuda/include/thrust/device_vector.h:26,
from lineOfSight.cu:37:
/usr/include/c++/4.6/x86_64-linux-gnu/bits/gthr-default.h:251:1: error: pasting "__gthrw_" and "/* Android's C library does not provide pthread_cancel, check …

What is the relationship between CUDA cores, streaming multiprocessors, and the CUDA model of blocks and threads?
What gets mapped to what, what is parallelized, and how? And which is more efficient: maximizing the number of blocks or the number of threads?
My current understanding is that each multiprocessor has 8 CUDA cores, that each CUDA core executes one CUDA block at a time, and that all the threads in that block are executed serially on that particular core.
Is that correct?
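For reference, a minimal sketch of how the indexing side of this model looks in code (my own illustration, not taken from any particular source): a block is scheduled onto a single streaming multiprocessor and its threads execute there in warps of 32, rather than serially on one core.

// Each thread computes its own global index from its block and thread IDs.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        data[i] += 1;
}

// Typical launch: split n elements into blocks of, say, 256 threads.
//   int threadsPerBlock = 256;
//   int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
//   addOne<<<blocks, threadsPerBlock>>>(d_data, n);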
While training a tensorflow seq2seq model, I see the following messages:
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 27282 get requests, put_count=9311 evicted_count=1000 eviction_rate=0.1074 and unsatisfied allocation rate=0.699032
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 100 to 110
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 13715 get requests, put_count=14458 evicted_count=10000 eviction_rate=0.691659 and unsatisfied allocation rate=0.675684
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 110 to 121
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 6965 get requests, put_count=6813 evicted_count=5000 eviction_rate=0.733891 and unsatisfied allocation rate=0.741421
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 133 to 146
W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 44 get requests, put_count=9058 evicted_count=9000 eviction_rate=0.993597 and unsatisfied …
I know that nvidia-smi -l 1 reports the GPU usage every second (something like the output below). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is it the number of SMs in use out of the total number of SMs, the occupancy, or something else?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          Off  | 0000:03:00.0     Off |                    0 |
| 30%   41C    P0    53W / 225W |      0MiB /  4742MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20c          Off  | 0000:43:00.0     Off |                    0 |
| 36% …

I know that when installing tensorflow you either install the GPU version or the CPU version. How can I check which one is installed (I'm using linux)?
If the GPU version is installed, will it automatically run on the CPU if the GPU is unavailable, or will it throw an error? If a GPU is available, is there a specific field or value that has to be set to make sure it runs on the GPU?
I've been searching, but without much luck. Are there any well-documented .NET bindings for OpenCL? (I would settle for something for CUDA if I had to.) I've come across various implementations, CUDA.NET, OpenCL.NET, OpenTK/Cloo (I know, they're often mentioned on stackoverflow), but they all seem to be in an alpha stage or to have absolutely no usable examples. CUDA.NET has some help files, but it is just a library reference, which doesn't really help you get started.
What I'm hoping to find is a mature library for .NET programming. Ultimately I need to be able to write the code in F#, but any .NET-compliant language would do, since I can always convert it later and use whatever examples are included to get up and running.
It's probably a long shot since I've already searched all over, but I'm hoping this is just one of those cases where I don't know the right thing to search for.
Any help would be greatly appreciated.
I'm having a problem importing tensorflow in python3:
>>> import tensorflow as tf
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception …

The question is: is there a way to use the class "vector" in a CUDA kernel? When I try, I get the following error:
error : calling a host function("std::vector<int, std::allocator<int> > ::push_back") from a __device__/__global__ function not allowed
So is there a way to use a vector in the global section? I recently tried the following:
........ after which I was able to use the printf standard library function in my CUDA kernel.
Is there a way to use the standard library class vector in kernel code, in the same way that printf is supported? Here is an example of using printf in kernel code:
// this code only counts the 3s in an array using CUDA
// private_count is an array holding every thread's result separately
__global__ void countKernel(int *a, int length, int* private_count)
{
    printf("%d\n", threadIdx.x); // prints the thread id, and it works
    // vector<int> y;
    // y.push_back(0); is there a possibility to do this? …
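std::vector is a host-only container and cannot be instantiated inside device code. As a sketch of how the same count could be written without it (assuming private_count has one entry per launched thread, which is how I read the comment above):

// Count the 3s with a plain per-thread counter instead of a vector.
// Assumes private_count has gridDim.x * blockDim.x entries.
__global__ void countThrees(int *a, int length, int *private_count)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    int local = 0;                 // per-thread tally, no container needed
    for (int i = tid; i < length; i += stride)
        if (a[i] == 3)
            ++local;

    private_count[tid] = local;    // each thread writes its own slot
}

thrust::device_vector, by contrast, is a host-side container for device memory; a raw pointer obtained with thrust::raw_pointer_cast can be passed into a kernel, but the vector object itself cannot be used from device code.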