I know that NVIDIA GPUs with compute capability 2.x or higher can execute up to 16 kernels concurrently. However, my application spawns 7 "processes", and each of these 7 processes launches CUDA kernels.
My first question is: what is the expected behavior of these kernels? Will they execute concurrently, or, because they are launched by different processes, will they execute sequentially?
I am confused because the CUDA C Programming Guide says:
"A kernel from one CUDA context cannot execute concurrently with a kernel from another CUDA context." This brings me to my second question: what is a CUDA "context"?
Thanks!
I am building TensorFlow on my Mac (a hackintosh, so I have a GPU, and CUDA 8.0 is installed; it builds Caffe just fine, so I am sure it works). I have set the environment variables as follows (I put them in .zshrc, .bash_profile and .bashrc):
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$CUDA_HOME/lib"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib"
./configure runs fine. I then started the build with the command bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package and got this error:
ERROR: /Development/tensorflow/tensorflow/python/BUILD:572:1: Executing genrule //tensorflow/python:array_ops_pygenrule failed: bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped): com.google.devtools.build.lib.shell.AbnormalTerminationException: Process terminated by signal 5.
dyld: Library not loaded: @rpath/libcudart.8.0.dylib
Referenced from: /private/var/tmp/_bazel_zarzen/bdf1cb43f3ff02468b610730bd03f348/execroot/tensorflow/bazel-out/host/bin/tensorflow/python/gen_array_ops_py_wrappers_cc
Reason: image not found
/bin/bash: line 1: 92702 Trace/BPT trap: 5 bazel-out/host/bin/tensorflow/python/gen_array_ops_py_wrappers_cc @tensorflow/python/ops/hidden_ops.txt 1 > bazel-out/local_darwin-opt/genfiles/tensorflow/python/ops/gen_array_ops.py
Target //tensorflow/tools/pip_package:build_pip_package failed to build
I can confirm that the missing library is there. I also tried installing the prebuilt binaries (I know they only support CUDA 7.5, so I set PATH to point at CUDA 7.5), but that did not work either. When I try import tensorflow …
So I am installing the latest version of OpenCV, 3.2.0, and it just gets stuck at 99% (it is taking very long even though I passed nproc to -j, and my nproc outputs 24), and I am wondering whether there is a fix, because I do not want to kill the build. I am not getting any errors:
CUDA 8
Python 3.4.3
OpenCV 3.2.0
Ubuntu 14.04
and
[ 98%] Built target opencv_stitching
Scanning dependencies of target opencv_test_stitching
Scanning dependencies of target opencv_perf_stitching
BUILD SUCCESSFUL
Total time: 3 seconds
[ 99%] [ 99%] [ 99%] [ 99%] Building CXX object modules/stitching/CMakeFiles/opencv_test_stitching.dir/test/test_matchers.cpp.o
Building CXX object modules/stitching/CMakeFiles/opencv_test_stitching.dir/test/test_main.cpp.o
Building CXX object modules/stitching/CMakeFiles/opencv_test_stitching.dir/test/ocl/test_warpers.cpp.o
Building CXX object modules/stitching/CMakeFiles/opencv_test_stitching.dir/test/test_blenders.cpp.o
[ 99%] [ 99%] [ 99%] [ 99%] Building CXX object modules/stitching/CMakeFiles/opencv_perf_stitching.dir/perf/perf_estimators.cpp.o
[ 99%] Building CXX object modules/stitching/CMakeFiles/opencv_perf_stitching.dir/perf/perf_stich.cpp.o
[ 99%] Building CXX object modules/stitching/CMakeFiles/opencv_perf_stitching.dir/perf/perf_main.cpp.o
Building …

There seems to be a problem with recent TensorFlow builds: when compiled from source for use with a GPU, the TensorBoard visualization tool will not run. The error is as follows:
$ tensorboard
Traceback (most recent call last):
File "/home/gpu/anaconda3/envs/tensorflow/bin/tensorboard", line 7, in <module>
from tensorflow.tensorboard.tensorboard import main
ModuleNotFoundError: No module named 'tensorflow.tensorboard.tensorboard'
System specs: Ubuntu 16.04, NVIDIA GTX 1070, cuda-8.0, cudnn 6.0. Installed from source with Bazel as described here: https://www.tensorflow.org/install/install_sources
Installed into a fresh anaconda3 environment 'tensorflow', which is activated when the command is executed.
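For what it's worth, whether the missing module is visible from the same environment can be probed directly: `importlib.util.find_spec` reports whether Python can locate a module on the current `sys.path` without fully importing it. This is a generic diagnostic sketch, not TensorFlow-specific.

```python
import importlib.util

def module_exists(name):
    """Return True if `name` can be located on sys.path without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A missing parent package also means the module is unavailable.
        return False

# The stdlib is always findable; a made-up dotted path is not.
print(module_exists("json"))                        # True
print(module_exists("no_such_pkg.no_such_module"))  # False
```

Running `module_exists("tensorflow.tensorboard.tensorboard")` inside the activated environment would show whether the entry-point script's import can succeed at all.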
Any help is greatly appreciated!
I installed TensorFlow with the following command:
pip install --ignore-installed https://github.com/mind/wheels/releases/download/tf1.5-gpu-cuda91-nomkl/tensorflow-1.5.0-cp27-cp27mu-linux_x86_64.whl
This is the latest TensorFlow wheel built for CUDA 9.1 (about 3x faster than CUDA 8.0).
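(Incidentally, which interpreter a wheel targets can be read straight off its filename; a small sketch of splitting out the tags, with the parsing rule assumed from the standard wheel naming convention and no handling of optional build tags:)

```python
def wheel_tags(filename):
    """Split a simple wheel filename into name, version and compatibility tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # Standard form: {name}-{version}-{python tag}-{abi tag}-{platform tag}
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {"name": name, "version": version,
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}

tags = wheel_tags("tensorflow-1.5.0-cp27-cp27mu-linux_x86_64.whl")
print(tags["python"], tags["abi"])  # cp27 cp27mu
```

The `cp27`/`cp27mu` tags show this wheel only fits a wide-unicode CPython 2.7, which matters when a separate conda environment tries to reuse it.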
I can call it successfully from my Python code.
How do I get keras in R to call the TensorFlow installed via pip above? I ask because my default installation method,
keras::install_keras(method="conda", tensorflow = "gpu")
does not recognize the cuda-9.1 libraries.
> conv_base <- keras::application_vgg16(
+ weights = "imagenet",
+ include_top = FALSE,
+ input_shape = c(150, 150, 3)
+ )
/home/ubuntu/anaconda2/envs/r-tensorflow/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Error: ImportError: Traceback (most recent call last):
File "/home/ubuntu/anaconda2/envs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line …

I am trying to train an LSTM layer in PyTorch, using 4 GPUs. At initialization I added a .cuda() call to move the hidden state to the GPU. However, when I run the code on multiple GPUs, I get this runtime error:
RuntimeError: Input and hidden tensors are not at the same device
I tried to fix this by applying .cuda() inside the forward function as follows:
self.hidden = (self.hidden[0].type(torch.FloatTensor).cuda(), self.hidden[1].type(torch.FloatTensor).cuda())
This line seems to fix the problem, but what worries me is whether the updated hidden state is seen by the different GPUs. Should I move the vectors back to the CPU at the end of the forward function for each batch, or is there some other way around this?
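A common alternative to sprinkling .cuda() calls is to allocate the initial hidden state on the same device as the incoming batch, so each DataParallel replica builds its own state on its own GPU. A minimal sketch (the layer sizes here are made up for illustration, and it runs on CPU as written):

```python
import torch
import torch.nn as nn

class LSTMWrapper(nn.Module):
    def __init__(self, input_size=10, hidden_size=20, num_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x):
        # new_zeros creates the state on x's device and dtype: under
        # DataParallel each replica sees its slice of the batch on its own
        # GPU, so the hidden state automatically lands on the right device.
        h0 = x.new_zeros(self.num_layers, x.size(0), self.hidden_size)
        c0 = x.new_zeros(self.num_layers, x.size(0), self.hidden_size)
        out, _ = self.lstm(x, (h0, c0))
        return out

out = LSTMWrapper()(torch.randn(4, 5, 10))
print(out.shape)  # torch.Size([4, 5, 20])
```

With this pattern there is no persistent hidden state stored on the module, so nothing needs to be shuttled back to the CPU between batches.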
I have 4 GPUs (0, 1, 2, 3) and I want to run one Jupyter notebook on GPU 2 and another on GPU 0. So, after executing,
export CUDA_VISIBLE_DEVICES=0,1,2,3
in the GPU 2 notebook I do,
device = torch.device( f'cuda:{2}' if torch.cuda.is_available() else 'cpu')
device, torch.cuda.device_count(), torch.cuda.is_available(), torch.cuda.current_device(), torch.cuda.get_device_properties(1)
and after creating a new model or loading one,
model = nn.DataParallel( model, device_ids = [ 0, 1, 2, 3])
model = model.to( device)
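Note that after CUDA_VISIBLE_DEVICES filtering, the process renumbers the visible GPUs from zero, so the indices in `torch.device(f'cuda:{…}')` and `device_ids` are logical, not physical. A pure-Python sketch of that renumbering rule (just the mapping, no CUDA calls):

```python
def logical_index(cuda_visible_devices, physical_id):
    """Map a physical GPU id to the logical index the process sees
    after CUDA_VISIBLE_DEVICES filtering."""
    visible = [int(d) for d in cuda_visible_devices.split(",")]
    if physical_id not in visible:
        raise ValueError(f"GPU {physical_id} is hidden from this process")
    return visible.index(physical_id)

# With CUDA_VISIBLE_DEVICES=2, physical GPU 2 is the process's cuda:0:
print(logical_index("2", 2))        # 0
# With all four exposed, cuda:2 really is physical GPU 2:
print(logical_index("0,1,2,3", 2))  # 2
```

So a cleaner setup may be to export a single id per notebook (e.g. CUDA_VISIBLE_DEVICES=2) and always address cuda:0 inside it.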
Then, when I start training the model, I get,
RuntimeError Traceback (most recent call last)
<ipython-input-18-849ffcb53e16> in <module>
46 with torch.set_grad_enabled( phase == 'train'):
47 # [N, Nclass, H, W]
---> 48 prediction = model(X)
49 # print( prediction.shape, y.shape)
50 …

Could not load library cudnn_cnn_infer64_8.dll. Error code 126
Please make sure cudnn_cnn_infer64_8.dll is in your library path!
I keep getting this error when trying to use TensorFlow with the GPU. I have installed CUDA, cuDNN and all the drivers several times, following the instructions, but nothing seems to work. If I use a notebook, TensorFlow runs on the CPU; through the VS Code notebook extension I can use the GPU, but when I try to run the same code as a plain Python file, the session stops at the first epoch with the error above.
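Error code 126 on Windows generally means the DLL, or one of its dependencies, was not found on the loader's search path. A small sketch for checking which PATH entries actually contain a given DLL (pure path inspection, no Windows APIs; the function name is made up for this example):

```python
import os

def dirs_containing(dll_name, path_string=None):
    """Return the PATH-style entries that contain `dll_name` as a file."""
    if path_string is None:
        path_string = os.environ.get("PATH", "")
    hits = []
    for entry in path_string.split(os.pathsep):
        if entry and os.path.isfile(os.path.join(entry, dll_name)):
            hits.append(entry)
    return hits
```

If `dirs_containing("cudnn_cnn_infer64_8.dll")` comes back empty in the interpreter that fails, the cuDNN `bin` directory is missing from the PATH that plain `python` sessions inherit, even if the notebook environment has it.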
Full terminal output:
Found 14630 validated image filenames belonging to 3 classes.
Found 1500 validated image filenames belonging to 3 classes.
2021-11-08 11:03:58.000354: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To …

With the introduction of C++11, trivial copyability has become quite relevant, most notably with the use of std::atomic. The basics are simple: a class foo is trivially copyable if
foo* src = new foo();
foo* dest = malloc(sizeof(foo));
memcpy(dest, src, sizeof(foo));
has the same effect as:
foo* src = new foo();
foo* dest = new foo(*src);
That is, copying the object's memory has the same effect as its copy constructor. However, there is of course a catch: there is not only the copy constructor, but also the move constructor, the move assignment operator, and so on.
std::is_trivially_copyable can be used to test whether a type is trivially copyable, so by trial and error one can make an object trivially copyable.
But a well-defined set of rules would of course be nicer :). Hence my question.
I am working on this model:
class Model(torch.nn.Module):
    def __init__(self, sizes, config):
        super(Model, self).__init__()
        self.lstm = []
        for i in range(len(sizes) - 2):
            self.lstm.append(LSTM(sizes[i], sizes[i+1], num_layers=8))
        self.lstm.append(torch.nn.Linear(sizes[-2], sizes[-1]).cuda())
        self.lstm = torch.nn.ModuleList(self.lstm)
        self.config_mel = config.mel_features

    def forward(self, x):
        # convert to log-domain
        x = x.clip(min=1e-6).log10()
        for layer in self.lstm[:-1]:
            x, _ = layer(x)
            x = torch.relu(x)
        #x = torch_unpack_seq(x)[0]
        x = self.lstm[-1](x)
        mask = torch.sigmoid(x)
        return mask
and then:
model = Model(model_width, config)
model.cuda()
But I get this error:
File "main.py", line 29, in <module>
Model.train(args)
File ".../src/model.py", line 57, in …
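One thing worth double-checking in models built this way: layers kept in a plain Python list are invisible to model.cuda() and model.parameters(), and wrapping them in nn.ModuleList (as the code above eventually does) is what registers them. A minimal sketch of the difference, with toy layer sizes assumed for illustration:

```python
import torch.nn as nn

class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # Stored in an ordinary list: NOT registered with the module,
        # so .cuda(), .parameters(), state_dict() all miss it.
        self.layers = [nn.Linear(4, 4)]

class Registered(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each layer as a submodule.
        self.layers = nn.ModuleList([nn.Linear(4, 4)])

n_plain = sum(p.numel() for p in PlainList().parameters())
n_reg = sum(p.numel() for p in Registered().parameters())
print(n_plain, n_reg)  # 0 20
```

Because only the final Linear in the question's model gets an explicit .cuda(), any layer that missed registration would stay on the CPU and could trigger exactly this kind of device-mismatch failure.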