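MPI basic datatypes correspond to the datatypes of the host language, except for MPI_BYTE and MPI_PACKED. My question is: what is the benefit of using these MPI basic datatypes? Or, equivalently, why is it bad to use only the host-language datatypes?
I read the tutorial by William Gropp et al. Slide 31, "Why Datatypes", says:
(http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiintro/ppframe.htm)
I don't understand this explanation. First, if the basic datatypes differ between machines, I don't see how using MPI datatypes resolves the difference, since the basic MPI datatypes correspond to the basic (elementary) datatypes of the host language. Second, why does this application-oriented layout of data in memory have the two benefits listed?
Any answer that addresses my original question will be accepted. Any answer that resolves my confusion about William Gropp's explanation will also be accepted.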
I am learning Ruby on Rails from Michael Hartl's book Ruby on Rails™ Tutorial: Learn Web Development with Rails, Fourth Edition. I get an error message when trying out toy_app. Does anyone know what is going wrong?
Puma caught this error: Invalid option key: raise_on_unfiltered_parameters= (RuntimeError)
.gem/ruby/gems/actionpack-5.0.0/lib/action_controller/railtie.rb:59:in `block (3 levels) in <class:Railtie>'
.gem/ruby/gems/actionpack-5.0.0/lib/action_controller/railtie.rb:54:in `each'
.gem/ruby/gems/actionpack-5.0.0/lib/action_controller/railtie.rb:54:in `block (2 levels) in <class:Railtie>'
.gem/ruby/gems/activesupport-5.0.0/lib/active_support/lazy_load_hooks.rb:38:in `instance_eval'
.gem/ruby/gems/activesupport-5.0.0/lib/active_support/lazy_load_hooks.rb:38:in `execute_hook'
.gem/ruby/gems/activesupport-5.0.0/lib/active_support/lazy_load_hooks.rb:45:in `block in run_load_hooks'
.gem/ruby/gems/activesupport-5.0.0/lib/active_support/lazy_load_hooks.rb:44:in `each'
.gem/ruby/gems/activesupport-5.0.0/lib/active_support/lazy_load_hooks.rb:44:in `run_load_hooks'
.gem/ruby/gems/actionpack-5.0.0/lib/action_controller/base.rb:263:in `<class:Base>'
.gem/ruby/gems/actionpack-5.0.0/lib/action_controller/base.rb:164:in `<module:ActionController>'
.gem/ruby/gems/actionpack-5.0.0/lib/action_controller/base.rb:5:in `<top (required)>'
.gem/ruby/gems/actionpack-5.0.0/lib/action_dispatch/middleware/static.rb:77:in `ext'
.gem/ruby/gems/actionpack-5.0.0/lib/action_dispatch/middleware/static.rb:33:in `match?'
.gem/ruby/gems/actionpack-5.0.0/lib/action_dispatch/middleware/static.rb:130:in `call'
.gem/ruby/gems/rack-2.0.5/lib/rack/sendfile.rb:111:in `call'
.gem/ruby/gems/railties-5.0.0/lib/rails/engine.rb:522:in `call'
.gem/ruby/gems/puma-3.4.0/lib/puma/configuration.rb:224:in `call'
.gem/ruby/gems/puma-3.4.0/lib/puma/server.rb:569:in `handle_request'
.gem/ruby/gems/puma-3.4.0/lib/puma/server.rb:406:in `process_client'
.gem/ruby/gems/puma-3.4.0/lib/puma/server.rb:271:in `block in run'
.gem/ruby/gems/puma-3.4.0/lib/puma/thread_pool.rb:114:in `call'
.gem/ruby/gems/puma-3.4.0/lib/puma/thread_pool.rb:114:in `block in spawn_thread'
Update: commenting out raise_on_unfiltered_parameters solves the problem.
I have a program that is dynamically linked against the MKL library. When I run it, it reports that MKL cannot be found.
./bmdl
/g/software/EMTO/5.7/intel_12.1/ser/bin/bmdl: error while loading shared libraries: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory
But when I use ldd to check the executable's dynamically linked libraries, it shows that the MKL libraries are found:
ldd bmdl
libmkl_intel_lp64.so => /g/software/intelXE/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002b975d76d000)
libmkl_sequential.so => /g/software/intelXE/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_sequential.so (0x00002b975df53000)
libmkl_core.so => /g/software/intelXE/composer_xe_2011_sp1/mkl/lib/intel64/libmkl_core.so (0x00002b975e631000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003785600000)
libm.so.6 => /lib64/libm.so.6 (0x0000003784e00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003784a00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000378a600000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003785200000)
/lib64/ld-linux-x86-64.so.2 (0x0000003784600000)
Any idea what might be wrong?
Output from readelf -l ./bmdl:
Elf file type is EXEC (Executable file)
Entry point 0x4034b0
There are 8 program headers, starting at offset 64 …
I am testing the maximum number of threads for a simple kernel. I found that the total number of threads cannot exceed 4096. The code is as follows:
#include <stdio.h>
#define N 100
__global__ void test(){
    printf("%d %d\n", blockIdx.x, threadIdx.x);
}
int main(void){
    double *p;
    size_t size=N*sizeof(double);
    cudaMalloc(&p, size);
    test<<<64,128>>>();
    //test<<<64,128>>>();
    cudaFree(p);
    return 0;
}
My test environment: CUDA 4.2.9 on a Tesla M2050. The code is compiled with
nvcc -arch=sm_20 test.cu
When checking the output, I found that some combinations are missing. Running the command
./a.out|wc -l
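I always get 4096. When I checked the limits for cc 2.0, I could only find that the maximum number of blocks in the x, y, z dimensions is (1024, 1024, 512) and that the maximum number of threads per block is 1024. Both kernel launches (<<<64,128>>> or <<<128,64>>>) are within these limits. Any ideas?
Note: the CUDA memory operations are there to block the host code so that the kernel's output is displayed.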