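Tag: thrust

Differences between VexCL, Thrust, and Boost.Compute

From a cursory understanding of these libraries, they look very similar. I know that VexCL and Boost.Compute use OpenCL as a backend (although the v1.0 release of VexCL also supports CUDA as a backend) and that Thrust uses CUDA. Apart from the different backends, what are the differences between them?

Specifically, what problem space does each of them address, and why would I want to use one over the others?

Also, the Thrust FAQ states:

The primary barrier to OpenCL support is the lack of an OpenCL compiler and runtime with support for C++ templates

If that is the case, how is it possible for VexCL and Boost.Compute to exist?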

c++ gpu-programming thrust boost-compute vexcl

Sea*_*nch

2016 09-17

41
votes

1
answer

9759
views

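Thrust inside user-written kernels

I am new to Thrust. All the Thrust presentations and examples I have seen show only host code.

I would like to know whether I can pass a device_vector to my own kernel. If so, how? And which operations are allowed on it inside kernel/device code?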

cuda thrust

Ash*_*ppa

2011 04-01

38
votes

4
answers

20k
views

How to cast thrust::device_vector<int> to a raw pointer

I have a Thrust device_vector. I want to cast it to a raw pointer so that I can pass it to a kernel. How can I do that?

thrust::device_vector<int> dv(10);
//CAST TO RAW
kernel<<<bl,tpb>>>(pass raw)

cuda gpu thrust

Pro*_*mer

2012 06-20

20
votes

1
answer

10k
views

From thrust::device_vector to a raw pointer and back?

I understand how to go from a vector to a raw pointer, but I am missing how to go back the other way.

// our host vector
thrust::host_vector<dbl2> hVec;

// pretend we put data in it here

// get a device_vector
thrust::device_vector<dbl2> dVec = hVec;

// get the device ptr
thrust::device_ptr<dbl2> devPtr = &dVec[0];

// now how do i get back to device_vector?
thrust::device_vector<dbl2> dVec2 = devPtr; // gives error
thrust::device_vector<dbl2> dVec2(devPtr); // gives error

Can someone explain this or point me to an example?

thrust

mad*_*aze

lucky-day

19
votes

2
answers

20k
views

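Finding the maximum element value and its position using CUDA Thrust

How do I get both the value and the position of the maximum (or minimum) element (res.val and res.pos)?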

thrust::host_vector<float> h_vec(100);
thrust::generate(h_vec.begin(), h_vec.end(), rand);
thrust::device_vector<float> d_vec = h_vec;

T res = -1;
res = thrust::reduce(d_vec.begin(), d_vec.end(), res, thrust::maximum<T>());
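
One way to get both pieces of information is to search with an iterator-returning algorithm instead of a reduction: dereferencing the iterator gives the value, and its offset from the start gives the position. The sketch below shows the idea on the host with std::max_element. thrust::max_element follows the same iterator-based pattern, but applying it to d_vec is an assumption here and is not shown. MaxResult and findMax are hypothetical names chosen to mirror res.val and res.pos.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type mirroring res.val and res.pos from the question.
struct MaxResult {
    float val;        // value of the largest element
    std::size_t pos;  // index of the largest element
};

// Host-side sketch; assumes v is not empty.
MaxResult findMax(const std::vector<float>& v) {
    // max_element returns an iterator to the first largest element.
    std::vector<float>::const_iterator it = std::max_element(v.begin(), v.end());
    MaxResult res;
    res.val = *it;                                        // value
    res.pos = static_cast<std::size_t>(it - v.begin());   // position
    return res;
}
```

For the minimum, std::min_element can be substituted in the same way.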

cuda thrust

ryc*_*ych

2015 02-19

15
votes

2
answers

8227
views

Efficiency of CUDA vector types (float2, float3, float4)

I am trying to understand the integrate_functor in particles_kernel.cu from the CUDA examples:

struct integrate_functor
{
    float deltaTime;    
    //constructor for functor
    //...

    template <typename Tuple>
    __device__
    void operator()(Tuple t)
    {
        volatile float4 posData = thrust::get<2>(t);
        volatile float4 velData = thrust::get<3>(t);

        float3 pos = make_float3(posData.x, posData.y, posData.z);
        float3 vel = make_float3(velData.x, velData.y, velData.z);

        // update position and velocity
        // ...

        // store new position and velocity
        thrust::get<0>(t) = make_float4(pos, posData.w);
        thrust::get<1>(t) = make_float4(vel, velData.w);
    }
};

We call make_float4(pos, age), but make_float4 is defined in vector_functions.h as

static __inline__ __host__ __device__ float4 …

c cuda thrust

ilc*_*avo

2014 11-03

14
votes

1
answer

20k
views

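Thrust: how to create a device_vector from a host array?

I get some data from a library on the host as a pointer to an array. How do I create a device_vector that holds this data on the device?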

int* data;
int num;
get_data_from_library( &data, &num );

thrust::device_vector< int > iVec; // How to construct this from data?

cuda thrust

Ash*_*ppa

lucky-day

12
votes

1
answer

6209
views

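Maximum number of threads that can be launched in a single CUDA kernel

I am confused about the maximum number of threads that can be launched on a Fermi GPU.

The device query for my GTX 570 reports the following.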

  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535

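As far as I understand, the output above means the following:

For a CUDA kernel we can launch at most 65536 blocks. Each launched block can contain up to 1024 threads. So in principle I can launch up to 65536*1024 (= 67108864) threads.

Is this correct? What if my threads use a lot of registers? Will we still be able to reach this theoretical maximum number of threads?

After writing and launching a CUDA kernel, how do I know that the number of threads and blocks I launched has actually been instantiated? I mean, I don't want the GPU to compute garbage or behave very strangely if I happen to instantiate more threads than are possible for that particular kernel.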
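
The arithmetic in this question can be checked with a small host-side sketch. It is plain C++ rather than CUDA, the names LaunchLimits, kGtx570, maxThreads1D and maxThreads3D are made up for illustration, and the limits are hard-coded from the device query quoted above. Note that the query lists 65535 (not 65536) blocks per grid dimension, so a one-dimensional grid tops out at 65535 * 1024 = 67107840 threads, and using all three grid dimensions raises the bound further. These are launch-configuration limits only; they say nothing about register usage.

```cpp
#include <cstdint>

// Limits as reported by the device query in the question (GTX 570).
struct LaunchLimits {
    std::uint64_t maxThreadsPerBlock;  // "Maximum number of threads per block"
    std::uint64_t maxGridDim[3];       // "Maximum sizes of each dimension of a grid"
};

// Values copied from the quoted query output, not read from a device.
const LaunchLimits kGtx570 = {1024, {65535, 65535, 65535}};

// Upper bound on threads when only the x dimension of the grid is used.
std::uint64_t maxThreads1D(const LaunchLimits& lim) {
    return lim.maxGridDim[0] * lim.maxThreadsPerBlock;
}

// Upper bound on threads when all three grid dimensions are used.
std::uint64_t maxThreads3D(const LaunchLimits& lim) {
    return lim.maxGridDim[0] * lim.maxGridDim[1] * lim.maxGridDim[2] *
           lim.maxThreadsPerBlock;
}
```

With the quoted limits, maxThreads1D(kGtx570) evaluates to 67107840, which is 1024 fewer than the 67108864 computed in the question.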

cuda gpu thrust

smi*_*dha

2016 11-17

11
votes

1
answer

20k
views

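Large integer addition with CUDA

I have been developing a cryptographic algorithm on the GPU and am currently stuck on an algorithm for large integer addition. Large integers are represented in the usual way, as a sequence of 32-bit words.

For example, we can use one thread to add two 32-bit words. For simplicity, assume that the numbers to be added have the same length and that the number of threads per block == the number of words. Then: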

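__global__ void add_kernel(unsigned int *C, const unsigned int *A, const unsigned int *B) {
     unsigned int x = A[threadIdx.x];
     unsigned int y = B[threadIdx.x];
     unsigned int z = x + y;        // wraps modulo 2^32 on overflow
     unsigned int carry = (z < x);  // 1 if the addition wrapped (requires unsigned words)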
     /** do carry propagation in parallel somehow ? */
     ............

     z = z + newcarry; // update the resulting words after carry propagation
     C[threadIdx.x] = z;
 }

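I am pretty sure there is a way to do the carry propagation with some tricky reduction procedure, but I could not figure it out.

I had a look at the CUDA Thrust extensions, but a big integer package does not seem to be implemented yet. Perhaps someone can give me a hint on how to do this in CUDA?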
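
One possible shape for the "tricky reduction" is a carry-lookahead scan: each word records whether it generates a carry on its own and whether it would pass an incoming carry along, and these pairs combine associatively, so a prefix scan can resolve all carries in about log2(n) rounds. The sketch below is a sequential C++ emulation of that idea, not a CUDA kernel and not necessarily what an answer to this question would propose. CarryState, combine and addBig are hypothetical names, words are assumed to be stored least significant first, and both inputs are assumed to have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Carry behaviour of a run of adjacent words.
struct CarryState {
    bool generate;   // the run produces a carry even with no carry coming in
    bool propagate;  // the run passes an incoming carry through to its output
};

// Associative combine; `low` covers the less significant words.
CarryState combine(CarryState low, CarryState high) {
    CarryState out;
    out.generate = high.generate || (high.propagate && low.generate);
    out.propagate = high.propagate && low.propagate;
    return out;
}

// Adds a and b (same length, least significant word first).
std::vector<std::uint32_t> addBig(const std::vector<std::uint32_t>& a,
                                  const std::vector<std::uint32_t>& b,
                                  bool& carryOut) {
    const std::size_t n = a.size();
    std::vector<std::uint32_t> sum(n);
    std::vector<CarryState> state(n);

    // Step 1: independent word-wise add (one thread per word on a GPU).
    for (std::size_t i = 0; i < n; ++i) {
        sum[i] = a[i] + b[i];  // wraps modulo 2^32
        state[i].generate = sum[i] < a[i];
        state[i].propagate = sum[i] == 0xFFFFFFFFu;
    }

    // Step 2: inclusive scan in doubling strides (Hillis-Steele style).
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        std::vector<CarryState> next(state);
        for (std::size_t i = stride; i < n; ++i)
            next[i] = combine(state[i - stride], state[i]);
        state.swap(next);
    }

    // Step 3: state[i].generate is now the carry out of word i.
    for (std::size_t i = 1; i < n; ++i)
        if (state[i - 1].generate) sum[i] += 1;

    carryOut = n > 0 && state[n - 1].generate;
    return sum;
}
```

For example, adding {0xFFFFFFFF, 0xFFFFFFFF, 0} and {1, 0, 0} ripples the carry through two words and gives {0, 0, 1} with no carry out. On a GPU, each inner loop would presumably map to one thread per word, with a synchronization between scan rounds.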

c cuda gpgpu thrust

Author

2012 10-19

11
votes

1
answer

6586
views

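Resolving the Thrust/CUDA warning "Cannot tell what pointer points to..."

I am trying to build a simple application using Thrust/CUDA 4.0 and get lots of warnings of the form "Warning: Cannot tell what pointer points to, assuming global memory space".

Has anyone else seen this, and how do I either disable the warnings or fix my code?

Thanks,

Ade

Here is my code.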

Hello.h

class DECLSPECIFIER Hello   
{ 
private:
    thrust::device_vector<unsigned long> m_device_data;

public:
    Hello(const thrust::host_vector<unsigned long>& data);
    unsigned long Sum();
    unsigned long Max();
};

Hello.cu

#include "Hello.h"

Hello::Hello(const thrust::host_vector<unsigned long>& data)
{
    m_device_data = data;
}

unsigned long Hello::Sum()
{
    return thrust::reduce(m_device_data.cbegin(), m_device_data.cend(), 0, thrust::plus<unsigned long>());
}

unsigned long Hello::Max()
{
    return *thrust::max_element(m_device_data.cbegin(), m_device_data.cend(), thrust::less<unsigned long>());
}

Output

1>  Compiling CUDA source file Hello.cu...
1>  
1>  C:\SrcHg\blog\HelloWorld\HelloWorldCuda>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2008 -ccbin "C:\Program …

cuda visual-studio-2010 thrust

Ade*_*ler

lucky-day

10
votes

1
answer

7442
views