Tag: thrust

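Fast CUDA Thrust custom comparison operator

I am evaluating CUDA and currently use the Thrust library to sort numbers.

I would like to write my own comparator for thrust::sort, but it slows the sort down drastically! I created my own less implementation simply by copying the code from functional.h. However, it seems to be compiled differently and runs very slowly.

Default comparator: thrust::less() - 94 ms
My own comparator: less() - 906 ms

I'm using Visual Studio 2010. What should I do to get the same performance as with option 1?

Full code: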

#include <stdio.h>
#include <stdlib.h>   // srand, rand
#include <time.h>     // time

#include <cuda.h>

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>

int myRand()
{
        static int counter = 0;
        if ( counter++ % 10000 == 0 )
                srand(time(NULL)+counter);
        return (rand()<<16) | rand();
}

template<typename T>
struct less : public thrust::binary_function<T,T,bool>
{
  __host__ __device__ bool operator()(const T &lhs, const T &rhs) const {
     return lhs < rhs;
  } …
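
A commonly cited explanation, given here as an assumption rather than something the question confirms: for primitive key types, Thrust can dispatch thrust::sort to a radix sort when the comparator is thrust::less or thrust::greater, while any user-defined functor takes the generic comparison-based path. The sketch below only shows the two call forms side by side and checks that both sort correctly; `my_less` and `sort_and_check` are hypothetical names, not part of the question's code.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Hypothetical user-defined comparator with the same ordering as thrust::less<int>.
struct my_less {
    __host__ __device__ bool operator()(const int &lhs, const int &rhs) const {
        return lhs < rhs;
    }
};

// Fills a device vector with n-1, n-2, ..., 0, sorts it with comp,
// and reports whether the result is in ascending order.
template <typename Compare>
bool sort_and_check(int n, Compare comp) {
    thrust::device_vector<int> d(n);
    thrust::sequence(d.begin(), d.end(), n - 1, -1);
    thrust::sort(d.begin(), d.end(), comp);
    return thrust::is_sorted(d.begin(), d.end());
}

int main() {
    const int n = 1 << 20;
    // Built-in comparator: eligible for the specialized path (see the assumption above).
    bool ok_builtin = sort_and_check(n, thrust::less<int>());
    // Custom comparator: same ordering, generic comparison-based path.
    bool ok_custom = sort_and_check(n, my_less());
    std::printf("builtin sorted: %d, custom sorted: %d\n", ok_builtin, ok_custom);
    return 0;
}
```

If that assumption holds for the Thrust version in use, passing thrust::less<T>() (or no comparator at all) for primitive keys is the way to keep the faster path; timing each call with CUDA events would confirm it.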


cuda thrust

Ant*_*sev

2012-01-28

4 votes

1 answer

2345 views

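How to initialize a CUDA Thrust vector without implicitly invoking 'copy'?

I have a pointer int *h_a that refers to a large number N of data points (on the host) that I want to copy to the device. So I do: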

thrust::host_vector<int> ht_a(h_a, h_a + N);
thrust::device_vector<int> dt_a = ht_a;


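However, creating ht_a seems to implicitly copy h_a rather than reference it, which is inefficient because I don't need another copy of h_a.

I just want to create ht_a such that &ht_a[0] points to h_a[0]. How can I do that?

Many thanks.

Alternatively, since I don't actually do anything with ht_a other than copy it to device memory, I'd be interested to know whether we can go directly between int* and thrust::device_vector<int>.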
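
As a minimal sketch of the direct route asked about: thrust::device_vector has an iterator-range constructor, and a raw host pointer range is treated as host iterators, so the intermediate host_vector can be skipped. The helper name `to_device` and the sample data are illustrative assumptions.

```cuda
#include <cstddef>
#include <cstdio>
#include <vector>
#include <thrust/device_vector.h>

// Builds a device_vector straight from a raw host pointer range,
// without an intermediate thrust::host_vector.
thrust::device_vector<int> to_device(const int *h_a, std::size_t n) {
    return thrust::device_vector<int>(h_a, h_a + n);
}

int main() {
    const std::size_t N = 8;  // small illustrative size
    std::vector<int> host(N);
    for (std::size_t i = 0; i < N; ++i) host[i] = static_cast<int>(i * i);

    const int *h_a = host.data();  // stands in for the question's h_a
    thrust::device_vector<int> dt_a = to_device(h_a, N);

    // Reading one element back copies it from device to host.
    std::printf("dt_a[3] = %d\n", static_cast<int>(dt_a[3]));
    return 0;
}
```

An existing device_vector can be filled the same way with thrust::copy(h_a, h_a + N, dt_a.begin()).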

cuda gpgpu thrust

mch*_*hen

2013-01-15

4 votes

1 answer

2972 views

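Is it possible to use thrust::device_vector and thrust::fill for 2D arrays with the Thrust library in CUDA?

I am new to the Thrust library. I have CUDA C code that uses a global 2D array, which I initialize in a kernel function.

I need to know whether it is possible to use thrust::device_vector or thrust::fill to initialize and fill a 2D array.

For example: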

// initialize 1D array with ten numbers in a device_vector 
    thrust::device_vector<int> D(10);


Is it possible to write..

thrust::device_vector<int> D[5][10];


If so, how would I use the thrust::fill function?

My goal is to optimize the code using the Thrust library.
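
Note that thrust::device_vector<int> D[5][10]; declares a 5x10 array of separate (empty) vectors, not one 2D array. A common alternative, sketched here under the assumption of row-major storage, is a single flat device_vector of rows * cols elements; `make_grid` and `fill_row` are hypothetical helper names.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/fill.h>

// Allocates a rows x cols grid as one flat, row-major device_vector
// and fills every element with value.
thrust::device_vector<int> make_grid(int rows, int cols, int value) {
    thrust::device_vector<int> grid(rows * cols);
    thrust::fill(grid.begin(), grid.end(), value);
    return grid;
}

// Fills a single row: elements [row * cols, (row + 1) * cols).
void fill_row(thrust::device_vector<int> &grid, int cols, int row, int value) {
    thrust::fill(grid.begin() + row * cols, grid.begin() + (row + 1) * cols, value);
}

int main() {
    const int rows = 5, cols = 10;
    thrust::device_vector<int> D = make_grid(rows, cols, 7);
    fill_row(D, cols, 2, 1);

    // Element (r, c) lives at index r * cols + c.
    std::printf("D(2,3) = %d, D(3,3) = %d\n",
                static_cast<int>(D[2 * cols + 3]),
                static_cast<int>(D[3 * cols + 3]));
    return 0;
}
```

A kernel can still receive the data as a plain pointer via thrust::raw_pointer_cast(D.data()) and use the same r * cols + c indexing.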

c optimization cuda thrust

use*_*682

2014-03-28

4 votes

1 answer

5820 views

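Multiplying a device vector by a constant

I'm using Thrust for a project, and it seems to be missing some basic functionality:

In C++, the simplest way to multiply a vector by a constant is to use std::transform with std::bind1st, like this: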

std::transform(vec.begin(), vec.end(), vec.begin(),
           std::bind1st(std::multiplies<double>(),myConst));


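But apparently bind1st and bind2nd do not work with Thrust.

So, is there a simple way to multiply a vector by a constant in Thrust?

PS: Currently I'm using my own functor to do the multiplication:

thrust::for_each(vec.begin(), vec.end(), multiplyByConstant<double>(myConst));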


where

    template< typename T >
    struct multiplyByConstant
    {
    const T constant;

    multiplyByConstant(T _constant) : constant(_constant) {}

     __host__ __device__
     void operator()( T& VecElem) const 
      {
        VecElem=VecElem*constant;
      }
    };


But writing a functor for a simple multiplication seems like overkill. Surely there must be a simpler way.
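
One shorter form, sketched here as a possible answer rather than the only one, uses Thrust's placeholder expressions from thrust::placeholders with thrust::transform. The wrapper name `scale` is a hypothetical helper.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

// Multiplies every element of vec by myConst in place,
// using a placeholder expression instead of a hand-written functor.
void scale(thrust::device_vector<double> &vec, double myConst) {
    using namespace thrust::placeholders;
    thrust::transform(vec.begin(), vec.end(), vec.begin(), _1 * myConst);
}

int main() {
    thrust::device_vector<double> vec(4, 1.5);  // four elements, all 1.5
    scale(vec, 2.0);
    std::printf("vec[0] = %f\n", static_cast<double>(vec[0]));
    return 0;
}
```

The expression _1 * myConst captures the constant by value and plays the role of the multiplyByConstant functor above.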

c++ cuda vector thrust

Sha*_* RC

2014-04-07

4 votes

2 answers

2059 views

Summing elements with even or odd indices using CUDA Thrust

If I use

 float sum = thrust::transform_reduce(d_a.begin(), d_a.end(), conditional_operator(), 0.f, thrust::plus<float>());


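I get the sum of all elements that satisfy the condition in conditional_operator(), as in "Conditional reduction in CUDA".

But how can I sum only the elements d_a[0], d_a[2], d_a[4], d_a[6], ...?

I thought of changing the conditional operator, but it operates on the array elements without any reference to their indices.

What can I do?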
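
One way to make the index visible to the operator, sketched here as an illustration, is to zip the data with a thrust::counting_iterator so that each element arrives as an (index, value) pair. The names `even_index_value` and `sum_even_indices` are hypothetical.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/transform_reduce.h>
#include <thrust/tuple.h>

// Maps an (index, value) pair to value when the index is even, else to 0.
struct even_index_value {
    __host__ __device__ float operator()(const thrust::tuple<int, float> &t) const {
        return (thrust::get<0>(t) % 2 == 0) ? thrust::get<1>(t) : 0.f;
    }
};

// Sums d_a[0], d_a[2], d_a[4], ...
float sum_even_indices(const thrust::device_vector<float> &d_a) {
    const int n = static_cast<int>(d_a.size());
    thrust::counting_iterator<int> idx(0);
    return thrust::transform_reduce(
        thrust::make_zip_iterator(thrust::make_tuple(idx, d_a.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(idx + n, d_a.end())),
        even_index_value(), 0.f, thrust::plus<float>());
}

int main() {
    thrust::device_vector<float> d_a(5);
    for (int i = 0; i < 5; ++i) d_a[i] = static_cast<float>(i + 1);  // 1 2 3 4 5
    std::printf("sum of even-indexed elements: %f\n", sum_even_indices(d_a));
    return 0;
}
```

Changing the test to `% 2 == 1` gives the odd-indexed sum instead.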

cuda sum thrust

Ros*_*han

2017-05-23

4 votes

1 answer

590 views

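Converting Thrust device iterators to raw pointers

I'm considering the following simple code, in which I convert thrust::host_vector<int>::iterator h_temp_iterator = h_temp.begin(); and thrust::device_vector<int>::iterator d_temp_iterator = d_temp.begin(); to raw pointers.

To do this, I pass &(h_temp_iterator[0]) and &(d_temp_iterator[0]) to a function and to a kernel, respectively. The former (the CPU case) compiles; the latter (the GPU case) does not. The two cases should in principle be symmetric, so I don't understand the reason for the error message: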

Error   1   error : no suitable conversion function from "thrust::device_ptr<int>" to "int *" exists


The configurations are:

Windows 7, Visual Studio 2010, CUDA 7.5, compiled for architecture 3.5.
Windows 10, Visual Studio 2013, CUDA 8.0, compiled for architecture 5.2.

Code

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

__global__ void testKernel(int *a, const int N)
{
    int i = threadIdx.x;

    if (i >= N) …
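
As the error message states, &(d_temp_iterator[0]) yields a thrust::device_ptr<int> rather than an int *. A minimal sketch of the conversion uses thrust::raw_pointer_cast; the kernel `addOne` and the wrapper `add_one_on_device` are hypothetical stand-ins, not the question's elided code.

```cuda
#include <cstdio>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>

// Hypothetical kernel: adds 1 to each of the first N elements.
__global__ void addOne(int *a, const int N) {
    int i = threadIdx.x;
    if (i < N) a[i] += 1;
}

// Converts the device iterator to a raw pointer and launches the kernel.
// Assumes d_temp holds at most 1024 elements (single-block launch).
void add_one_on_device(thrust::device_vector<int> &d_temp) {
    thrust::device_vector<int>::iterator d_temp_iterator = d_temp.begin();
    // &d_temp_iterator[0] is a thrust::device_ptr<int>; cast it to int *.
    int *raw = thrust::raw_pointer_cast(&d_temp_iterator[0]);
    const int n = static_cast<int>(d_temp.size());
    addOne<<<1, n>>>(raw, n);
    cudaDeviceSynchronize();
}

int main() {
    thrust::device_vector<int> d_temp(8, 41);  // eight elements, all 41
    add_one_on_device(d_temp);
    std::printf("d_temp[0] = %d\n", static_cast<int>(d_temp[0]));
    return 0;
}
```

thrust::raw_pointer_cast(d_temp.data()) is an equivalent shortcut when the pointer to the first element is all that is needed.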


cuda thrust

Jac*_*ern

2017-07-27

4 votes

1 answer

1759 views

thrust::min_element crashes on thrust::device_vector (CUDA Thrust)

The following CUDA Thrust program crashes:

#include <thrust/device_vector.h>
#include <thrust/extrema.h>

int main(void)
{
  thrust::device_vector<int> vec;
  for (int i(0); i < 1000; ++i) {
    vec.push_back(i);
  }

  thrust::min_element(vec.begin(), vec.end());
}


The exception I get is:

Unhandled exception at 0x7650b9bc in test_thrust.exe: Microsoft C++
exception:thrust::system::system_error at memory location 0x0017f178..

In `checked_cudaMemcpy()` in `trivial_copy.inl`.


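If I add #include <thrust/sort.h> and replace min_element with sort, it does not crash.

I'm using CUDA 4.1 on Windows 7 64-bit, compute_20,sm_20 (Fermi), debug build. In a release build I don't get the crash, and min_element finds the correct element.

Am I doing something wrong, or is there a bug in Thrust?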
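
This does not identify the cause, but as a diagnostic sketch the thrust::system_error can be caught so that its message is printed instead of surfacing as an unhandled exception. The helper name `min_of_range` is hypothetical.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/extrema.h>
#include <thrust/system_error.h>

// Pushes 0..n-1 into a device_vector and returns the minimum element.
// Any thrust::system_error propagates to the caller.
int min_of_range(int n) {
    thrust::device_vector<int> vec;
    for (int i = 0; i < n; ++i) {
        vec.push_back(i);
    }
    thrust::device_vector<int>::iterator it =
        thrust::min_element(vec.begin(), vec.end());
    return *it;  // copies the value back to the host
}

int main() {
    try {
        std::printf("min = %d\n", min_of_range(1000));
    } catch (const thrust::system_error &e) {
        // what() carries the underlying CUDA error description.
        std::fprintf(stderr, "Thrust error: %s\n", e.what());
        return 1;
    }
    return 0;
}
```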

cuda thrust

Rog*_*ahl

lucky-day

3 votes

1 answer

898 views

How to free a device_vector<int>

I allocated some space using a Thrust device vector as follows:

thrust::device_vector<int> s(10000000000);


How can I free this space explicitly and correctly?
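
A device_vector releases its memory when it goes out of scope. To release it earlier, one option, sketched here with a much smaller allocation than the question's, is the swap-with-an-empty-temporary idiom; `release` is a hypothetical helper name.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>

// Releases the device storage held by v by swapping it with an empty
// temporary, which is destroyed at the end of the statement.
void release(thrust::device_vector<int> &v) {
    thrust::device_vector<int>().swap(v);
}

int main() {
    thrust::device_vector<int> s(1 << 20);  // illustrative size
    std::printf("before: size = %zu\n", s.size());
    release(s);
    std::printf("after: size = %zu, capacity = %zu\n", s.size(), s.capacity());
    return 0;
}
```

Calling s.clear() followed by s.shrink_to_fit() is another way to ask the vector to give up its storage.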

cuda gpu thrust

Pro*_*mer

2012-06-20

3 votes

2 answers

3995 views

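Return value type of thrust::remove_if

I have two integer arrays, dmap and dflag, of the same length on the device, and I have wrapped them in Thrust device pointers, dmapt and dflagt.

The dmap array contains some elements with the value -1. I want to remove these -1 entries from dmap, along with the corresponding values from the dflag array.

I'm using the remove_if function to do this, but I can't figure out what the return value of this call is, or how I should use that return value.

(I want to pass these reduced arrays to the reduce_by_key function, where dflagt will be used as the keys.)

I'm using the following call to do the reduction. Please tell me how to store the return value in a variable and use it to address the individual arrays dflag and dmap.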

thrust::remove_if( 
    thrust::make_zip_iterator(thrust::make_tuple(dmapt, dflagt)), 
    thrust::make_zip_iterator(thrust::make_tuple(dmapt+numindices, dflagt+numindices)), 
    minus_one_equality_test() 
);


The predicate functor used above is defined as

struct minus_one_equality_test
{
    typedef thrust::tuple<int,int> Tuple;
    // True when the first tuple component (the dmap value) equals -1.
    __host__ __device__
    bool operator()(const Tuple& a) const
    {
        return thrust::get<0>(a) == (-1);
    }
};
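
thrust::remove_if returns an iterator of the same type as its input range (here a zip iterator) pointing to the new logical end; subtracting the start iterator gives the number of kept pairs. The sketch below assumes thrust::device_vector storage instead of the question's device pointers, and `compact` is a hypothetical helper name.

```cuda
#include <cstddef>
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/remove.h>
#include <thrust/tuple.h>

struct minus_one_equality_test {
    typedef thrust::tuple<int, int> Tuple;
    __host__ __device__ bool operator()(const Tuple &a) const {
        return thrust::get<0>(a) == -1;
    }
};

// Removes every (dmap, dflag) pair whose dmap value is -1, shrinks both
// vectors to the kept pairs, and returns the new length.
std::size_t compact(thrust::device_vector<int> &dmap,
                    thrust::device_vector<int> &dflag) {
    auto first = thrust::make_zip_iterator(
        thrust::make_tuple(dmap.begin(), dflag.begin()));
    auto last = thrust::make_zip_iterator(
        thrust::make_tuple(dmap.end(), dflag.end()));
    // new_end is a zip iterator marking the end of the kept elements.
    auto new_end = thrust::remove_if(first, last, minus_one_equality_test());
    const std::size_t new_len = static_cast<std::size_t>(new_end - first);
    dmap.resize(new_len);
    dflag.resize(new_len);
    return new_len;
}

int main() {
    int m[] = {5, -1, 7, -1};
    int f[] = {10, 20, 30, 40};
    thrust::device_vector<int> dmap(m, m + 4), dflag(f, f + 4);
    std::printf("kept %zu pairs\n", compact(dmap, dflag));
    return 0;
}
```

With raw device pointers the same length applies: the first new_len entries of dmapt and dflagt are the valid ones and can be passed on to reduce_by_key.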


cuda thrust

smi*_*dha

2012-09-05

3 votes

1 answer

1756 views

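Strange behavior using Thrust's experimental::pinned_allocator in CUDA

I'm currently trying to get rid of some of the tedious cudaMallocHost/cudaFreeHost calls in my code. To do so, I'd like to use only std::vector, but I absolutely need the underlying memory to be of the pinned CUDA memory type.

However, I'm seeing strange behavior when using thrust::system::cuda::experimental::pinned_allocator<> from the Thrust library: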

//STL
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

//CUDA
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>

#define SIZE 4
#define INITVAL 2
#define ENDVAL 4

//Compile using nvcc ./main.cu -o test -std=c++11
int main( int argc, char* argv[] )
{
    // init host
    std::vector<float,thrust::system::cuda::experimental::pinned_allocator<float> > hostVec(SIZE);
    std::fill(hostVec.begin(),hostVec.end(),INITVAL);

    //Init device
    thrust::device_vector<float> thrustVec(hostVec.size());

    //Copy
    thrust::copy(hostVec.begin(), hostVec.end(), thrustVec.begin());

    //std::cout << "Dereferencing values of the device, values should be "<< INITVAL << std::endl;
    std::for_each(thrustVec.begin(),thrustVec.end(),[](float in){ std::cout …


iterator cuda thrust c++11

Tob*_*bey

2016-04-13

3 votes

1 answer

434 views