Tag: thrust

How to compute the average of an int2 array using Thrust

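I am trying to compute the average of an array containing (x,y) points.
Is it possible to use Thrust to find the average point, expressed as an (x,y) point? I could also represent the array as a thrust::device_vector<int> in which each cell holds the point's absolute position, i.e. i*numColumns + j, although I am not sure that the averaged number would represent the average cell.
Thanks!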
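
A minimal host-side sketch of the reduction the question describes, written in plain C++ so it compiles without CUDA. `Point` is a hypothetical stand-in for CUDA's `int2`, and `PointSum`, `AddPoint`, `MeanPoint` and `mean_point` are illustrative names, not Thrust API. With Thrust, the same shape would presumably map onto `thrust::reduce` or `thrust::transform_reduce` with a `__host__ __device__` functor, which is an assumption rather than something the question confirms.

```cpp
#include <numeric>
#include <vector>

// Hypothetical stand-in for CUDA's int2 so the sketch builds as plain C++.
struct Point { int x; int y; };

// Component-wise sums are kept in 64-bit integers to reduce overflow risk.
struct PointSum { long long x; long long y; };

// Binary operation: add one point to the running component-wise sum.
struct AddPoint {
    PointSum operator()(const PointSum& acc, const Point& p) const {
        return {acc.x + p.x, acc.y + p.y};
    }
};

struct MeanPoint { double x; double y; };

// Sum x and y separately, then divide each by the number of points.
MeanPoint mean_point(const std::vector<Point>& pts) {
    if (pts.empty()) return {0.0, 0.0};
    PointSum s = std::accumulate(pts.begin(), pts.end(), PointSum{0, 0}, AddPoint{});
    double n = static_cast<double>(pts.size());
    return {s.x / n, s.y / n};
}
```

On the flattened representation: averaging the indices i*numColumns + j does not, in general, decode back to the mean cell. For the points (0,9) and (1,0) with 10 columns, the flattened indices 9 and 10 average to 9.5, which decodes to row 0, whereas the component-wise mean is (0.5, 4.5).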

cuda average thrust

iga*_*l k

2012 02-21

5
votes

2
answers

2802
views

How to pass a vector to the constructor of a Thrust-based odeint observer so that it can be read inside a functor

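I am extending the parameter study example from boost's odeint that uses Thrust, and I do not know how to pass a vector of values to the observer's constructor so that these values can be accessed (read-only) from within the observer's functor.

Below is the code for the observer only.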

//// Observes the system, comparing the current state to 
//// values in unchangingVector

struct minimum_perturbation_observer { 
  struct minPerturbFunctor
  {
    template< class T >
    __host__ __device__
    void operator()( T t ) const
    {
    //// I would like to be able to read any member 
    //// of m_unchangingVector here.
    }
  };


  // CONSTRUCTOR
  minimum_perturbation_observer( size_t N, state_type unchangingVector, int len) : 
        m_N( N ),
        m_output( N ),
        m_unchangingVector( len ) // len is the correct length of unchangingVector
  { …

c++ boost thrust odeint

wee*_*not

2014 08-30

5
votes

1
answers

955
views

Fast custom comparison operator in CUDA Thrust

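I am evaluating CUDA and currently use the Thrust library to sort numbers.

I would like to create my own comparator for thrust::sort, but it slows things down drastically! I created my own less implementation simply by copying the code from functional.h. However, it seems to be compiled in some other way and runs very slowly.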

Default comparator: thrust::less() - 94 ms
My own comparator: less() - 906 ms

I am using Visual Studio 2010. What should I do to get the same performance as with option 1?

Full code:

#include <stdio.h>
#include <stdlib.h>  // srand, rand
#include <time.h>    // time

#include <cuda.h>

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>

int myRand()
{
        static int counter = 0;
        if ( counter++ % 10000 == 0 )
                srand(time(NULL)+counter);
        return (rand()<<16) | rand();
}

template<typename T>
struct less : public thrust::binary_function<T,T,bool>
{
  __host__ __device__ bool operator()(const T &lhs, const T &rhs) const {
     return lhs < rhs;
  } …

cuda thrust

Ant*_*sev

2012 01-28

4
votes

1
answers

2345
views

Performance of thrust::count

I have the following code as part of reorganizing data for later use in a CUDA kernel:

thrust::device_ptr<int> dev_ptr = thrust::device_pointer_cast(dev_particle_cell_indices);
int total = 0;
for(int i = 0; i < num_cells; i++) {
    particle_offsets[i] = total;
    // int num = 0;
    int num = thrust::count(dev_ptr, dev_ptr + num_particles, i);
    particle_counts[i] = num;
    total += num;
}

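Now, if I set num to 0 (uncommenting line 5 and commenting out line 6), the application runs at over 30 fps, which is my target. However, when I set num equal to the thrust::count call, the frame rate drops to about 1-2 fps. Why does this happen?

My understanding is that Thrust is supposed to be a collection of highly optimized algorithms that harness the power of the GPU, so I am surprised it has such an impact on my program's performance. This is my first time using Thrust, though, so I may be unaware of some important details.

Is there something about using thrust::count in a loop that causes it to run so slowly? How can I optimize my usage of it?

To give some numbers, in my current test case num_particles is about 2000 and num_cells is about 1500.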
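
The loop in the question calls thrust::count once per cell, so every cell triggers a separate pass over all particles whose result is returned to the host. A hedged sketch of an alternative, in plain host-side C++: count all cells in a single pass, then derive the offsets with an exclusive prefix sum. `CellTable` and `build_cell_table` are hypothetical names for illustration. On the GPU the prefix-sum step would presumably correspond to `thrust::exclusive_scan`, which is an assumption about how one might port it, not a measured fix.

```cpp
#include <cstddef>
#include <vector>

// Per-cell particle counts and the starting offset of each cell.
struct CellTable {
    std::vector<int> counts;
    std::vector<int> offsets;
};

CellTable build_cell_table(const std::vector<int>& particle_cell_indices,
                           int num_cells) {
    CellTable t;
    t.counts.assign(static_cast<std::size_t>(num_cells), 0);
    t.offsets.assign(static_cast<std::size_t>(num_cells), 0);

    // Single pass over the particles: one histogram increment each.
    for (int cell : particle_cell_indices) {
        if (cell >= 0 && cell < num_cells) {
            ++t.counts[static_cast<std::size_t>(cell)];
        }
    }

    // Exclusive prefix sum: offsets[i] is the sum of counts[0..i-1].
    int total = 0;
    for (int i = 0; i < num_cells; ++i) {
        t.offsets[static_cast<std::size_t>(i)] = total;
        total += t.counts[static_cast<std::size_t>(i)];
    }
    return t;
}
```

This produces the same particle_counts and particle_offsets values as the original loop while touching each particle once instead of once per cell.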

cuda thrust

kev*_*sco

lucky-day

4
votes

2
answers

2100
views

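Is it possible to use thrust::device_vector and thrust::fill for 2D arrays with the Thrust library in CUDA

I am new to the Thrust library. I have CUDA C code that uses a global 2D array, which I initialize with a kernel function in my code.

I need to know whether it is possible to use thrust::device_vector or thrust::fill to initialize and fill a 2D array.

For example: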

// initialize 1D array with ten numbers in a device_vector 
    thrust::device_vector<int> D(10);

Is it possible to write..

thrust::device_vector<int> D[5][10];

If it is possible, how would I use the thrust::fill function?

My goal is to optimize the code using the Thrust library.
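
Note that `thrust::device_vector<int> D[5][10];` declares a 5x10 C array of separate vectors, not one two-dimensional container. A common alternative is a single flat vector of rows*cols elements indexed as row*cols + col. The sketch below shows that layout in plain host-side C++. `Grid2D` is a hypothetical helper, not part of Thrust. Under that layout, the Thrust analogue would presumably be `thrust::device_vector<int> D(5 * 10)` filled with `thrust::fill(D.begin(), D.end(), value)`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A 2D grid stored row-major in one contiguous vector.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Fill every element with the same value in one call.
    void fill(int value) { std::fill(data_.begin(), data_.end(), value); }

    // Map a (row, col) pair onto the flat index row * cols + col.
    int& at(std::size_t row, std::size_t col) { return data_[row * cols_ + col]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

For example, `Grid2D g(5, 10); g.fill(7);` sets all 50 elements, after which `g.at(4, 9)` refers to the last one.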

c optimization cuda thrust

use*_*682

2014 03-28

4
votes

1
answers

5820
views

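CUDA C vs. Thrust, am I missing something?

I have just started learning CUDA programming. I was working through some simple CUDA C examples and everything was going swimmingly. Then! Suddenly! Thrust! I consider myself well versed in C++ functors, and I was taken aback by the difference between CUDA C and Thrust.

I find it hard to believe that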

__global__ void square(float *a, int N) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        a[idx] = a[idx] * a[idx];
    }
}

int main(int argc, char** argv) {

float *aHost, *aDevice;

const int N = 10;
size_t size = N * sizeof(float);

aHost = (float*)malloc(size);
cudaMalloc((void**)&aDevice, size);

for (int i = 0; i < N; i++) {
    aHost[i] = (float)i;
}

cudaMemcpy(aDevice, aHost, size, …

linux cuda nvcc thrust

Tyl*_*eau

2017 09-14

4
votes

1
answers

1165
views

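Result of a Thrust reduction in device memory

Is it possible to keep the return value of a thrust::reduce operation in device-allocated memory? If so, is it as easy as assigning the value to a cudaMalloc'ed region, or should I use thrust::device_ptr?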

reduce cuda thrust

Org*_*rim

lucky-day

4
votes

2
answers

1265
views

Strided reduction with CUDA Thrust

I have an array of vertices with this structure:

[x0, y0, z0, empty float, x1, y1, z1, empty float, x2, y2, z2, empty float, ...]

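I need to find minX, minY, minZ, maxX, maxY and maxZ using CUDA. I wrote a proper reduction algorithm, but it is a bit too slow. I decided to use the Thrust library. It has a highly optimized reduce(), or even better a minmax_element() method, which finds the maximum and minimum of an array simultaneously, but I cannot find a fast way to make it use only every 4th index. Copying the data into 3 separate arrays is not the solution I am looking for.

Is there a way (some kind of trick with Thrust iterators or something similar) to pass a stride to reduce()?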
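
One way to look at the layout above is to treat each group of four floats as a single vertex record and reduce over records, so that all six extrema come out of one pass with no per-component stride. The following is a host-side sketch in plain C++ under that assumption. `Bounds` and `compute_bounds` are hypothetical names. Porting it to Thrust would presumably mean reducing over `float4` elements with a custom min/max functor, which the question does not confirm.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Lower and upper bounds for the x, y and z components.
struct Bounds {
    float lo[3];
    float hi[3];
};

// verts is laid out as [x0, y0, z0, pad, x1, y1, z1, pad, ...].
Bounds compute_bounds(const std::vector<float>& verts) {
    Bounds b;
    for (int axis = 0; axis < 3; ++axis) {
        b.lo[axis] = std::numeric_limits<float>::max();
        b.hi[axis] = std::numeric_limits<float>::lowest();
    }
    // Step four floats at a time; the fourth (padding) float is ignored.
    for (std::size_t i = 0; i + 3 < verts.size(); i += 4) {
        for (int axis = 0; axis < 3; ++axis) {
            b.lo[axis] = std::min(b.lo[axis], verts[i + axis]);
            b.hi[axis] = std::max(b.hi[axis], verts[i + axis]);
        }
    }
    return b;
}
```

The data stays in its original interleaved form, which avoids the copy into three separate arrays that the question rules out.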

cuda thrust

aer*_*ion

2014 07-21

4
votes

1
answers

1283
views

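Using printf/cout with Thrust

I am trying to learn how to use CUDA with Thrust, and I have seen some code where the printf function appears to be used from the device.

Consider the following code: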

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <cstdio>

struct functor
{
  __host__ __device__
  void operator()(int val)
  {
      printf("Call for value : %d\n", val);
  }
};

int main()
{
    thrust::host_vector<int> cpu_vec(100);
    for(int i = 0 ; i < 100 ; ++i)
      cpu_vec[i] = i;
    thrust::device_vector<int> cuda_vec = cpu_vec; //transfer to GPU
    thrust::for_each(cuda_vec.begin(),cuda_vec.end(),functor());
}

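This seems to run fine and prints the message "Call for value : " followed by a number, 100 times.

Now if I include iostream and replace the printf line with the C++ stream-based equivalent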

std::cout << "Call for value : " << val << std::endl;

I get a compilation warning from nvcc, and the compiled program prints nothing.

warning: address of a host variable "std::cout" cannot be directly taken in a device …

cuda thrust

bct*_*bct

2016 04-27

4
votes

1
answers

2689
views

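Converting a Thrust device iterator to a raw pointer

I am considering the following simple code, in which I convert thrust::host_vector<int>::iterator h_temp_iterator = h_temp.begin(); and thrust::device_vector<int>::iterator d_temp_iterator = d_temp.begin(); to raw pointers.

To do so, I pass &(h_temp_iterator[0]) and &(d_temp_iterator[0]) to a function and to a kernel, respectively. The former (the CPU case) compiles; the latter (the GPU case) does not. The two cases should in principle be symmetric, so I do not understand the reason for the error message, which is: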

Error   1   error : no suitable conversion function from "thrust::device_ptr<int>" to "int *" exists

The configurations are:

Windows 7, Visual Studio 2010, CUDA 7.5, compiling for architecture 3.5.
Windows 10, Visual Studio 2013, CUDA 8.0, compiling for architecture 5.2.

Code

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

__global__ void testKernel(int *a, const int N)
{
    int i = threadIdx.x;

    if (i >= N) …

cuda thrust

Jac*_*ern

2017 07-27

4
votes

1
answers

1759
views