Causes of CL_INVALID_WORK_GROUP_SIZE

Fra*_*ter 32 opencl

When I change the work-group size from 16 to 32 or anything larger, I get a CL_INVALID_WORK_GROUP_SIZE error. matrix_size is 64.

  localWorkSize[0] = groupsize;
  localWorkSize[1] = localWorkSize[0];
  globalWorkSize[0] = matrix_size;
  globalWorkSize[1] = globalWorkSize[0];

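First I checked the documentation for clEnqueueNDRangeKernel, which lists four (5) different causes of CL_INVALID_WORK_GROUP_SIZE, but I don't think any of them apply. Please check my conclusions. (I hope you don't mind my Q&A style.)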


Q CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and number of work-items specified by global_work_size is not evenly divisable by size of work-group given by local_work_size

A 64 % 32 = 0

Q or does not match the work-group size specified for kernel using the __attribute__((reqd_work_group_size(X, Y, Z))) qualifier in program source.

A As far as I understand the documentation, I am not using __attribute__.

Q CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and the total number of work-items in the work-group computed as local_work_size[0] *... local_work_size[work_dim - 1] is greater than the value specified by CL_DEVICE_MAX_WORK_GROUP_SIZE in the table of OpenCL Device Queries for clGetDeviceInfo.

A I queried clGetDeviceInfo for CL_DEVICE_MAX_WORK_GROUP_SIZE and got 512, 512, 64.

Q CL_INVALID_WORK_GROUP_SIZE if local_work_size is NULL and the __attribute__((reqd_work_group_size(X, Y, Z))) qualifier is used to declare the work-group size for kernel in the program source.

A local_work_size is not NULL.

Q CL_INVALID_WORK_ITEM_SIZE if the number of work-items specified in any of local_work_size[0], ... local_work_size[work_dim - 1] is greater than the corresponding values specified by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], .... CL_DEVICE_MAX_WORK_ITEM_SIZES[work_dim - 1].

A 32 < 512


I hope I haven't overlooked anything. If you know what might be causing the CL_INVALID_WORK_GROUP_SIZE, or spot a mistake in my conclusions, please let me know.

Thanks for taking the time to read all of this :)

Qua*_*dom 19

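CL_DEVICE_MAX_WORK_GROUP_SIZE should return a single size_t value (512, for example, but I don't know what it is on your system). It is the maximum number of work-items in a work-group, not the maximum number of work-items in each dimension. So in your case you are trying to create a 2D work-group with 32 * 32 = 1024 work-items, and CL_DEVICE_MAX_WORK_GROUP_SIZE on your system is probably smaller than 1024.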
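To make the difference between the per-dimension limit and the whole-group limit concrete, here is a minimal sketch in plain C that repeats the size checks from the clEnqueueNDRangeKernel documentation quoted in the question. It makes no OpenCL calls. The helper check_work_group and the wg_check enum are hypothetical names invented for this illustration, and the limits are passed in by the caller instead of being queried from a device.

```c
#include <stddef.h>

/* Result of the check. These names are illustrative only; they are not
   OpenCL error codes. */
enum wg_check {
    WG_OK = 0,
    WG_NOT_DIVISIBLE,   /* global size is not a multiple of the local size */
    WG_ITEM_TOO_LARGE,  /* one dimension exceeds its per-dimension limit */
    WG_GROUP_TOO_LARGE  /* product of all dimensions exceeds the group limit */
};

/* Hypothetical helper: validates a local work size against limits that the
   caller is assumed to have obtained from clGetDeviceInfo
   (CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE). */
enum wg_check check_work_group(size_t work_dim,
                               const size_t *global_size,
                               const size_t *local_size,
                               const size_t *max_item_sizes,
                               size_t max_group_size)
{
    size_t total = 1;  /* running product local_size[0] * ... * local_size[work_dim - 1] */

    for (size_t i = 0; i < work_dim; ++i) {
        /* A zero local size can never divide the global size. */
        if (local_size[i] == 0 || global_size[i] % local_size[i] != 0)
            return WG_NOT_DIVISIBLE;
        if (local_size[i] > max_item_sizes[i])
            return WG_ITEM_TOO_LARGE;
        total *= local_size[i];
    }

    /* This is the check that is easy to miss: the limit applies to the
       whole work-group, not to each dimension separately. */
    return total > max_group_size ? WG_GROUP_TOO_LARGE : WG_OK;
}
```

Assuming a 64 x 64 global size, per-dimension limits of 512, 512, 64 (reading the numbers from the question as CL_DEVICE_MAX_WORK_ITEM_SIZES) and a group limit of 512 (the example value above, not a measured one), a 16 x 16 local size gives WG_OK, while 32 x 32 gives WG_GROUP_TOO_LARGE because 1024 > 512.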

See the OpenCL 1.1 specification, table 4.3, page 37, for the definition of CL_DEVICE_MAX_WORK_GROUP_SIZE:

Maximum number of work-items in a work-group executing a kernel using the data parallel execution model.
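Given that definition, one possible way to pick a square 2D work-group that stays within the limit is sketched below. This is only an illustration in plain C: largest_square_side is a hypothetical helper, the limit is assumed to have been read from clGetDeviceInfo beforehand, and the per-dimension limits in CL_DEVICE_MAX_WORK_ITEM_SIZES are not considered here.

```c
#include <stddef.h>

/* Hypothetical helper: returns the largest side s such that
   s * s <= max_group_size and s evenly divides global_side.
   Returns 0 if no such side exists (for example when a limit of 0 is passed). */
size_t largest_square_side(size_t global_side, size_t max_group_size)
{
    size_t best = 0;

    /* Try every candidate side whose square still fits in the group limit. */
    for (size_t s = 1; s <= global_side && s * s <= max_group_size; ++s) {
        if (global_side % s == 0)
            best = s;  /* s divides the global size, so it is usable */
    }
    return best;
}
```

For matrix_size 64 and an assumed limit of 512, this yields 16, which matches the work-group size that worked in the question; with a limit of 1024 it would yield 32.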