I have several kernels that are launched one after another, like this:
clEnqueueNDRangeKernel(..., kernel1, ...);
clEnqueueNDRangeKernel(..., kernel2, ...);
clEnqueueNDRangeKernel(..., kernel3, ...);
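The kernels all share a single global buffer.
Right now I profile each kernel execution by adding the following block after each clEnqueueNDRangeKernel call, then sum the results to get the total execution time: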
clFinish(cmdQueue);
status = clGetEventProfilingInfo(...,&starttime,...);
clGetEventProfilingInfo(...,&endtime,...);
time_spent = endtime - starttime;
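My question is: how can I profile all three kernels together with a single clFinish (for example, one clFinish() after the last kernel launch)?
Yes, I gave each clEnqueueNDRangeKernel call its own timing event, and got large negative numbers. Details: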
clEnqueueNDRangeKernel(cmdQueue,...,&timing_event1);
clFinish(cmdQueue);
clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime1,NULL);
clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime1,NULL);
time_spent1 = endtime1 - starttime1;
clEnqueueNDRangeKernel(cmdQueue,...,&timing_event2);
clFinish(cmdQueue);
clGetEventProfilingInfo(timing_event2,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime2,NULL);
clGetEventProfilingInfo(timing_event2,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime2,NULL);
time_spent2 = endtime2 - starttime2;
clEnqueueNDRangeKernel(cmdQueue,...,&timing_event3);
clFinish(cmdQueue);
clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime3,NULL);
clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime3,NULL);
time_spent3 = endtime3 - starttime3;
time_spent_all_0 = time_spent1 + time_spent2 + time_spent3;
time_spent_all_1 = endtime3 - starttime1;
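With a clFinish after every kernel, all the individual profiling values look reasonable, but time_spent_all_1 is about twice time_spent_all_0. If I remove every clFinish except the last one, none of the profiling values make sense.
Thanks to Eric Bainville, I got the result I wanted: profiling several clEnqueueNDRangeKernel calls with a single clFinish. Here is the final code I used: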
clEnqueueNDRangeKernel(cmdQueue,...,&timing_event1);
clEnqueueNDRangeKernel(cmdQueue,...,&timing_event2);
clEnqueueNDRangeKernel(cmdQueue,...,&timing_event3);
clFinish(cmdQueue);
clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime,NULL);
clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime,NULL);
time_spent = endtime - starttime;
I'm writing a CMakeLists.txt for a project and have run into a problem with set_source_files_properties.
The original, working expression is:
set_source_files_properties (a.cpp PROPERTIES COMPILE_DEFINITIONS
DIR1="/home/xxx/b.i")
Then I tried to add more COMPILE_DEFINITIONS, but it failed.
Attempt 1:
set_source_files_properties (a.cpp PROPERTIES COMPILE_DEFINITIONS
DIR1="/home/xxx/b.i" DIR2="/home/xxx/c.i" DIR3="/home/xxx/d.i")
Attempt 2:
set_source_files_properties (a.cpp PROPERTIES COMPILE_DEFINITIONS
DIR1="/home/xxx/b.i")
set_source_files_properties (a.cpp PROPERTIES COMPILE_DEFINITIONS
DIR2="/home/xxx/c.i")
set_source_files_properties (a.cpp PROPERTIES COMPILE_DEFINITIONS
DIR3="/home/xxx/d.i")
Result: only the last definition, DIR3, is recognized in a.cpp when compiling with make; the first two are reported as undefined during the make step.
Any suggestions?
Thanks!