ein*_*ica -1 benchmarking profiling cuda clock
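I want to do some comparative profiling of a couple of CUDA kernels. However, one of them runs within a program that loads the GPU with more work, while the other runs only in a test harness.
On some GPUs, these circumstances mean the clock rate will change (possibly more than one clock rate, since there are several). This effect is particularly severe in devices like Tesla T4's (which aren't actively cooled).
Is it possible to prevent the clock rates from changing due to load (or thermal conditions)?
I've looked into the nvidia-smi utility, which has a subcommand named clocks, but all it offers is the following: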
clocks -- Control and query clock information.
Usage: nvidia-smi clocks [options]
options include:
[-i | --id]: Enumeration index, PCI bus ID or UUID. Provide comma
separated values for more than one device
[ | --sync-boost-list]: List all synchronous boost groups
[ | --sync-boost-add]: Add a synchronous boost group
[ | --sync-boost-remove]: Remove a synchronous boost group. Provide the group id
returned from --sync-boost-list
...which doesn't look like what I need. Of course, solutions that are not based on nvidia-smi are welcome too.
Notes:
TL;DR
- Set persistence mode: nvidia-smi -i 0 -pm 1 (sets persistence mode for the GPU index 0)
- Use an nvidia-smi command like -ac or -lgc (application clocks, lock gpu clock)
- nvidia-smi has command line help for all of this: nvidia-smi --help
LONGER:
I'm using driver 455.23.05 for this description. Some features (e.g. -lgc) may not be available in older drivers. Setting persistence mode may be necessary for some of these features, and will also help to reduce variability on application start-up. This is not intended to be an exhaustive description of the nvidia-smi tool.
SETTING APPLICATION CLOCKS:
The application clocks feature should generally be useful for the testing described. It will not force the GPU clocks to remain at the specified setting when there is no application running (AFAIK), but the clocks should attain those values "as soon as" the application starts running. It allows you to specify both gpu clock (i.e. core clock) as well as memory clock. Let's start by excerpting the command line help text for some of the important switches:
-ac --applications-clocks= Specifies <memory,graphics> clocks as a
pair (e.g. 2000,800) that defines GPU's
speed in MHz while running applications on a GPU.
-rac --reset-applications-clocks
Resets the applications clocks to the default values.
-acp --applications-clocks-permission=
Toggles permission requirements for -ac and -rac commands:
0/UNRESTRICTED, 1/RESTRICTED
To get started setting application clocks, you may need to use sudo or similar on Linux for some or all of these commands. Note also, from the help text above, that the requirement for elevated privilege can be turned on or off with -acp. Importantly, you cannot pick arbitrary values for the <memory,graphics> settings pair. You must specify a pair, and the pair can only come from a list of permissible options. Other choices will result in unspecified behavior. The permissible options can be listed with the --query-supported-clocks switch to nvidia-smi (use --help-query-supported-clocks to get command-line help on that switch), which itself requires some formatting. For example, the following command will give an exhaustive list of the valid pairs that can be passed to the -ac command:
nvidia-smi -i 0 --query-supported-clocks=mem,gr --format=csv
Once you have that list of valid pairs, you can specify one of those pairs to the application clocks command:
nvidia-smi -i 0 -ac 877,1215
(The above command, if run with root or enabled via -acp, would set the memory clock to 877MHz and the core clock to 1215MHz on my Tesla V100, for example. Note the -i switch to select the GPU to target with this command. The 877,1215 pair may not be valid on your GPU. Also note that the -acp feature is removed from drivers 465.xx and newer.)
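Since only listed pairs are valid, it can help to check a candidate pair against the supported list before passing it to -ac. The following is a minimal Python sketch of that check, not part of nvidia-smi itself. The function names are hypothetical, and the parser assumes each data row of the CSV output contains exactly two integers (memory first, then graphics), which is how the query above is ordered.

```python
import re
import subprocess


def parse_supported_clocks(csv_text):
    """Parse CSV text into a list of (memory_mhz, graphics_mhz) pairs."""
    pairs = []
    for line in csv_text.splitlines():
        numbers = re.findall(r"\d+", line)
        # Assumption: data rows hold exactly two integers; the header and
        # blank lines hold none and are skipped.
        if len(numbers) == 2:
            pairs.append((int(numbers[0]), int(numbers[1])))
    return pairs


def query_supported_clocks(gpu_index=0):
    """Run the query shown above and return the parsed pairs (needs a GPU)."""
    result = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-supported-clocks=mem,gr", "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    return parse_supported_clocks(result.stdout)


def application_clocks_command(pairs, memory_mhz, graphics_mhz, gpu_index=0):
    """Build the -ac command line, refusing pairs that are not in the list."""
    if (memory_mhz, graphics_mhz) not in pairs:
        raise ValueError(
            f"{memory_mhz},{graphics_mhz} is not a supported clock pair")
    return ["nvidia-smi", "-i", str(gpu_index),
            "-ac", f"{memory_mhz},{graphics_mhz}"]
```

A typical use would be `subprocess.run(application_clocks_command(query_supported_clocks(), 877, 1215), check=True)`, run with whatever privilege your driver requires.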
When you are done with whatever you are doing, you may wish to reset the application clock behavior to the default behavior (GPU selects clock freqs according to its own heuristics) using -rac.
Also, a number of the pairs offered may involve "boosting" behavior. The GPU is not guaranteed to maintain all clocks exactly as you specify if a throttling event occurs. Typical throttling events are thermal (the GPU is too hot) and power-related (the GPU has reached its power limit).
The existence of an actual throttling event can be discovered using the "full" output from nvidia-smi (nvidia-smi -a); look for "clocks throttle reasons". Other useful information is available in this output, such as the default application clocks. When N/A appears in your output, it means that your GPU does not support this feature. There is a great variety of supported features across the various GPU families, and I won't be able to respond to questions about this.
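If you would rather check for throttling from a script than read the full report, the reasons are also exposed as a bitmask. The sketch below decodes such a mask in Python. It rests on two assumptions you should verify for your driver: that the query field `clocks_throttle_reasons.active` is available (field names vary between driver versions), and that the bit values match the nvmlClocksThrottleReason constants in NVML's nvml.h as I recall them. The short reason names are my own labels.

```python
import subprocess

# Assumption: bit values follow the nvmlClocksThrottleReason* constants
# from nvml.h; check them against the header shipped with your driver.
THROTTLE_REASON_BITS = {
    0x001: "gpu_idle",
    0x002: "applications_clocks_setting",
    0x004: "sw_power_cap",
    0x008: "hw_slowdown",
    0x010: "sync_boost",
    0x020: "sw_thermal_slowdown",
    0x040: "hw_thermal_slowdown",
    0x080: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}


def decode_throttle_reasons(mask_text):
    """Turn a hex bitmask string such as '0x0000000000000004' into names."""
    mask = int(mask_text.strip(), 16)
    return [name for bit, name in sorted(THROTTLE_REASON_BITS.items())
            if mask & bit]


def active_throttle_reasons(gpu_index=0):
    """Query the active-reasons bitmask from nvidia-smi (needs a GPU)."""
    result = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=clocks_throttle_reasons.active",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return decode_throttle_reasons(result.stdout)
```

Note that the first two entries (idle, application clocks setting) describe why the clock is below maximum rather than a protective slowdown, so for benchmarking purposes the power and thermal entries are the ones to watch.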
In the absence of a throttling event, and assuming your GPU supports the feature, I would expect application clocks to remain in effect throughout your application runtime. Note that if this command is specified while an application is currently running, the change in clocks may not take effect until the GPU becomes idle. You may wish to monitor GPU clocks in this case (again, using nvidia-smi). Therefore I would generally recommend using these commands when the GPU is idle. Then begin your work on the GPU after that.
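One way to do that monitoring is to sample the current clocks while the benchmark runs and check afterwards that they never moved. Here is a minimal Python sketch of the idea, using the `clocks.gr` and `clocks.mem` query fields of nvidia-smi. The function names and the pass/fail criterion (every sample must equal the requested pair exactly) are my own choices for illustration.

```python
import subprocess

QUERY = ["nvidia-smi", "-i", "0",
         "--query-gpu=clocks.gr,clocks.mem",
         "--format=csv,noheader,nounits"]


def parse_clock_sample(line):
    """Parse one output line such as '1215, 877' into (graphics, memory)."""
    graphics_text, memory_text = line.split(",")
    return int(graphics_text), int(memory_text)


def read_clocks():
    """Take one sample of the current clocks in MHz (needs a GPU)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_clock_sample(result.stdout.splitlines()[0])


def clocks_held(samples, graphics_mhz, memory_mhz):
    """Return True if every sample matches the requested clocks exactly."""
    return all(sample == (graphics_mhz, memory_mhz) for sample in samples)
```

In a test harness you might call `read_clocks()` before and after each timed kernel launch, collect the results in a list, and discard any run for which `clocks_held(...)` is false.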
LOCK GPU (CORE) CLOCK:
In many cases, the gpu core clock (core, gpu, graphics are all synonyms in this context) exhibits the most variability (for example the application clocks offered on my Tesla V100 only include a value of 877MHz for memory clock; no other choices are possible). There is a separate switch that can be used to "lock" the GPU core clock to a range of values.
-lgc --lock-gpu-clocks= Specifies <minGpuClock,maxGpuClock> clocks as a
pair (e.g. 1500,1500) that defines the range
of desired locked GPU clock speed in MHz.
Setting this will supercede application clocks
and take effect regardless if an app is running.
Input can also be a singular desired clock value
(e.g. <GpuClockValue>).
-rgc --reset-gpu-clocks
Resets the Gpu clocks to the default values.
This range is specified using a lower and upper endpoint for the range. If you wish to select a specific value only, you can specify the lower and upper endpoints both to be that value. As far as I know the range endpoints are inclusive.
For example, the following command:
nvidia-smi -i 0 -lgc 1215,1215
will "lock" the GPU core clock to 1215 MHz on my Tesla V100 GPU. As far as I know, this effect takes place immediately, even if an application is running. Most other caveats I can think of should be similar to those for application clocks:
- valid clock values can be listed with the --query-supported-clocks command
- the setting can be undone with -rgc
As indicated in the help, this switch "overrides" previous application clocks settings with respect to core clock. Also, note that many switches come in 2 flavors, a "long" form and a "short" form. Where additional switch parameters are required, the long form often requires an = separator, the short form often requires a space separator:
nvidia-smi -i 0 -lgc 1215,1215
or
nvidia-smi -i 0 --lock-gpu-clocks=1215,1215
You generally cannot intermix these two formats:
nvidia-smi -i 0 -lgc=1215,1215
will probably report an error.
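For a test harness, it can be convenient to wrap the lock and reset steps so that the clock is always released, even if the benchmark fails. The following Python context manager is a minimal sketch of that pattern using the -lgc and -rgc switches described above. The name `locked_gpu_clock` and the injectable `run` parameter are hypothetical conveniences (the latter lets you test the wrapper without a GPU), and the commands will still need whatever privilege your system requires.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def locked_gpu_clock(mhz, gpu_index=0, run=subprocess.run):
    """Lock the GPU core clock to a single value for the enclosed block."""
    base = ["nvidia-smi", "-i", str(gpu_index)]
    # Short-form switch, so the value is a separate argument (no '=').
    run(base + ["-lgc", f"{mhz},{mhz}"], check=True)
    try:
        yield
    finally:
        # Always restore the default clock behavior, even on errors.
        run(base + ["-rgc"], check=True)
```

Usage would look like `with locked_gpu_clock(1215): run_benchmark()`, where `run_benchmark` stands for your own timing code and 1215 is a value that is valid on your GPU.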
A FINAL NOTE:
"This effect is particularly severe in devices like Tesla T4's (which aren't actively cooled)."
In my experience with the T4, what you are observing may well be throttling. The T4 is one of the lowest-power datacenter-grade GPUs, and it's certainly possible for the GPU compute demands to exceed what the power limit (70W) can support. In this case, the GPU clocks will throttle, and none of the above commands will allow you to override this behavior. By design, you cannot force the GPU to operate at elevated clocks when the GPU is trying to protect itself, or protect the system it is running in.
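Furthermore, the fact that the T4 is not actively cooled really doesn't matter. The only approved/supported usage setting for the T4 is in a server that is designed to accommodate the T4. (A similar statement applies to any NVIDIA datacenter GPU.) Such servers monitor the T4 GPU temperature and provide server-supplied forced flow-through cooling for the GPU. This is by design. The server is responsible for keeping the GPU within a proper operating temperature range. If your server is not doing that, you should take that up with your server vendor. If you are operating the T4 GPU in an unapproved setting (e.g. an unqualified server or a desktop/workstation), then I would generally expect a frustrating experience with the device.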
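To judge whether a workload is running into the power limit mentioned above, you can compare the reported power draw against the enforced limit. The Python sketch below uses the `power.draw` and `power.limit` query fields of nvidia-smi. The 5% margin and the function names are arbitrary choices of mine, and some GPUs report N/A for these fields, in which case the parser returns None.

```python
import subprocess


def parse_power_sample(line):
    """Parse '68.52, 70.00' into (draw_watts, limit_watts), or None."""
    try:
        draw_text, limit_text = line.split(",")
        return float(draw_text), float(limit_text)
    except ValueError:
        # Covers unsupported fields (e.g. '[N/A]') and malformed lines.
        return None


def near_power_limit(draw_watts, limit_watts, margin=0.05):
    """Return True if the draw is within `margin` of the power limit."""
    return draw_watts >= limit_watts * (1.0 - margin)


def read_power(gpu_index=0):
    """Take one power sample from nvidia-smi (needs a GPU)."""
    result = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=power.draw,power.limit",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_power_sample(result.stdout.splitlines()[0])
```

If `near_power_limit` is true for most samples during a kernel's run, clock variation for that kernel is probably power throttling, which the clock settings above cannot override.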