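Use the cycle counter register (DWT_CYCCNT) to get high-precision timing!
Note: I also tested this with a digital pin and an oscilloscope, and it is very accurate.
See stopwatch_delay(ticks) and the supporting code below. It uses the STM32's DWT_CYCCNT register, located at address 0xE0001004, which is designed specifically to count actual clock cycles.
See main for an example that uses STOPWATCH_START/STOPWATCH_STOP to measure how long stopwatch_delay(ticks) actually took, with CalcNanosecondsFromStopwatch(m_nStart, m_nStop) converting the result to nanoseconds.
Change the ticks argument to adjust the delay.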
uint32_t m_nStart; //DEBUG Stopwatch start cycle counter value
uint32_t m_nStop; //DEBUG Stopwatch stop cycle counter value
#define DEMCR_TRCENA 0x01000000
/* Core Debug registers */
#define DEMCR (*((volatile uint32_t *)0xE000EDFC))
#define DWT_CTRL (*(volatile uint32_t *)0xe0001000)
#define CYCCNTENA (1<<0)
#define DWT_CYCCNT ((volatile uint32_t *)0xE0001004)
#define CPU_CYCLES *DWT_CYCCNT
#define CLK_SPEED 168000000 // EXAMPLE for CortexM4, EDIT as needed
#define STOPWATCH_START { m_nStart = *((volatile unsigned int *)0xE0001004);}
#define STOPWATCH_STOP { m_nStop = *((volatile unsigned int *)0xE0001004);}
static inline void stopwatch_reset(void)
{
/* Enable DWT */
DEMCR |= DEMCR_TRCENA;
*DWT_CYCCNT = 0;
/* Enable CPU cycle counter */
DWT_CTRL |= CYCCNTENA;
}
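static inline uint32_t stopwatch_getticks(void)
{
return CPU_CYCLES;
}
static inline void stopwatch_delay(uint32_t ticks)
{
uint32_t start_ticks = stopwatch_getticks();
// Compare elapsed ticks, not an absolute end value: the unsigned
// subtraction stays correct even if the counter wraps during the wait
while ((stopwatch_getticks() - start_ticks) < ticks)
{
}
}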
// WARNING: ONLY VALID FOR <25ms measurements due to scaling by 1000!
uint32_t CalcNanosecondsFromStopwatch(uint32_t nStart, uint32_t nStop)
{
uint32_t nDiffTicks;
uint32_t nSystemCoreTicksPerMicrosec;
// Convert (clk speed per sec) to (clk speed per microsec)
nSystemCoreTicksPerMicrosec = CLK_SPEED / 1000000;
// Elapsed ticks
nDiffTicks = nStop - nStart;
// Elapsed nanosec = 1000 * (ticks-elapsed / clock-ticks in a microsec)
return 1000 * nDiffTicks / nSystemCoreTicksPerMicrosec;
}
void main(void)
{
int timeDiff = 0;
stopwatch_reset();
// =============================================
// Example: use a delay, and measure how long it took
STOPWATCH_START;
stopwatch_delay(168000); // 168k ticks is 1ms for 168MHz core
STOPWATCH_STOP;
timeDiff = CalcNanosecondsFromStopwatch(m_nStart, m_nStop);
printf("My delay measured to be %d nanoseconds\n", timeDiff);
// =============================================
// Example: measure function duration in nanosec
STOPWATCH_START;
// run_my_function() => do something here
STOPWATCH_STOP;
timeDiff = CalcNanosecondsFromStopwatch(m_nStart, m_nStop);
printf("My function took %d nanoseconds\n", timeDiff);
}
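The 25 ms limit noted on CalcNanosecondsFromStopwatch comes from multiplying by 1000 in 32-bit arithmetic. A minimal sketch of one way around it, assuming a 64-bit intermediate is acceptable on your target; the names CLK_HZ, ticks_to_ns and ns_to_ticks are hypothetical and not part of the code above:

```c
#include <stdint.h>

/* Assumption: the same 168 MHz example clock as CLK_SPEED above; edit as needed. */
#define CLK_HZ 168000000u

/* Ticks -> nanoseconds. The 64-bit intermediate keeps ticks * 1e9 from
 * overflowing, so any 32-bit tick count converts correctly. */
static inline uint64_t ticks_to_ns(uint32_t ticks)
{
    return ((uint64_t)ticks * 1000000000u) / CLK_HZ;
}

/* Nanoseconds -> ticks, rounded up so a requested delay is never shortened. */
static inline uint32_t ns_to_ticks(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns * CLK_HZ + 999999999u) / 1000000000u);
}
```

With these, something like stopwatch_delay(ns_to_ticks(1000000)) would request the same 1 ms delay as the 168000-tick call in main. The 64-bit division costs extra cycles on a Cortex-M4, so it is best kept outside the region being timed.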
Update: adding the concise solution that @vgru mentioned in the comments section
// general but accurate (5% err at 10us delay, but 22% err at 1us delay)
#pragma GCC push_options
#pragma GCC optimize ("O3")
void delayUS_DWT(uint32_t us) {
volatile uint32_t cycles = (SystemCoreClock/1000000L)*us;
volatile uint32_t start = DWT->CYCCNT;
do {
} while(DWT->CYCCNT - start < cycles);
}
#pragma GCC pop_options
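The loop above tests DWT->CYCCNT - start against cycles instead of comparing the counter with an absolute end value. A small host-runnable sketch of why that matters when the 32-bit counter wraps; both helper names are hypothetical and exist only for this illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* True while fewer than 'cycles' ticks have elapsed since 'start'.
 * Unsigned subtraction wraps modulo 2^32, so the result is still the
 * real elapsed count after the counter rolls over once. */
static inline bool still_waiting(uint32_t now, uint32_t start, uint32_t cycles)
{
    return (uint32_t)(now - start) < cycles;
}

/* Absolute-deadline test, shown for contrast only: if start + cycles wraps,
 * the deadline looks "already reached" and the delay ends far too early. */
static inline bool naive_deadline_reached(uint32_t now, uint32_t start, uint32_t cycles)
{
    return now >= (uint32_t)(start + cycles);
}
```

For example, with start = 0xFFFFFF00 and cycles = 0x200, the naive test reports the deadline as reached on the very first poll, while the subtraction form keeps waiting until 0x200 ticks have really elapsed.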
Also adding the most accurate but inflexible ASM solution, from the same link by @vgru
// most accurate but the '16' needs to be adjusted if <84MHz
#define delayUS_ASM(us) do {\
asm volatile ( "MOV R0,%[loops]\n\t"\
"1: \n\t"\
"SUB R0, #1\n\t"\
"CMP R0, #0\n\t"\
"BNE 1b \n\t" : : [loops] "r" (16*(us)) : "r0", "cc", "memory"\
);\
} while(0)
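If you need very short but deterministic "at least" delays, consider using instructions other than nop that have a deterministic, nonzero latency.
As described above, a Cortex-M4 NOP is not guaranteed to take any time.
You could replace it with, for example, and reg, reg, or anything else that is roughly equivalent to a nop in that context. Alternatively, when toggling a GPIO, you can repeat the I/O instruction itself to enforce a minimum length for a state (for example, if a GPIO write instruction takes at least 5 ns, repeat it five times to get at least 25 ns). This even works well in C if you would otherwise be inserting nops into a C program: just repeat the write to the port. As long as the port is declared volatile, as it should be, the compiler will not remove the repeated accesses.
Of course, this only applies to very short delays. For delays that are merely short, as others have mentioned, a busy loop that waits on some timing source works much better. Such a loop has a minimum cost of its own: the clocks needed to sample the timing source, set up the target, and go through the wait loop once.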
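A minimal sketch of the repeated-write idea, assuming the port register is reached through a pointer to volatile. The function name write_port_x5 is hypothetical, and the 5 ns per write figure is only the example from the paragraph above, not a measured value:

```c
#include <stdint.h>

/* Write the same value five times back to back. Because port_reg points to
 * a volatile object, the compiler must emit all five stores, so the pin
 * state is held for at least five write durations. */
static inline void write_port_x5(volatile uint32_t *port_reg, uint32_t value)
{
    *port_reg = value;
    *port_reg = value;
    *port_reg = value;
    *port_reg = value;
    *port_reg = value;
}
```

On real hardware port_reg would be the address of a GPIO output or set/reset register; the actual time per write depends on the bus and clock configuration, so it is worth checking on an oscilloscope.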