Why do memory instructions in ARM assembly take 4 cycles?

Question

Why do memory instructions in ARM assembly take 4 cycles?

kyc*_*won 4 performance assembly arm cpu-cycles cpu-architecture

Memory instructions such as ldr, str, or b take 4 cycles each in ARM assembly.

Is it because each memory location is 4 bytes long?

Answer 1

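ARM has a pipelined architecture. Each clock cycle advances the pipeline by one step (e.g. fetch/decode/execute/read...). Because the pipeline is fed continuously, the average time per instruction can approach 1 cycle, but the actual time for a single instruction to go from "fetch" to completion can be 3 or more cycles. ARM has a good explanation on their website: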

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0222b/ch01s01s01.html
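The difference between per-instruction latency and overall throughput can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming an idealized pipeline with no stalls; the function name `pipeline_cycles` and the default of 3 stages are illustrative assumptions, not figures for any particular ARM core.

```python
def pipeline_cycles(num_instructions: int, stages: int = 3) -> int:
    """Cycles needed to run a stall-free instruction stream through an
    idealized pipeline (hypothetical model, not a specific ARM core)."""
    if num_instructions <= 0:
        return 0
    # The first instruction needs `stages` cycles to pass through every stage;
    # each later instruction completes one cycle after its predecessor.
    return stages + (num_instructions - 1)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        total = pipeline_cycles(n)
        # Average cycles per instruction approaches 1 as n grows.
        print(f"{n} instructions: {total} cycles, {total / n:.3f} cycles each")
```

A single instruction still takes the full pipeline depth to complete, but over a long stream the average cost per instruction tends toward 1 cycle.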

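Memory latency complicates this picture further. ARM uses a multi-level cache system that aims to make the most frequently used data available in the fewest cycles. Even a read from the fastest (L0) cache incurs a few cycles of latency. The pipeline includes facilities that allow a read request to complete later if the data is not needed right away. It is easier to understand with an example: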

LDR R0,[R1]
MOV R2,R3    // Allow time for memory read to occur
ADD R4,R4,#200  // by interleaving other instructions
CMP R0,#0  // before trying to use the value

// By trying to access the data immediately, this will cause a pipeline
// 'stall' and waste time waiting for the data to become available.
LDR R0,[R1]
CMP R0,#0 // Wastes at least 1 cycle due to pipeline not having the data


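The idea is to hide the latency inherent in the pipeline and, where possible, to hide the extra latency of memory accesses by delaying the dependency on the loaded register (also known as instruction interleaving).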
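The effect of interleaving can be illustrated with a toy model. The sketch below, in Python, assumes a simple in-order machine that issues one instruction per cycle and a hypothetical load-use penalty of 1 extra cycle; the tuple format, the `count_issue_cycles` name, and the latency value are assumptions for illustration, and real penalties depend on the core and on where the data sits in the memory hierarchy.

```python
def count_issue_cycles(program, load_use_latency=1):
    """Count issue cycles for an in-order, single-issue toy machine.

    program: list of (dest, sources, is_load) tuples.
    load_use_latency: assumed extra cycles before a loaded value is usable.
    """
    ready = {}  # register -> first cycle at which its value can be read
    cycle = 0
    for dest, sources, is_load in program:
        # Stall until every source register is available.
        issue = max([cycle] + [ready.get(reg, 0) for reg in sources])
        if dest is not None:
            extra = load_use_latency if is_load else 0
            ready[dest] = issue + 1 + extra
        cycle = issue + 1
    return cycle


# LDR R0,[R1] / MOV R2,R3 / ADD R4,R4,#200 / CMP R0,#0
interleaved = [
    ("R0", ["R1"], True),
    ("R2", ["R3"], False),
    ("R4", ["R4"], False),
    (None, ["R0"], False),
]

# LDR R0,[R1] / CMP R0,#0
back_to_back = [
    ("R0", ["R1"], True),
    (None, ["R0"], False),
]

if __name__ == "__main__":
    for name, prog in (("interleaved", interleaved), ("back to back", back_to_back)):
        cycles = count_issue_cycles(prog)
        print(f"{name}: {cycles} cycles, {cycles - len(prog)} stall(s)")
```

Under these assumptions the interleaved sequence issues with no stalls, while the back-to-back LDR/CMP pair loses one cycle waiting for R0.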
