Where do NVMe commands reside within the PCIe BAR?

twe*_*eak 3 driver pci-e nvme

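According to the NVMe specification, the BAR has tail and head fields for each queue. For example:

  • Submission Queue y Tail Doorbell (SQyTDBL):
    • Start: 1000h + (2y * (4 << CAP.DSTRD))
    • End: 1003h + (2y * (4 << CAP.DSTRD))
  • Submission Queue y Head Doorbell (SQyHDBL):
    • Start: 1000h + ((2y + 1) * (4 << CAP.DSTRD))
    • End: 1003h + ((2y + 1) * (4 << CAP.DSTRD))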

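Are those the queues themselves, or just pointers? Is my reading correct? If they are the queues, I would assume DSTRD indicates the maximum length of all queues.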

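In addition, the specification discusses two optional regions: the Host Memory Buffer (HMB) and the Controller Memory Buffer (CMB).

  • HMB: a region within host DRAM (PCIe root)
  • CMB: a region within the NVMe controller's DRAM (inside the SSD)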

If both are optional, where do the commands live? Since a PCIe endpoint only exposes BARs and the PCI header, I don't see anywhere they could be located other than a BAR.

小智 6

Apologies, I am doing this from memory, but I have implemented an FPGA NVMe host, so hopefully my memory is good enough to answer your questions and more. If I get something wrong, at least you know why. I'll reference sections of the specification, which you can find here: https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf Before I really answer your question I want to clear up some confusion. Understanding the spec takes time; I honestly recommend reading it bottom to top, since the last few sections give context for the first few, as strange as that sounds.

  1. These registers are the doorbells for the submission and completion queues, not the queues themselves: specifically the submission queue tail and the completion queue head respectively (SECTION 3.1). More on this later; I just wanted to correct the misconception that you access the submission queue head as the host. You do not; only the controller (traditionally the drive) does. As a simple reminder, a submission is you asking the drive to do something, and a completion is the drive telling you how it went. Read SECTION 7.2 for more info.

  2. Before you can send anything to these queues you must first set them up. At baseline these queues do not exist; you must use the admin queue to create them. The admin queue base addresses live in these controller registers:

Start  End  Symbol  Description
28h    2Fh  ASQ     Admin Submission Queue Base Address
30h    37h  ACQ     Admin Completion Queue Base Address

  1. Your statement about DSTRD is a big misunderstanding. This field comes from the capabilities register (offset 0x0, Section 3.1.1). It is the controller (drive) telling you the "doorbell stride", i.e. how far apart consecutive doorbell registers are: the spacing is 4 << CAP.DSTRD bytes, as in the formula you quoted. I've never seen a drive report anything but 0 for this value (doorbells packed 4 bytes apart), since why would you want to leave dead space between doorbell registers?

  2. Please be careful with the size of your writes. In my experience most NVMe drives require you to send writes of at least 2 dwords (8 bytes), even if you only intend to send 1 dword of data. Just a note.

  3. On to actually helping you use this thing as a host: please reference SECTION 7.6.1 for the initialization sequence. Notice that you must set up multiple registers, read certain parameters, and so on.

  4. Assuming you or someone else has done initialization, let me now answer the core of your question: how to use these queues. This answer spans MANY sections of the spec and is the core of it, so I am going to break it down as best I can for a simple write command. Please note you CANNOT write until you have first created the I/O queues using the admin queue, which uses different opcodes from a different section of the spec. Sorry, I cannot write all of that out.
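To make the doorbell formula and the DSTRD point concrete, here is a minimal sketch of the offset arithmetic. It assumes DSTRD sits in bits 35:32 of the 64-bit CAP register; the function names and the example CAP values are made up for illustration and are not from any driver.

```python
DOORBELL_BASE = 0x1000  # offset of the first doorbell register inside the BAR


def doorbell_stride(cap: int) -> int:
    """Bytes between consecutive doorbells: 4 << CAP.DSTRD (assumed CAP bits 35:32)."""
    dstrd = (cap >> 32) & 0xF
    return 4 << dstrd


def sq_tail_doorbell(qid: int, cap: int) -> int:
    """BAR offset of the Submission Queue `qid` Tail Doorbell."""
    return DOORBELL_BASE + (2 * qid) * doorbell_stride(cap)


def cq_head_doorbell(qid: int, cap: int) -> int:
    """BAR offset of the Completion Queue `qid` Head Doorbell."""
    return DOORBELL_BASE + (2 * qid + 1) * doorbell_stride(cap)


# Hypothetical CAP value with DSTRD = 0 (doorbells packed 4 bytes apart).
cap = 0
for qid in (0, 1):
    print(qid, hex(sq_tail_doorbell(qid, cap)), hex(cq_head_doorbell(qid, cap)))
```

With DSTRD = 0 this gives 0x1000/0x1004 for the admin queue (queue 0) and 0x1008/0x100C for queue 1. A non-zero DSTRD only spreads the registers further apart; it says nothing about queue length.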

STEPS TO WRITING DATA TO AN NVMe DRIVE.

  1. When you create the submission queue you specify the size of that specific queue, which is the number of commands that can sit in the queue at one time for processing. Along with this you specify the queue base address. For this example let's assume you set the base address to 0x1000_0000 and the size to 16 (0x10). Figure 105 lets us know that every submission queue entry is 64 bytes (0x40), so entry 0 is at 0x1000_0000, entry 1 at 0x1000_0040, entry 2 at 0x1000_0080, and so on for our 16 entries, after which it wraps back to entry 0.

  2. You first store the data to be written. Let's say you were given 512 bytes (0x200) of data to write. For simplicity you place that data at 0x2000_0000 - 0x2000_01FF.

  3. You create the submission queue command. This is not a simple process and I'm not going to document all of it, but you should be referencing Figure 104, Figure 346, and Section 6.15. That is not enough, however. You also need to understand PRP vs SGL and which one you are using (PRP is easier to start with), and NLB (number of logical blocks), which determines your write size. With NVMe you do not specify writes in bytes but in logical blocks, whose size is set by the controller (drive). A drive may implement multiple LBA sizes, but that is up to the drive, not you as the host; you just get to pick from what it supports (Section 5.15.2.1, Figure 245). Look at Identify Namespace to find the LBA (logical block address) size. This leads you down a rabbit hole to determine the actual size, but the info is there.

  4. OK, so you got through this mess and have created the submission command. Let's assume the host has already completed 2 commands on this queue, so entries 0 and 1 have been used (at start this count will be 0; I'm picking 2 just to be clearer in my example). You now need to place this command in entry 2, at 0x1000_0080.

  5. Now let's assume this is queue 1 (in the equation you posted, the queue number is the y value; note that queue 0 is the admin queue). You need to poke the controller's submission queue tail doorbell with the new tail index, that is, the entry just past the last command you loaded (so you can queue up several commands and only tell the drive when you are ready). The command went into entry 2, so the new tail is 3, and you write the value 3 to register 0x1008.

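  6. At this point the drive goes: aha, the host has told me there are new commands to fetch. The controller goes to the queue base address + command size * 2 and fetches 64 bytes of data, which is 1 command (address 0x1000_0080). The controller decodes this command as a write, which means the controller (drive) must read the data from some host address and store it at the location the command specifies. So your write command should tell the drive to go to address 0x2000_0000 and read 512 bytes of data, which it will, as you can confirm if you scope the PCIe bus. The drive then fills out a completion queue entry (16 bytes, specified in Section 4.6) and places it at the completion queue address you specified at queue creation (plus 0x20, since two completions were already posted and this one is entry 2). The controller then generates an MSI-X interrupt.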

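  7. At this point you must go to where the completion queue lives and read the response to check the status. If you queued multiple submissions, also check the SQID to see what finished, since jobs may complete out of order. You must then write the completion queue head doorbell (0x100C) to indicate that you have retrieved the completion entry, success or failure. Notice that you never interact with the submission queue head (that is up to the controller, since only it knows when a submission queue entry has been processed), and only the controller puts things at the completion queue tail, since only it can create new entries.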
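The steps above can be sketched as plain arithmetic plus byte packing. This is only an illustration under stated assumptions, not a driver: the base address, queue size and buffer address are the example values from the steps, the helper names are made up, and the field layout (opcode and CID in dword 0, PRP1 in dwords 6-7, SLBA in dwords 10-11, 0-based NLB in dword 12, SQID/CID/status in completion dwords 2-3) is my reading of the NVMe 1.4 command and completion formats. The new tail is computed as the index of the next free entry.

```python
import struct

SQ_BASE = 0x1000_0000   # example submission queue base address from step 1
SQ_ENTRIES = 16         # example queue size from step 1
SQE_SIZE = 64           # bytes per submission queue entry
CQE_SIZE = 16           # bytes per completion queue entry
OPC_WRITE = 0x01        # NVM command set Write opcode


def sq_slot_address(index: int) -> int:
    """Host address of submission queue entry `index` (wraps at the queue size)."""
    return SQ_BASE + (index % SQ_ENTRIES) * SQE_SIZE


def next_tail(tail: int) -> int:
    """New tail index to write to the SQ tail doorbell after filling entry `tail`."""
    return (tail + 1) % SQ_ENTRIES


def build_write_command(cid: int, nsid: int, prp1: int, slba: int, num_blocks: int) -> bytes:
    """Pack a 64-byte Write command using PRP1 only (hypothetical helper)."""
    dw0 = OPC_WRITE | (cid << 16)   # opcode in bits 7:0, command identifier in bits 31:16
    dw12 = (num_blocks - 1) & 0xFFFF  # NLB is 0-based
    # dw0, nsid, reserved (dw2-3), MPTR, PRP1, PRP2, SLBA (dw10-11), dw12..dw15
    return struct.pack("<IIQQQQQIIII", dw0, nsid, 0, 0, prp1, 0, slba, dw12, 0, 0, 0)


def parse_completion(entry: bytes) -> dict:
    """Split a 16-byte completion queue entry into the fields checked in step 7."""
    _dw0, _dw1, dw2, dw3 = struct.unpack("<IIII", entry)
    return {
        "sq_head": dw2 & 0xFFFF,
        "sqid": dw2 >> 16,
        "cid": dw3 & 0xFFFF,
        "phase": (dw3 >> 16) & 1,
        "status": dw3 >> 17,
    }


# Two commands already went through, so the next free entry is index 2.
tail = 2
cmd = build_write_command(cid=2, nsid=1, prp1=0x2000_0000, slba=0, num_blocks=1)
print(hex(sq_slot_address(tail)), len(cmd), "new tail:", next_tail(tail))
print("completion lands at CQ base +", hex(tail * CQE_SIZE))
```

A real host would copy `cmd` to the address from `sq_slot_address`, write `next_tail(tail)` to the queue's tail doorbell, and later run the bytes it reads from the completion queue through something like `parse_completion` before writing the completion queue head doorbell. The single 512-byte block assumes the namespace is formatted with 512-byte LBAs.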

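Sorry this post is so long and not well formatted, but hopefully you now have a better understanding of NVMe. It is a bit of a mess at first, but once you understand it, it all makes sense. Remember that my example assumes you have created a queue that does not exist at baseline. First you need to set up the admin submission and completion queues (0x28 and 0x30), which have queue ID 0, so their tail/head doorbell addresses are 0x1000 and 0x1004 respectively. Then you must reference Section 5 to find the opcodes to make things happen, but I believe you can figure that out from what I have given you. If you have more questions, leave a comment and I will see what I can do.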