Testing an FPGA design at different levels

las*_*und 9 testing verilog fpga vhdl

Various aspects of FPGA testing strategies have been discussed here, but I could not find the following question asked/discussed/answered:

At what levels should you simulate your FPGA design, and what do you verify at each level?

If you answer using concepts such as x-level tests, where x = block, subsystem, function, or something else, please describe what x means to you, e.g. typical size, complexity, or an example.


September 14

When it comes to the actual question, the two answers given are the same, but I will accept the answer from @kraigher since it is the shortest one.


September 10

Here is a summary and comparison of the two answers from @Paebbels and @kraigher. One of the answers is very long, so hopefully this will help anyone who wants to contribute an answer of their own. Remember, there is a bounty to be earned!

  • They both simulate all components at all levels. At least @Paebbels makes an exception for components with very little functional content (e.g. a MUX).
  • They both strive for test automation.
  • They both developed "tools" to simplify board-level testing.
  • They both avoid testing things at one level that have already been tested at a lower level.
  • The biggest difference seems to be how often the testbenches are simulated. @Paebbels tests directly in hardware unless there is a major design change, in which case simulations are also run. @kraigher keeps running simulations as the design evolves. I think this is a very important point as well, and personally I prefer the approach expressed by @kraigher. However, this was not part of the original question, so I consider the two answers to be in consensus. How often tests should be run has also been discussed before, for example in "How often should the entire suite of a system's unit tests be run?"

How much testing they do in the lab differs, but that seems mostly related to the specifics of each project (how many things could not be efficiently tested through simulation). I happen to know @kraigher's latest project, so I can say that both projects fall into the 1+ year category. It would be interesting to hear a story from someone with a smaller project. Far from all projects I have seen have been complete with respect to functional coverage in simulation, so there must be other stories out there.


September 7

Here are some follow-up questions for @Paebbels that were too long to fit in comments.

Yes @Paebbels, you have provided much of what I was looking for, but I have further questions. I'm afraid this may become a lengthy discussion, but considering the amount of time we spend on verification and the wide variety of strategies people apply, I think it deserves the attention. Hopefully there will be more answers so that the various approaches can be compared. Your bounty will certainly help.

I think your story contains many good and interesting solutions, but I'm an engineer, so I will focus on the pieces I think can be challenged ;-)

You spent a lot of time testing in hardware to address all the external problems you ran into. From a practical point of view (since they are not going to fix their SATA standard violations), this is like having a flawed requirements specification, so you end up developing a design that solves the wrong problem. This is typically discovered when you "deliver", which motivates delivering often and finding the problems early. I'm curious about one thing: when you found a bug in the lab that required a design change, did you update the testbenches at the lowest level where the bug could have been tested? Not doing so increases the risk of the bug reappearing in the lab, and over time it also degrades the functional coverage of your testbenches, making you even more dependent on lab testing.

You said that most testing was done in the lab and that this was caused by the external problems you had to debug. Is your answer the same if you only look at your own internal code and bugs?

When you work with long turnaround times, as you do, you find various ways to make use of that time. You describe starting to synthesize the next design while testing the first, and that if you found a bug with one drive, you started synthesizing the fix for that drive while continuing to test the other drives with the current design. You also describe observability problems when testing in the lab. I'm going to make a number of skeptical interpretations of this; you have to provide the positive ones!

If you could synthesize the next design immediately when you started to test the first, it seems like you were working in very small increments but still made the effort to run every test at every level, all the way down to hardware. That seems a bit overkill/expensive, especially when the hardware testing is not fully automated. Another skeptical interpretation is that you were looking for a bug but, due to poor observability, were producing random trial-and-error builds, hoping they would give clues to the problem you were trying to isolate. Was this really an effective use of time, in the sense that every build added value, or was it more "doing something is better than doing nothing"?

When designing the higher protocol layers, did you consider short-circuiting the lower parts of the communication stack to speed up simulation? After all, the lower layers had already been tested.

You reused some components and assumed they were bug-free. Is that because they came with testbenches that proved it? Proof through use tends to be weak, since reuse often happens in a different context. The Ariane 5 rocket is a good example, as is your reuse of XAPP 870 on the Virtex-5.

Since you can simulate at various levels, I assume you take advantage of the faster run times at the lower levels and the shorter feedback loop you get when you can verify a piece of the design before the larger structure is completed. Still, there are pieces of code significant enough to be given their own components yet too simple to be given their own testbenches. Can you give an example of such a component? Are they really bug-free? Personally, I don't write many lines of code before making a mistake, so if I have a nice piece of code packaged as a component, I take the opportunity to test it at that level, for the reasons mentioned above.

Pae*_*els 9

A Serial-ATA controller story

I will try to explain my testing strategy with an example.

Introduction:
I developed a Serial-ATA controller for my final bachelor project, which grew into a very big project in the months after my graduation. The testing requirements became harder and harder, because every new bug or performance shortfall was more difficult to find, so I needed smarter tools, strategies and solutions for debugging.


Development steps:

Phase 1: A ready to use IP Core example
I started on a Virtex-5 platform (ML505 board) and a Xilinx XAPP 870 with example code. Additionally, I got the SATA and ATA standards, the Xilinx user guides, as well as 2 test drives. After a short period, I noticed that the example code was mostly written for a Virtex-4 FPGA and that the CoreGenerator generated invalid code: unconnected signals, unassigned inputs, and values configured incorrectly with respect to the SATA specification.

Rule #1: Double check generated code lines, they may contain systematic faults.

Phase 2: Complete rewrite of the transceiver code and design of a new physical layer
I developed a new transceiver and physical layer to perform the basic SATA handshake protocol. At the time I wrote my bachelor report there was no good simulation model for the GTP_DUAL transceiver, and I had no time to write one myself, so I tested everything on real hardware. The transceiver itself could be simulated, but the electrical IDLE conditions needed for the OOB handshake protocol were not implemented or did not work. After I finished my report, Xilinx updated the simulation model and I could have simulated the handshake protocol, but unfortunately by then everything was already up and running (see phase 5).

How do you test an FPGA hard macro without simulation?
Luckily, I had a Samsung Spinpoint HDD that only started up after a valid handshake sequence, so I had an audible response.

The FPGA design was equipped with a big ChipScope ILA, which used 98% of the BlockRAM, to monitor the transceiver behavior. It was the only way to guess what was going on on the high-speed serial wires. There were other difficulties we could not solve:

  1. We had no oscilloscope capable of handling 1.5 and 3.0 GHz signals.
  2. Adding probes to the wires is difficult (reflections, ...).
  3. I'm a computer scientist, not a high-frequency electrical engineer :)

Rule #2: If there is space left in your design, add an ILA to monitor it.
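
To make the ILA idea concrete, below is a minimal sketch of how such a monitoring core is typically wired in. The component and port names mirror what CoreGenerator commonly produces for a basic ICON/ILA pair (chipscope_icon/chipscope_ila, a 36-bit CONTROL bus, a TRIG0 port); the exact names and widths depend on your generated cores, and the observed signals here are placeholders.

    library ieee;
    use ieee.std_logic_1164.all;

    entity ila_wrapper is
      port (
        clk       : in std_logic;
        rx_data   : in std_logic_vector(31 downto 0);  -- example signals to observe
        rx_status : in std_logic_vector(7 downto 0)
      );
    end entity;

    architecture rtl of ila_wrapper is
      -- component declarations as produced by CoreGenerator (names are assumptions)
      component chipscope_icon
        port (CONTROL0 : inout std_logic_vector(35 downto 0));
      end component;

      component chipscope_ila
        port (
          CONTROL : inout std_logic_vector(35 downto 0);
          CLK     : in    std_logic;
          TRIG0   : in    std_logic_vector(39 downto 0)
        );
      end component;

      signal control0 : std_logic_vector(35 downto 0);
      signal trig0    : std_logic_vector(39 downto 0);
    begin
      trig0 <= rx_data & rx_status;  -- 40 observed/trigger bits

      icon_i : chipscope_icon port map (CONTROL0 => control0);

      ila_i : chipscope_ila
        port map (
          CONTROL => control0,
          CLK     => clk,
          TRIG0   => trig0
        );
    end architecture;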

Phase 3: A link layer
After some successful link-ups with 2 HDDs I started to design the link layer. This layer has big FSMs, FIFOs, scramblers, CRC generators and so on. Some components such as FIFOs were given for my bachelor project, so I assumed these components were bug-free. Otherwise I could have started the provided simulations myself and changed parameters.

My own subcomponents were tested by simulation in testbenches (=> component-level tests). After that I wrote an upper-layer testbench that could act as host or device, so I was able to build a layered stack:
1. Testbench(Type=Host)
2. LinkLayer(Type=Host)
3. wires with delay
4. LinkLayer(Type=Device)
5. Testbench(Type=Device)

The SATA link layer transmits and receives data frames, so the usual process statement for stimuli generation meant far too much code and was not maintainable. I developed a data structure in VHDL that stored testcases, frames and data words, including flow control information. (=> subsystem level simulation)
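
To give an idea of what such a VHDL test data structure can look like, here is a minimal sketch. The type and field names, the sizes and the HOLD-based flow control field are assumptions for illustration, not the actual package from the project.

    library ieee;
    use ieee.std_logic_1164.all;

    package frame_test_types is
      constant MAX_WORDS  : positive := 256;
      constant MAX_FRAMES : positive := 16;

      type t_word_vector is array (0 to MAX_WORDS - 1) of std_logic_vector(31 downto 0);

      type t_frame is record
        word_count : natural range 0 to MAX_WORDS;
        words      : t_word_vector;
        hold_after : integer;            -- flow control: insert HOLD after this word (-1 = never)
      end record;

      type t_frame_vector is array (0 to MAX_FRAMES - 1) of t_frame;

      type t_testcase is record
        name        : string(1 to 32);   -- fixed-length, padded testcase name
        frame_count : natural range 0 to MAX_FRAMES;
        frames      : t_frame_vector;
      end record;
    end package;

A stimuli process can then loop over the frames of a constant t_testcase value instead of hand-writing one long process statement per frame.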

Rule #3: Building a counterpart design (e.g. the device) can help in simulations.

Phase 4: Test the link layer on real hardware
The host-side testbench layer from phase 3 was written to be synthesizable, too, so I plugged the following together:
1. Testbench(Type=Host)
2. LinkLayer(Type=Host)
3. PhysicalLayer
4. TransceiverLayer
5. SATA cables
6. HDD

I stored the startup sequence as a list of SATA frames in a testbench ROM and monitored the HDD responses with ChipScope.
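
A minimal sketch of such a frame ROM in a synthesizable testbench is shown below; the depth, width and contents are placeholders, not the real startup frames.

    library ieee;
    use ieee.std_logic_1164.all;

    entity startup_frame_rom is
      port (
        clk  : in  std_logic;
        addr : in  natural range 0 to 3;
        data : out std_logic_vector(31 downto 0)
      );
    end entity;

    architecture rtl of startup_frame_rom is
      type t_rom is array (0 to 3) of std_logic_vector(31 downto 0);
      constant ROM : t_rom := (
        0      => x"00000001",  -- placeholder words; the real ROM held the SATA startup frames
        1      => x"00000002",
        others => (others => '0')
      );
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          data <= ROM(addr);    -- synchronous read maps to BlockRAM or LUT-RAM
        end if;
      end process;
    end architecture;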

Rule #4: A synthesizable testbench can be reused in hardware. The previously generated ILAs can be reused as well.

Now came the time to test different hard drives and monitor their behavior. After some testing time I could communicate with a handful of disks and SSDs. Some vendor-specific workarounds were added to the design (e.g. handling the double COM_INIT response from WDC drives :))

At this point a synthesis run took about 30-60 minutes to complete. This was caused by a mid-range CPU, >90% FPGA utilization (BlockRAM) and timing problems in the ChipScope cores. Some parts of the Virtex-5 design run at 300 MHz, so the buffers get filled very fast. On the other hand, a handshake sequence can take 800 us (normally < 100 us), and there are devices on the market that sleep for 650 us before they respond! So I looked into storage qualification, cross-triggering and data compression.

While a synthesis run was going on, I tested my design with different devices and wrote a table of test results for each device. If synthesis had completed and Map/P&R was still outstanding, I restarted it with modified code. So I had several designs in flight :).

Phase 5: Higher layers:
Next I designed the transport layer and the command layer. Each layer has a standalone testbench, as well as sub component testbenches for complex sub modules. (=> component and subsystem level tests)

All modules were plugged together in a multi-layer testbench. I designed a new data generator so I did not have to handcode each frame; only the sequence of frames had to be written.

  1. Testbench(Initiator)
  2. DataGenerator
  3. Commandlayer
  4. TransportLayer
  5. LinkLayer(Type=Host)
  6. wires with delay
  7. LinkLayer(Type=Device)
  8. Testbench(Checker)

I also added a wire delay between the two LinkLayer instances, which had been measured in ChipScope beforehand. The checker testbench was the same as above, filled with expected frame orders and prepared response frames.

Rule #5: Some delays let you find protocol/handshake problems between FSMs.
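
A minimal sketch of such a delayed interconnect is shown below; the entity name, widths and the default delay value are assumptions (the delay would be set to the value measured in ChipScope).

    library ieee;
    use ieee.std_logic_1164.all;

    entity wire_delay is
      generic (CABLE_DELAY : time := 5 ns);  -- e.g. the value measured in ChipScope
      port (
        host_tx   : in  std_logic_vector(31 downto 0);
        device_rx : out std_logic_vector(31 downto 0);
        device_tx : in  std_logic_vector(31 downto 0);
        host_rx   : out std_logic_vector(31 downto 0)
      );
    end entity;

    architecture sim of wire_delay is
    begin
      -- transport delays model the cable in both directions (simulation only)
      device_rx <= transport host_tx   after CABLE_DELAY;
      host_rx   <= transport device_tx after CABLE_DELAY;
    end architecture;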

Phase 6: Back to the FPGA
The stack was synthesized again. I changed my ILA strategy to one ILA per protocol layer. I generated the ILAs via CoreGenerator, which allowed me to use a new ChipScope core type, the VIO (VirtualInputOutput). This VIO transfers simple I/O operations (button, switch, LED) to and from the FPGA board via JTAG, so I could automate some of my testing processes. The VIO was also able to encode ASCII strings, so I decoded some error bits from the design into readable messages. This saved me from searching through synthesis reports and VHDL code. I switched all FSMs to gray encoding to save BlockRAMs.

Rule #6: Readable error messages save time

Phase 7: Advancements in ChipScope debugging
Each layer's ILA had a trigger output port, which was connected to trigger inputs on the other ILAs. This enables cross-triggering. E.g. it's possible to use a complex condition such as: trigger in the TransportLayer if a frame is aborted after the LinkLayer has received the third EOF sequence.

Rule #7: Use multiple ILAs and connect their triggers cross-wise.

Complex triggers allow one to isolate the fault without time consuming resynthesis. I also started to extract FSM encodings from synthesis reports, so I could load that extracted data as token files into ChipScope and display FSM states with their real names.

Phase 8: A serious bug
Next, I was confronted with a serious bug. After 3 frames my FSMs got stuck, but I could not find the cause in ChipScope, because everything looked OK. I could not add more signals, because a Virtex-5 has only 60 BlockRAMs... Luckily I could dump all frame transactions from HDD startup until the fault in ChipScope, and ChipScope can export data as *.vcd dumps.

I wrote a VHDL package to parse and import *.vcd dump files into iSim, so I could use the dumped data to simulate the complete Host <-> HDD interaction.

Rule #8: Dumped inter-layer transfer data can be used in simulation for a more detailed look.
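
As an illustration of the replay idea (not the actual VCD parser package), here is a minimal sketch that reads one dumped 32-bit word per line from a text file and drives it into a simulation; the file name, signal names and clock period are assumptions.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_textio.all;  -- provides hread for std_logic_vector
    use std.textio.all;

    entity dump_replay_tb is
    end entity;

    architecture sim of dump_replay_tb is
      signal rx_data  : std_logic_vector(31 downto 0) := (others => '0');
      signal rx_valid : std_logic := '0';
    begin
      replay : process
        file     dump_file : text open read_mode is "hdd_dump.txt";  -- hypothetical file name
        variable l         : line;
        variable word      : std_logic_vector(31 downto 0);
      begin
        while not endfile(dump_file) loop
          readline(dump_file, l);
          hread(l, word);            -- one hex word per line
          rx_data  <= word;
          rx_valid <= '1';
          wait for 10 ns;            -- assumed word clock period
        end loop;
        rx_valid <= '0';
        wait;
      end process;
    end architecture;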

Pause
By then, the SATA stack was quite complete and passed all my tests. I got assigned to two other projects:

  1. A universal UDP/IPv4/IPv6 stack and
  2. a "remote controllable testdesign controller"

The first project reused the frame-based testbenches and the per-layer/protocol ILAs. The second project used an 8-bit CPU (PicoBlaze) to build an interactive test controller called SoFPGA. It can be remote-controlled via standard terminals (Putty, Kitty, Minicom, ...).

At that time a colleague ported the SATA controller to the Stratix-II and Stratix-IV platforms. He just had to exchange the transceiver layer and design some adapters.

SATA Part II:
The SATA controller was to get an upgrade: support for 7-Series FPGAs and 6.0 Gb/s transfer speed. The new platform was a Kintex-7 (KC705).

Phase 9:
Testing such big designs with buttons and LEDs is not doable. A first approach was the VIO core from phase 6. In the end I chose to include the previously developed SoFPGA. I added an I²C controller, which was needed to reprogram an on-board clock generator from 156.25 to 150 MHz. I also implemented measurement modules to measure transfer rate, elapsed time and so on. Error bits from the controller were connected to the interrupt pin of the SoFPGA, and errors were displayed on a Putty screen. I also added SoFPGA-controllable components for fault injection. For example, it's possible to insert bit errors into SATA primitives but not into data words.
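
The following is a minimal sketch of such a fault-injection stage; the entity, port names and the single-bit-flip behavior are assumptions used for illustration, not the actual SoFPGA-controlled component.

    library ieee;
    use ieee.std_logic_1164.all;

    entity bit_error_injector is
      port (
        clk          : in  std_logic;
        inject       : in  std_logic;                  -- one-cycle request, e.g. from the SoFPGA
        is_primitive : in  std_logic;                  -- only corrupt SATA primitives, never data words
        bit_index    : in  natural range 0 to 31;      -- which bit to flip
        data_in      : in  std_logic_vector(31 downto 0);
        data_out     : out std_logic_vector(31 downto 0)
      );
    end entity;

    architecture rtl of bit_error_injector is
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          data_out <= data_in;
          if inject = '1' and is_primitive = '1' then
            data_out(bit_index) <= not data_in(bit_index);  -- flip exactly one bit
          end if;
        end if;
      end process;
    end architecture;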

With this technique, we could prove protocol implementation faults in several SATA devices (HDDs and SSDs). It's possible to cause a deadlock in the link layer FSM of some devices, caused by a missing edge in their LinkLayer FSM transition diagram :)

With the SoFPGA approach, it was easy to modify tests, reset the design, report errors, and even run benchmarks.

Rule #9: The usage of a soft core allows you to write tests/benchmarks in software. Detailed error reporting can be done via terminal messages. New test programs can be uploaded via JTAG -> no synthesis needed.

But I made one big mistake

Phase 0: back to the beginning:
My reset network was very, very bad. So I redesigned the reset network with the help of two colleagues. The new clock network has separate resets for the clock wires and MMCMs, as well as stable signals to indicate proper clock signals and frequencies. This is needed because the external input clock is reprogrammed at runtime, SATA generation changes can cause clock divider switching at runtime, and reset sequences in the transceiver can cause unstable clock outputs from the transceiver. Additionally, we implemented a powerdown signal to start from zero. So if our SoFPGA triggers a powerdown/powerup sequence, the SATA controller is as fresh as right after programming the FPGA. This saves masses of time!

Rule #0: Implement proper resets so that every test behaves in the same way and no reprogramming of the FPGA is needed. Add clock-domain-crossing circuits! This prevents many random faults.
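
A minimal sketch of one such cross-clock reset circuit (asynchronous assert, synchronous release) is shown below; it is a generic pattern, not the project's actual reset network.

    library ieee;
    use ieee.std_logic_1164.all;

    entity reset_sync is
      port (
        clk     : in  std_logic;
        arst_n  : in  std_logic;   -- asynchronous reset request, active low
        rst_out : out std_logic    -- reset output, released synchronously to clk
      );
    end entity;

    architecture rtl of reset_sync is
      signal sync : std_logic_vector(1 downto 0) := (others => '0');
    begin
      process (clk, arst_n)
      begin
        if arst_n = '0' then
          sync <= (others => '0');
        elsif rising_edge(clk) then
          sync <= sync(0) & '1';   -- two-stage shift register
        end if;
      end process;

      rst_out <= not sync(1);      -- asserted asynchronously, released after two clk edges
    end architecture;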

Notes:

Some sub components from the SATA controller are published in our PoC-Library. There are also testbench packages and scripts to ease testing. The SoFPGA core is published, too. My PicoBlaze-Library project eases the SoC development.

Questions from @lasplund - Part I:

  1. Is it fair to say that your levels of testing are component level (simulation of CRC, complex FSM), subsystem level (simulation of one of your layers), top-level simulation, lab testing w/o SW, lab testing with SW (using SoFPGA)?

    Yes, I used component testing for mid-size components. Some of them were already ready-to-use, so I trusted the developers. Small components were tested in the subsystem-level tests. I believed in my own code, so there was no separate testbench; if one of them has a fault, I'll see it in the bigger testbench.
    When I started part II of the development, I used top-level testbenches. On the one hand there was a simulation model available, but it was very slow (it took hours for a simple frame transfer). On the other hand our controller is full of ILAs, and the Kintex-7 offers several hundred BlockRAMs. Synthesis takes circa 17 minutes (incl. 10 ILAs and one SoFPGA). So in this project lab testing is faster than simulation. Many improvements (token files, SoFPGA, cross-ILA triggering) eased the debugging process significantly.

  2. Can you give a ballpark figure of how your verification efforts (developing and running tests and debugging on that level) were distributed among your levels?

    I think this is hard to tell. I worked 2 years on SATA and one year on IPv6/SoFPGA. I think most (>60%) of the time was spent on "external debugging". For example:

    • Debugging VHDL tools (XST, CoreGenerator, iSim, Vivado, ModelSim, Quartus, GHDL, ...)
      I discovered masses of bugs in these tools, most of which got reported. Some are unsolvable.
    • A second big part of the time was spent on FPGA device debugging. I have found several unreported and 'secret/silenced' bugs in the devices (especially in 7-Series FPGAs). After some time you start to believe the device has a bug, so you develop a hardware test just for this bug. You can prove it, but Xilinx ignores all your bug reports ...!
    • And then there is the testing of different devices.
      All devices adhere to the SATA specification, but some don't talk to our SATA-'conformant' controller. Then you start to try different timings, timeouts, control words, ... until you find the bug in the device. Once it is found you start to develop a workaround, but it must also work with all previously tested devices!
  3. With a similar distribution, where do you detect your bugs and where do you isolate the root cause? What I mean is that what you detect in the lab may need simulations to isolate.

    So as mentioned before, most testing was lab testing, Espei

  4. What is the typical turnaround time at the different levels? What I mean is the time it takes from when you decide to try something out until you've completed a new test run and have new data to analyze.

    Because synthesis takes so long, we used pipelined testing. So while we tested one design on the FPGA, a new one was already synthesizing. Or while one error got fixed and synthesized, we tested the design with the other disks (7) and SSDs (2). We created matrices of which disks failed and which did not.

    Most debug solutions were invented with a forward look: reusability, parameterizability, ...

Last paragraph:
It was very hard work to get the Kintex-7 ready for SATA. Several questions were posted, e.g. Configuring a 7-Series GTXE2 transceiver for Serial-ATA (Gen1/2/3). But we could not find a proper configuration for the GTXE2 transceiver. So with the help of our embedded SoFPGA, we developed a PicoBlaze-to-DRP adapter. The Dynamic Reconfiguration Port (DRP) is the interface from the FPGA fabric into the transceiver's configuration bits. On the one hand we monitored the frequency sliding unit in the transceiver while it adapted to the serial line. On the other hand we reconfigured the transceiver at runtime via the SoFPGA, controlled from a Putty terminal. We tested > 100 configurations in 4 hours with only 3 synthesis runs. Synthesizing each configuration separately would have cost us weeks...
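
To illustrate what a DRP access looks like, here is a minimal sketch of a single DRP write cycle. The 9-bit address and 16-bit data widths match the 7-Series GTXE2 DRP; the entity, the surrounding handshake and the PicoBlaze side are assumptions, not the actual adapter.

    library ieee;
    use ieee.std_logic_1164.all;

    entity drp_write is
      port (
        drpclk  : in  std_logic;                 -- same clock drives the GTXE2 DRPCLK pin
        start   : in  std_logic;                 -- pulse to launch one write
        addr    : in  std_logic_vector(8 downto 0);
        wrdata  : in  std_logic_vector(15 downto 0);
        done    : out std_logic;
        -- DRP port of the transceiver
        drpaddr : out std_logic_vector(8 downto 0);
        drpdi   : out std_logic_vector(15 downto 0);
        drpen   : out std_logic;
        drpwe   : out std_logic;
        drprdy  : in  std_logic
      );
    end entity;

    architecture rtl of drp_write is
      type t_state is (IDLE, WAIT_RDY);
      signal state : t_state := IDLE;
    begin
      process (drpclk)
      begin
        if rising_edge(drpclk) then
          drpen <= '0';
          drpwe <= '0';
          done  <= '0';
          case state is
            when IDLE =>
              if start = '1' then
                drpaddr <= addr;
                drpdi   <= wrdata;
                drpen   <= '1';          -- one-cycle enable starts the access
                drpwe   <= '1';
                state   <= WAIT_RDY;
              end if;
            when WAIT_RDY =>
              if drprdy = '1' then       -- transceiver acknowledges completion
                done  <= '1';
                state <= IDLE;
              end if;
          end case;
        end if;
      end process;
    end architecture;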

Questions from @lasplund - Part II:

  1. When you discovered a bug in the lab that needed a design change would you then update the testbenches at the lowest level where this could be tested?

    Yes, we updated the testbenches to reflect the changed implementation, so that we hopefully would not run into the same pitfall again.

  2. You said that most testing was done in the lab and that was caused by the amount of external problems you had to debug. Is your answer the same if you just look at your own internal code and bugs?

    I designed the state machines with the same safety in mind. For example, there is always an others or else case. So if one of the developers (now we are a group of four) adds new states and misses edges or so, these transitions get caught. Each FSM has at least one error state, which is entered on transition faults or when subcomponents report errors. One error code is generated per layer. The error condition bubbles up to the topmost FSM. Depending on the error severity (recoverable, not recoverable, ...), an upper FSM performs recovery procedures or halts. The state of all FSMs plus their error conditions is monitored by ChipScope, so in most cases it's possible to discover failures in less than a minute. The tuple of (FSM state; error code) mostly identifies the cause very exactly, so I can name the module and code line.

    We also spent many hours designing a layer/FSM interaction protocol. We named this protocol/interface Command-Status-Error. An upper layer can monitor a lower layer via Status. If Status = STATUS_ERROR, then Error is valid. An upper layer can control a lower layer via Command. (A sketch of this interface follows after this question list.)

    It's maybe not very resource efficient (LUTs, Regs), but it's very efficient for debugging (time, error localisation).

  3. [...] I'm going to do a number of skeptical interpretations of this, you have to provide the positive ones!

    Developing SATA was at times a very depressing task, especially the parameter search for the transceiver :). But we also had good moments:

    • The new and working reset/powerdown circuit - no more resets via FPGA reprogramming
    • The PicoBlaze/SoFPGA system talking via UART :)
    • Reprogramming the SoFPGA at runtime
    • Automated remote testing


  4. If you could synthesize the next design immediately when you started to test the first it seems like you were working with very small increments but still made the effort to run every test at every level all the way to hardware. This seems a bit overkill/expensive, especially when you're not fully automated on the hardware testing.

    We did not run simulations every time, just after major changes in the design. While we tested one feature, we already started on the next one. It's a bit like wafer production: current chips already host circuits of the next or after-next generation for testing. => pipelining :) One drawback is that if a major error occurs, the pipeline must be cleared and each feature must be tested individually. This case was very rare.

    The development process was always the question: Can I find the bug/solution in the next 5 days with my current set of tools or should I invest 1-2 weeks designing a better tool with better observability?

    So we focused on automation and scripting to reduce human errors. To explain it in detail would burst this answer :). But for example, our SoFPGA exports ChipScope token files directly from VHDL. It also updates the assembly files at each synthesis run. So if one changes the SoFPGA design, all *.psm files are updated (e.g. device addresses).

  5. Another sceptical interpretation is that you're looking for a bug but due to poor observability you are producing random trial-and-error type builds, hoping that they will give clues to the problem you're trying to isolate. Was this really effective use of time in the sense that every build added value, or was it more "doing something is better than doing nothing"?

    We got no help from Xilinx regarding the correct GTXE2 settings. The internal design was also mostly unknown. So at some point it was trial-and-error. So the only way was to narrow down the search space.

  6. When designing the higher protocol layers did you consider to short circuit the communication stack on the higher levels to speed up the simulation? After all, the lower layers were already tested.

    Yes, after the link layer was done, we skipped all lower layers (physical layer, transceiver) to speed up simulation. Just the wire delay was left.

  7. You reused some components and assumed that they are bug free. [...]
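
Below is a minimal sketch of the Command-Status-Error interface mentioned in answer 2 above; the type names and encodings are assumptions, not the project's actual declarations.

    library ieee;
    use ieee.std_logic_1164.all;

    package layer_interface is
      type t_command is (CMD_NONE, CMD_TRANSFER, CMD_ABORT, CMD_RESET);
      type t_status  is (STATUS_IDLE, STATUS_BUSY, STATUS_DONE, STATUS_ERROR);
      subtype t_error is std_logic_vector(7 downto 0);  -- one error code per layer
    end package;

    -- A lower layer would then expose ports like:
    --   Command : in  t_command;   -- driven by the upper layer
    --   Status  : out t_status;    -- monitored by the upper layer
    --   Error   : out t_error;     -- valid only when Status = STATUS_ERROR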


kra*_*her 6

I do behavioral simulation at all levels. That is, all entities should have a corresponding testbench aiming for full functional coverage. If specific details of entities A, B and C are already individually tested in their corresponding testbenches, they do not have to be covered in the testbench for entity D, which instantiates A, B and C and which should focus on proving the integration.

I also have device- or board-level tests where the actual design is verified on the real device or board. This is because you cannot trust a device-level simulation: the models start to become inaccurate and such simulations take far too long. In the real device, much longer stretches of test time can be achieved than the milliseconds you get in simulation.

I try to avoid performing any post-synthesis simulation, except when a failure occurs in the device-level tests, in which case I use it to find bugs in the synthesis tool. In that situation I make a small wrapper around the post-synthesis netlist and reuse the testbench from the behavioral simulation.

I work very hard to avoid any form of manual testing and instead rely on test automation frameworks for both simulation and device-level tests, so that testing can be performed continuously.

To automate simulation I use the VUnit test automation framework, of which @lasplund and myself are the authors.
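
For reference, a minimal sketch of what a VUnit-style VHDL testbench looks like is shown below (VUnit's VHDL libraries are compiled as vunit_lib and the testbench is registered in a small Python run script); the entity, test names and the checked parity function are made up for illustration.

    library ieee;
    use ieee.std_logic_1164.all;

    library vunit_lib;
    context vunit_lib.vunit_context;

    entity tb_parity is
      generic (runner_cfg : string);
    end entity;

    architecture tb of tb_parity is
      -- tiny stand-in for a real DUT
      function parity(v : std_logic_vector) return std_logic is
        variable p : std_logic := '0';
      begin
        for i in v'range loop
          p := p xor v(i);
        end loop;
        return p;
      end function;
    begin
      main : process
      begin
        test_runner_setup(runner, runner_cfg);
        while test_suite loop
          if run("even_number_of_ones_gives_zero") then
            check_equal(parity("1100"), '0');
          elsif run("odd_number_of_ones_gives_one") then
            check_equal(parity("0111"), '1');
          end if;
        end loop;
        test_runner_cleanup(runner);
        wait;
      end process;
    end architecture;

Each run("...") branch becomes an individually listed and runnable test case, which is what makes it easy to run the whole suite continuously.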