Design Article
Hunting that elusive bug
Roland Syba, Melexis GmbH
9/28/2010 3:44 AM EDT
This article discusses the non-trivial challenge of detecting and correcting the – often elusive – functional defects that unavoidably arise in the design of complex system-on-chip (SoC) devices. How do we mitigate the conflict between the dramatic increase in SoC design complexity and the need to deliver the design in a shorter time with the same or better design quality?
Clearly, we need new design and verification methods, and we need “all hands on deck” to develop them. That’s why a consortium of six companies and six research institutes set up the Herkules project [1], with support from the German government. Over the past three years, this project has brought together design and verification engineers from leading chip companies, developers of commercial verification tools from EDA companies, and leading technology research institutes. Their goal was to develop a right-first-time verification approach for large digital and mixed-signal designs – and to ensure that it is widely applicable to the development of automotive and telecommunication systems that must comply with very high quality standards.
And why is Melexis interested in these verification issues? We develop and produce a broad range of mixed-signal, high-voltage ASICs and ASSPs for automotive applications, increasingly equipped with integrated flash and microcontroller(s), a local interconnect network (LIN) physical layer or complete LIN controller, and controller area network (CAN) components. In common with most providers of complex SoCs, we implement a re-usable intellectual property (IP) methodology to speed time to market and reduce development costs.
For us, it is of vital strategic importance that this IP is free of error and malfunction, and that we achieve this quality as early in the design flow as possible. A thoroughly simulated IP block with extensive verification coverage may sound good. But integrate several such IP blocks, and experience from many past projects shows that the probability that the ensemble will fail because of undetected bugs increases significantly. In the automotive world, the consequence could be extremely costly recalls – or worse.
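A rough way to see why integration erodes confidence is the following sketch. It is not taken from the Herkules project: it assumes, purely for illustration, that each block independently carries an undetected bug with the same probability, and the function name and numbers are hypothetical. Real ensembles also suffer from interface and interaction bugs that this model ignores, so the result is optimistic.

```python
def p_any_escape(p_block: float, n_blocks: int) -> float:
    """Probability that at least one of n_blocks carries an undetected bug.

    Assumption (illustrative only): every block has the same escape
    probability p_block, and blocks are statistically independent.
    """
    if not 0.0 <= p_block <= 1.0:
        raise ValueError("p_block must be a probability between 0 and 1")
    if n_blocks < 0:
        raise ValueError("n_blocks must be non-negative")
    # The ensemble is clean only if every single block is clean.
    return 1.0 - (1.0 - p_block) ** n_blocks


if __name__ == "__main__":
    # Hypothetical figure: a 2% chance of an escaped bug per block.
    for n in (1, 5, 10, 20):
        print(f"{n:2d} blocks -> {p_any_escape(0.02, n):.3f}")
```

Even under these simplified assumptions, a per-block risk that looks acceptable in isolation grows quickly once many blocks are combined.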
The challenges addressed by the Herkules project were: How can we achieve the confirmed absence of functional errors in automotive and communications IP well before tape-out? How do we know with certainty that we’ve detected and corrected every error? How do we know that our specifications are complete and free of ambiguities?
In this article we address the technical and business implications of failing to adequately answer these questions, and cover the essential ingredients of a verification approach – newly developed within the Herkules project – that delivers the requisite certainty.
Verification holes – The implications
Every designer knows the situation. The design team finishes the design of the “perfect” chip – perfect in its view, anyway – and starts full-blown development on the next design. Months later, the first problems surface – problems that the team thinks shouldn’t even exist. After all, it’s an experienced team, which has simulated the design with state-of-the-art methods, with test patterns so crammed with coverage that the return rate should be nearly negative. And still customer returns show up. But why? Is it a defective chip? Is the customer applying the chip incorrectly? Is there a use case that violates the specification? Or is it perhaps a combination of any or all of these?
This nasty surprise has become an industry dilemma. End-user applications demand ever more on-chip processing performance, with ever more cost-effective wafers, in ever smaller process technologies, loaded with ever larger memories, and integrating ever more functionality – additional functionality that previously often occupied an entire discrete chip of its own.
And it’s not enough that the chip looks increasingly like a motherboard. To service different application requirements, the chip often integrates entire groups of functionality with reconfigurable implementations – to simplify supply logistics and, above all, to reduce development costs. But this nesting of complex functionality is not the only source of exploding complexity. In comparison to a multi-million-gate data processing design with high levels of parallelism, a multi-function chip with a much lower gate count can actually be more complex. For example, it may have entire register banks, every single bit of which can control any of the multifarious functions that the chip is required to fulfill. So, the number and intricacy of internal dependencies is huge (see figure 1) – and the likelihood of missed bugs high.
Figure 1: This “small control flow” illustrates the intricacy of internal dependencies
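To put a number on the register-bank argument, here is a minimal sketch. The helper names, register widths, and the simulation throughput are hypothetical and chosen only for illustration; the one hard fact is that a bank of n independent control bits has 2^n possible settings, and that k functions can interact in k(k-1)/2 pairs.

```python
def config_count(control_bits: int) -> int:
    """Number of distinct settings of a register bank with control_bits bits."""
    return 2 ** control_bits


def pairwise_interactions(functions: int) -> int:
    """Number of distinct pairs of functions that could influence each other."""
    return functions * (functions - 1) // 2


def years_to_enumerate(control_bits: int, configs_per_second: float) -> float:
    """Years needed to visit every setting once at the given (assumed) rate."""
    seconds = config_count(control_bits) / configs_per_second
    return seconds / (365 * 24 * 3600)


if __name__ == "__main__":
    # Hypothetical register-bank widths; the rate of one million
    # configurations per second is an assumption, not a measured figure.
    for bits in (8, 16, 32, 64):
        years = years_to_enumerate(bits, 1e6)
        print(f"{bits:2d} bits: {config_count(bits)} settings, "
              f"{years:.3g} years to enumerate")
    print("10 functions:", pairwise_interactions(10), "possible pairs")
```

The point of the sketch is only that exhaustive simulation stops being an option at very modest register widths, which is why a small multi-function chip can be harder to verify than a much larger but more regular one.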
STS_SK
9/29/2010 12:39 AM EDT
Good reminder of the problems faced in SoC integration.
MikeLC
9/30/2010 4:14 AM EDT
As a programmer who has done a fair amount of client/server software, I know how important it is for the hardware to be reliable. Keep up the good work.
sharps_eng
10/1/2010 4:02 PM EDT
Interestingly, this article contains 6 or more arguments ended by a 'counsel of despair' typified by 'it's so complex you'll never find all the bugs, so why bother?', then describes a brute-force method of testing to try to find said bugs, and _prove_their_absence_ (I'll not go there!). Reliable design used to start by designing a core so simple it could be proven to be correct using tools that themselves were simple enough to be proven likewise. Thereafter, this virtuous circle was widened step by step to encompass the entire application, with each proven building block introducing no additional uncertainty save that due to the current layer being developed. But before the impetus had built, or critical mass had been achieved, commercial forces would often kill the project or force it to market early, and the work done to date would be lost, instead of forming the core of future systems. It would be fascinating to be a code archeologist and unearth some ultra-safe, elegant and incredibly simple systems that have been scrapped without regard to their potential value. Real 'lost knowledge', worthy of a Dan Brown novel (and then some).
WKetel
11/3/2010 5:38 PM EDT
This article unintentionally offers another solution, which has two parts, the first being to not make the system so incredibly complex. Don't put all of a car's smart functions on one chip; don't even put them in one module. You are much more likely to understand what code is doing if it is only doing one thing. So coming up with an adequately detailed description of what the system will do is a very good start.
The second part of the solution is, after a function is selected for a given system, to carefully define everything the function must do, make that description the specification, and, most importantly, avoid feature creep. Both during initial concept creation and while the project is in progress, constantly changing and adding things is a sure way to add errors.