ASIC Verification: April 2008

Wednesday, April 30, 2008

Guidelines for improving the performance of synthesis

The following are some important guidelines to improve the performance of synthesized logic and produce a clean design.

Clock and Reset logic Keep the clock and reset generation logic for all modules in a single module: synthesize it once and do not touch it again. This allows clean clock constraint specification. Another advantage is that the modules using these clocks and resets can be constrained with ideal clock specifications.

No glue logic at the top The top module should be used only for connecting various components (modules) together. It should not contain any glue logic.

Module name The module name should be the same as the file name, and one should avoid describing more than one module or entity in a single file. This avoids confusion while compiling the files and during synthesis.

FSM Coding
  • While coding FSMs, the state names should be described using enumerated types.
  • The combinational logic for computing the next state should be in its own process, separate from the state registers.
  • Implement the next-state combinational logic with a case statement. This helps the tool optimize the logic much better and results in a clean design (a sketch follows below).
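As a hedged illustration of these guidelines, here is a minimal SystemVerilog sketch; the module, state, and signal names are invented for this example. The states use an enumerated type, the state register and the next-state logic sit in separate processes, and the next-state logic is a case statement with a default.

  // Minimal FSM sketch following the guidelines above (names are illustrative).
  module simple_fsm (
    input  logic clk, rst_n, start, done,
    output logic busy
  );
    typedef enum logic [1:0] {IDLE, RUN, FINISH} state_t;
    state_t state, next_state;

    // State register: sequential logic only
    always_ff @(posedge clk or negedge rst_n)
      if (!rst_n) state <= IDLE;
      else        state <= next_state;

    // Next-state logic: purely combinational, coded as a case statement
    always_comb begin
      next_state = state;           // default assignment avoids latches
      case (state)
        IDLE:    if (start) next_state = RUN;
        RUN:     if (done)  next_state = FINISH;
        FINISH:  next_state = IDLE;
        default: next_state = IDLE;
      endcase
    end

    assign busy = (state != IDLE);
  endmodule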
Multiplexer Inference A case statement is used for implementing multiplexers. To prevent latch inference, the default branch of the case statement should always be specified. An if statement, on the other hand, is used for writing priority encoders: multiple if statements with multiple branches result in a priority encoder structure.
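A minimal Verilog sketch of the two coding styles; the module and signal names are made up for this illustration.

  module mux_vs_priority (
    input  wire [1:0] sel,
    input  wire       a, b, c, d,
    input  wire [3:0] req,
    output reg        y,
    output reg  [1:0] grant
  );
    // Case statement with a default: infers a clean 4-to-1 multiplexer and
    // avoids latch inference because 'y' is assigned in every branch.
    always @(*) begin
      case (sel)
        2'b00:   y = a;
        2'b01:   y = b;
        2'b10:   y = c;
        default: y = d;
      endcase
    end

    // If/else-if chain: infers a priority encoder; req[3] wins over req[2], etc.
    always @(*) begin
      if      (req[3]) grant = 2'b11;
      else if (req[2]) grant = 2'b10;
      else if (req[1]) grant = 2'b01;
      else             grant = 2'b00;
    end
  endmodule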

Tri-state buffers A tri-state buffer is inferred when a high-impedance value (Z) is assigned to an output. Tri-state logic is generally not recommended because it reduces testability and is difficult to optimize, since it cannot be buffered.
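A minimal Verilog sketch of tri-state inference; the names are illustrative.

  module tri_state_buf (
    input  wire data_in,
    input  wire enable,
    output wire data_out
  );
    // Assigning 1'bz when enable is inactive infers a tri-state buffer.
    // As noted above, use this style sparingly.
    assign data_out = enable ? data_in : 1'bz;
  endmodule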


Monday, April 28, 2008

SOC Verification 3

In the last post, we saw the traditional SOC verification approach. In this post, we are going to see some unique approaches to SOC verification. SOC verification becomes more complex because of the many different kinds of IP on the chip. A good understanding of the overall application of the SOC is essential: the more extensive the knowledge of the external interfaces, the more complete the SOC verification will be.

Verification Planning Guidelines

The following should be considered during verification planning.

External Interface Emulation When verifying complex SOCs, you should consider full-chip emulation. The external interface of each IP on the SOC, as well as the SOC data interfaces, should be exercised, and this should be done simultaneously for all cores.

Unit level to Top level SOC designs are built from the bottom up. Because the unit-level modules may be used anywhere in the design hierarchy, they must be verified for every possible scenario.

Re-use the verification components As the leaf modules are assembled to create the SOC, many of the leaf-module interfaces become internal interfaces between the various modules of the SOC, and there is no longer a need to drive their inputs. Other interfaces, however, remain external interfaces of the SOC. If the test generators for the external interfaces are independent components, then most system-level stimuli can be taken as-is from the various module-level environments.

Many components in an SOC can work independently and in parallel with other components. In order to exercise the SOC in its corner cases, the tests should be able to describe parallel streams of activity for each component separately.

Integration Monitors The primary focus of SOC verification is on integration; most bugs appear in the integration between blocks. An integration monitor that comes with an IP can be a great help in finding integration problems: it can be hooked into the simulation environment and simply run, flagging any integration violation it observes. This can save time dramatically, and such IP monitors can bring a lot of benefit to the quality of the SOC.
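As a hedged illustration only (the interface signals and the protocol rule below are assumptions invented for this sketch, not taken from any particular IP), an integration monitor can be as simple as a passive module of assertions attached to the signals between two blocks:

  // Illustrative passive monitor: flags a violation if a request is not
  // acknowledged within 8 cycles. Signal names and the rule are assumptions.
  module integration_monitor (
    input logic clk, rst_n,
    input logic req, ack
  );
    property p_req_gets_ack;
      @(posedge clk) disable iff (!rst_n)
        req |-> ##[1:8] ack;
    endproperty

    assert property (p_req_gets_ack)
      else $error("Integration violation: req not acknowledged within 8 cycles");
  endmodule

Such a monitor can be attached non-intrusively to the signals of interest with a SystemVerilog bind statement, so the design itself is not modified.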

Coverage It is an important tool for identifying areas that were never exercised. Code and toggle coverage are the first indication of areas that were never exercised; however, they never tell you that you have achieved full verification. Functional coverage allows you to define what functionality of the device should be monitored.

Looking at functional coverage reports, you may conclude that certain features have already been exercised and focus your efforts on the areas that were neglected. But the most significant impact of functional coverage in the context of SOC verification is in eliminating the need to write many of the most time-consuming and hard-to-write tests.

Conclusion The main focus of SOC verification needs to be on the integration of the many blocks it is composed of. There is a need for well-defined ways for the IP developer to communicate the integration rules in an executable form, and to help the integrator verify that the IP was incorporated correctly. The complexity introduced by the many hardware blocks, and by the software running on the processor, points to the need to change some of the traditional verification schemes.

Friday, April 25, 2008

Will we ever get a handle on SoC verification cost?

In a presentation at the International Symposium on Quality Electronic Design last week, Mentor Graphics verification and test division GM & VP Robert Hum presented an interesting good-news/bad-news scenario. On the good-news side, Hum said that after years of verification costs gobbling up a larger and larger fraction of the total engineering budget for chips, and in fact after years of verification cost rising faster than revenue per design, the rate of increase in verification cost is finally starting to moderate.

http://www.edn.com/blog/1690000169/post/1710023971.html


SOC Verification 2

Traditional SOC Verification

Write a detailed test plan document

We usually write hundreds of directed tests to verify the specific scenarios and all sorts of activities a verification engineer thinks are important. But there are some limitations:
  • The complexity of an SOC is such that many important scenarios in which bugs might hide are never thought of.
  • As the complexity of the SOC increases, it becomes difficult to write directed tests that reach the goals.
Test Generation

Each directed test that we write checks its specific scenario only once, which is not enough: we need to exercise these scenarios with different combinations of inputs before we can find the hiding bugs. Many of us write random test cases to find these bugs, but they are exercised only at the end of the verification cycle. Though such tests reach many unexpected corners, we end up verifying the same scenarios again and again and still tend to miss a lot of bugs. What we actually need is to focus on the particular areas of interest in the design. So we need generic test generators that can easily be directed into areas of interest.
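As a hedged sketch of what such a directable generator might look like in SystemVerilog (all class, field, and constraint names are invented for this example), the base transaction is fully random and a derived test simply layers an extra constraint to steer the stimulus into an area of interest:

  // Generic, directable stimulus generator sketch (names are made up).
  class bus_transaction;
    rand bit [31:0] addr;
    rand bit [7:0]  data;
    rand bit        write;

    // Default constraint: anywhere in a 1 MB address space
    constraint c_addr_range { addr < 32'h0010_0000; }
  endclass

  // A directed-random test layers an extra constraint on the generic
  // generator to focus on a register block near the top of the range.
  class reg_block_transaction extends bus_transaction;
    constraint c_focus { addr inside {[32'h000F_F000 : 32'h000F_FFFF]}; write == 1; }
  endclass

  module tb;
    initial begin
      reg_block_transaction tr = new();
      repeat (10) begin
        void'(tr.randomize());
        $display("addr=%h data=%h write=%b", tr.addr, tr.data, tr.write);
      end
    end
  endmodule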

Integration

Testbench development for an SOC design requires more effort than the design itself. Many SOC verification testbenches don't have a means of verifying the correctness of the integration of the various modules; instead, the DUT is exercised as a whole. The main drawback of this approach is that finding the source of a problem by tracing signals all the way back to where they originated takes a lot of time. This leads to the need for integration monitors that can identify integration problems at the source.

Tape out..... Tape out..... Tape out.....

Every design and verification team needs an answer to the million-dollar question: when are we ready for tape out?

Answering this question is very tough, as verification quality is very hard to measure, and everyone's answer will be different. My answer would depend on code, branch, expression and toggle coverage, functional coverage, and bug rates. To resolve this dilemma, we need coverage metrics that measure progress in a more precise way.

To summarize, there is always an element of spray and pray (luck) in verification: we hope that we will hit and identify most bugs. In SOCs, where so many independent components are integrated, the uncertainty in the results is greater. There are new technologies and methodologies available today that offer a more dependable process, with less praying and less time that needs to be invested. In the next post I'll explain unique approaches to SOC verification.


Tuesday, April 22, 2008

SOC Verification 1

A typical System-On-Chip (SOC) may contain the following components: a processor (ARM or DSP), the processor bus, peripherals such as USB and UART, a peripheral bus, the bridge which connects the buses, and a controller. SOC verification is challenging for the following reasons.

Integration of various modules : The main focus of SOC verification is to check the integration between the various modules. The SOC verification engineers assume that each module has been independently verified by the module-level verification engineers.

IP block re-use : IP reuse was indeed seen as a way to foster the development productivity and output that would eventually offset the design productivity gap. Many companies treat their IP as an asset.

HW/SW co-verification : An SOC is really ready to ship when the complete application works, not just when hardware simulations pass in regression. In other words, the ultimate test for a chip is to see that it performs its application correctly and completely. That means executing the software together with the RTL, so we need a way to capture both HW and SW activity in the tests we write to verify the SOC.

Some of the SOC bugs might hide in the following areas.
  • Interactions between the various blocks.
  • Unexpected SW/HW handling
All of the challenges above indicate that we need rigorous verification of each of the SOC components separately. I'll explain the trends in traditional SOC verification methodology in the next post.

Monday, April 21, 2008

Sequence Detector - Solution



The solution for the problem is given here.

Friday, April 18, 2008

Sequence Detector

Design a state machine that outputs a '1' if and only if exactly two of the last three inputs are '1'. For example, if the input sequence is 0110_1110, then the output will be 0011_1101. Assume that the input 'x' is a single-bit serial line.

Mod-10 counter

A Mod-10 counter has 10 possible states, in other words it counts from 0 to 9 and rolls over. Let's take a look at how to build a Mod-10 counter.

The first step is to determine how many flip-flops to use; we will use JK FFs for our design. Since we need 10 states, 4 FFs are required, which gives 16 possible states. The trick is to find a way not to use all of those states: there must be a way to force the counter to stop counting at 9 and roll over to 0. This is where the asynchronous inputs come into play. The asynchronous inputs override the synchronous inputs and force the outputs either LOW or HIGH.

Looking at the truth table, the counter should run from 0000 to 1001 and then roll over to 0000. Since the counter has to display 1001, the next binary value, 1010, will be used to reset the counter to 0. A JK FF has an asynchronous input called CLEAR; when you assert it, the flop's output goes to 0. Since this CLEAR input is active high, we can use an AND gate: the outputs of the two FFs that hold a '1' in 1010 (Q3 and Q1) are tied to the AND gate, and its output drives the CLEAR inputs. When the counter reaches 1010, the AND gate output goes to '1' and activates the CLEAR inputs of all the FFs.
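A minimal behavioral Verilog sketch of the same counter (a synchronous equivalent for simulation rather than the gate-level JK implementation with the decoded asynchronous CLEAR described above; names are illustrative):

  module mod10_counter (
    input  wire       clk,
    input  wire       rst_n,
    output reg  [3:0] count
  );
    // Roll over after 9, mirroring the decade count described above. The
    // gate-level version instead decodes 1010 onto the asynchronous CLEAR
    // inputs of the JK flip-flops; here the wrap is done synchronously.
    always @(posedge clk or negedge rst_n)
      if (!rst_n)             count <= 4'd0;
      else if (count == 4'd9) count <= 4'd0;
      else                    count <= count + 4'd1;
  endmodule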



Thursday, April 17, 2008

DAG

Suppose we wanted to add an OAI gate, OAI(A, B,C) = not [(A + B) ⋅C], to the target gate library for our synthesizer. If the synthesizer uses the Directed Acyclic Graph (DAG) covering method for mapping equations to gates, what would be the appropriate primitive DAG (i.e., an equivalent circuit which uses only inverters and 2-input NANDs) for this new gate?

Counter design using FSM

Design a 3-bit counter that counts in binary or in Gray code, depending on the value of a mode control input m. As long as m = 0, the counter steps through the binary sequence 000, 001, 010, 011, 100, 101, 110, 111 and repeats this sequence. As long as m = 1, the counter advances through the Gray code sequence 000, 001, 011, 010, 110, 111, 101, 100 and repeats this sequence. However, the mode input may change at any time in either sequence.

For example, assume that the mode input is 0 for the first two clock cycles but changes to 1 in the third clock cycle and stays at 1 in the fourth clock cycle. (That is, m goes through the sequence 0, 0, 1, 1.) Then the output will be 000, 001, 010, 110, 111. In this example, 000 is the initial state. The first two state transitions (000->001->010) occur in the binary counting mode. Then, because m = 1 from the third clock cycle onwards, the state goes from 010->110->111, as given by the Gray code sequence.

In addition to the mode control input, there is a reset input. This synchronous counter should go to the 000 state when reset is asserted.

FSM Problem

Construct a synchronous Moore state machine with two inputs, A and B, and two outputs, X and Y. The machine accepts data on the two input lines synchronously with the clock. The output X is 1 if and only if the data on the two input lines have been identical (i.e. A and B are both 1, or A and B are both 0) for the last three or more consecutive clock cycles. Output Y is 1 if and only if the data on the two input lines have been complements of each other (i.e. A = 1 and B = 0, or A = 0 and B = 1) for the last three or more consecutive clock cycles.

Friday, April 11, 2008

Clock Tree Synthesis

Nowadays, designing clock-distribution networks for high-speed chips involves more than just meeting timing specifications. Achieving the required clock latency and clock skew is difficult when clock signals of 300 MHz or more traverse the chip. And because the clock network is one of the most power-hungry nets on a chip, you need to design with power dissipation in mind.

The basic goal of CTS is to build the interconnect that carries the system clock to all the cells in the chip that use the clock. For CTS, your major concerns are:

  • Minimizing the clock skew,
  • Optimizing clock buffers to meet skew specifications, and
  • Minimizing clock-tree power dissipation.
The primary job of CTS tools is to vary the routing paths and the placement of the clocked cells and clock buffers to meet maximum-skew specifications.

For a balanced tree without buffers (before CTS), the clock line's capacitance increases exponentially as you move from the clocked element to the primary clock input. The extra capacitance results from the wider metal needed to carry current to the branching segments. The extra metal also results in additional chip area to accommodate the extra clock-line width. Adding buffers at the branching points of the tree significantly lowers clock-interconnect capacitance, because you can reduce clock-line width toward the root.

When designing a clock tree, you need to consider performance specifications that are timing-related. Clock-tree timing specifications include clock latency, skew, and jitter; non-timing specifications include power dissipation and signal integrity. Many clock-design issues affect multiple performance parameters; for example, adding clock buffers to balance clock lines and decrease skew may result in additional clock-tree power dissipation.

The biggest problem we face in designing clock trees is skew minimization. The factors that contribute to clock skew include loading mismatch at the clocked elements and mismatch in RC delay.

Clock skew adds to cycle times, reducing the clock rate at which a chip can operate. Typically, skew should be 10% or less of a chip's clock cycle, meaning that for a 100-MHz clock, skew must be 1 nsec or less. High-performance designs may require skew to be 5% of the clock cycle.

Clock design methodology

Many chip companies have comprehensive clock-network-design strategies that they use on their customers' chips. Motorola uses the Clock Generator tool along with Cadence place-and-route tools. This tool combination produces a tree with minimum insertion delay, a minimum number of buffers, and maximum fan-out. Typical skew is less than 300 psec. After generation of the clock tree, the output from the place-and-route tool is flat, meaning that the design hierarchy is lost.

Effect of CTS
  1. Lots of clock buffers are added
  2. Congestion may increase
  3. Non-clock tree cells may have been moved to non-ideal locations
  4. Can introduce new timing violations

Glossary

Balanced clock tree : The delays from the root of the clock tree to the leaves are almost the same.

Clock distribution: The main task of clock distribution is to distribute the clock signal across the chip in order to minimize the clock skew.

Clock buffer: Used to keep the rise and fall delays of the clock signal equal.

Global skew: The difference in clock arrival times between any two FFs in the design within the same clock domain.

Local skew : Balances the skew only between related FF pairs. FFs are related only when one FF launches data which is captured by the other.



Tuesday, April 8, 2008

State Diagram

Draw the state diagram and state table for a circuit that outputs a '1' whenever the binary number received so far on a serial input is divisible by 5.

Depth of the Asynchronous FIFO

One of the most interesting architectural decisions in a design project is how to calculate the depth of a FIFO. A FIFO is an intermediate buffer where data is stored temporarily. Too small a FIFO depth can cause an overflow scenario and lead to data loss.

For the worst-case scenario, the difference in data rate between write and read should be maximum. Hence, the maximum data rate should be considered for the write operation and the minimum data rate for the read operation when calculating the depth of the FIFO.

Any asynchronous FIFO has a write frequency and a read frequency. Assume that the write frequency (Fw) is faster than the read frequency (Fr).

Scenario 1:

Fw = 1/Tw and Fr = 1/Tr, where Tw and Tr are the time periods of the write and read clocks respectively.

Now the transmitter (write side) wants to transmit "W" words of data, but it can write only "N" words into the FIFO in Tw time.
The time taken to transmit "W" words is therefore (Tw/N) * W.

The receiver can read "P" words in a Tr time interval,
so in (Tw/N) * W time the receiver can read ((Tw/N)*W*P)/Tr words.

Subtracting the data read from the FIFO from the data written into the FIFO:
data written into the FIFO = W words,
data read from the FIFO = ((Tw/N)*W*P)/Tr words.

FIFO size = W - ((Tw/N)*W*P)/Tr

Where

W = Maximum number of words that the transmitter can send
N = Number of words that the transmitter sends per Tw
Tw = Transmitter's clock period
P = Number of words that the receiver reads per Tr
Tr = Receiver's clock period

Scenario 2:

Consider a FIFO where Fw is 100 MHz and 50 words are written into the FIFO every 100 clocks, while Fr is 50 MHz and one word is read out every clock.

In the worst-case scenario, the 50 words are written into the FIFO as a back-to-back burst in 500 ns. In the same time, the read side can read only 25 words out of the FIFO; the remaining 25 words are read out during the 50 idle write clocks. So the depth of the FIFO should be at least 28 (three extra locations account for synchronizer latency).
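As a cross-check, plugging the Scenario 2 numbers into the Scenario 1 formula, with the worst-case burst treated as W = 50 words written at N = 1 word per Tw = 10 ns and the reader draining P = 1 word per Tr = 20 ns, gives the same 25 words of backlog:

FIFO size = W - ((Tw/N)*W*P)/Tr = 50 - ((10/1)*50*1)/20 = 50 - 25 = 25 words

plus the synchronizer margin noted above.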


Tuesday, April 1, 2008

Functional Coverage vs Code Coverage

Basic definition of Functional coverage

Functional coverage is the determination of how much of the design's functionality has been exercised by the verification environment. Let us explain this with a simple example.

Suppose your manager asked you to prove that the set of regression tests created for a particular design exercises all the functionality defined in the specification. How would you go about proving it?

  • You would show the list of tests in the regression suite and the correlation of those tests to the functionality defined in the specification.
  • You would need to prove that each test exercised the functionality it is supposed to check.
  • Finally, you would create a list showing each function and check off those that were exercised.
  • From this list you would extract a metric: the number of functions exercised divided by the total number of functions to be checked.
This is probably what you would present to your manager, and this is functional coverage. The difficulty is that it is too much of a manual process; today's designs require a more structured approach.

There are two magical questions that every design team asks and must answer.
  1. Is my chip functioning properly?
  2. Am I done verifying the chip?
Proper execution of each test in a test suite is a measure of functional coverage. Each test is created to check a particular piece of functionality in the specification. Therefore, it is natural to assume that if each test has been proven to complete properly, then the entire set of functionality has been verified. This assumes that each test has been verified to exercise the functionality for which it was created; in many cases this verification is performed manually, which is both time-consuming and error-prone. There also appears to be confusion in the industry about what constitutes functional coverage.

Code coverage

Code coverage gives information about how many lines were executed and how many times expressions and branches were executed. This coverage is collected by the simulation tools. Users use code coverage to identify corner cases that were not hit by the random test cases, and then write directed test cases to reach the missing areas.

Both of them have equal importance in verification. 100% functional coverage does not mean that the DUT is completely exercised, and vice versa. Verification engineers consider both coverages to measure verification progress.

I would like to explain the difference with a simple example. Let's say the specification describes three features: A, B, and C. And let's say the RTL designer coded only features A and B. If the tests exercise only features A and B, then you can get 100% code coverage. Thus, even with 100% code coverage, you have a big hole (feature C) in the design. So the verification engineer has to write functional coverage code for A, B, and C; 100% functional coverage then means there are tests for all the features that the verification engineer has thought of.
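A minimal SystemVerilog sketch of this example (the signal, encoding, and bin names are invented for illustration): even though driving only features A and B can give full code coverage of the RTL that exists, the bin for feature C stays empty, so the hole shows up in the functional coverage report.

  // Illustrative sketch only: 'feature_sel' and the A/B/C encoding are made up.
  module cov_example;
    logic       clk = 0;
    logic [1:0] feature_sel;   // 0 = feature A, 1 = feature B, 2 = feature C

    covergroup cg_features @(posedge clk);
      cp_feature : coverpoint feature_sel {
        bins feature_A = {2'd0};
        bins feature_B = {2'd1};
        bins feature_C = {2'd2};   // remains empty if no test exercises C
      }
    endgroup

    cg_features cg = new();

    always #5 clk = ~clk;

    initial begin
      feature_sel = 2'd0;                   // feature A sampled at the first posedge
      @(negedge clk) feature_sel = 2'd1;    // feature B sampled at the next posedge
      // feature C is never driven, so bin feature_C is reported as a hole
      repeat (2) @(negedge clk);
      $display("functional coverage = %0.2f%%", cg.get_inst_coverage());
      $finish;
    end
  endmodule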

The role of coverage in the verification environment

Functional and code coverage are complementary to each other: 100% functional coverage doesn't imply 100% code coverage, and with 100% code coverage you still have to achieve your functional coverage goals.

Identify the coverage holes

One of the most important aspects of functional verification is to identify the coverage holes in the coverage space. The goal of any successful verification effort is to achieve the specified coverage targets with the least amount of simulation cycles.

Limitations of functional coverage

  • There is no defined list of what 100% of the design's functionality is, and therefore there may be functionality missing from the list.
  • There is no real way to check that the coverage model itself is correct; manual review is the only way.