ASIC Verification: June 2008

Monday, June 16, 2008

RTL Design techniques - Coding style

My focus has always been on what is good for synthesis, with little regard to the effect on simulation speed.

Create a block-level diagram before you begin coding
Draw simple block diagrams of the functions of your design. These will also be helpful in documentation. Use these block diagrams while you code your design.

Always think of a fresher who will read your RTL

Start with the inputs to your design - on the left side of the block diagram - and describe the design's functionality from inputs to outputs. Don't try to be an ultra-efficient RTL coder. Please don't forget to put in comments. Have a comment “header” for each module, comment the functionality of each I/O, and use comments throughout the design to explain the “tricky” parts.

Hierarchy
At the top level of your chip there should be 4 or 5 blocks: I/O pads, clock generator, reset circuit, and the core design. They are kept in separate blocks because they might not all be synthesizable. Isolating them simplifies synthesis. Typically, the core design is hierarchical and organized by function.

Use separate always@ blocks for sequential logic and combinatorial logic

It helps organize your RTL description.
There is a sequential optimization process in DC (Design Compiler), which uses your coding-style description of the sequential element to map it to the best sequential element in your technology library. When you combine sequential and combinatorial logic descriptions together, the tool can get confused and might not recognize the type of sequential element you are describing.

Use blocking for combinational and non-blocking for sequential
Stuart Sutherland has written a good paper on blocking and non-blocking assignments. This paper can be downloaded from here.
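The difference between the two assignment types can be illustrated with a toy model of a two-stage shift register. This is only a sketch in Python, not a Verilog simulator; the signal names d, q1 and q2 are hypothetical and chosen for illustration.

```python
# Toy model of one clock edge on a two-stage shift register.
# d, q1 and q2 are hypothetical signal names, not taken from any real design.

def clock_edge_blocking(d, q1, q2):
    """Models 'q1 = d; q2 = q1;': each assignment takes effect immediately."""
    q1 = d
    q2 = q1  # sees the NEW q1, so the two stages collapse into one
    return q1, q2


def clock_edge_nonblocking(d, q1, q2):
    """Models 'q1 <= d; q2 <= q1;': sample all right-hand sides, then update."""
    next_q1, next_q2 = d, q1  # right-hand sides use the OLD values
    return next_q1, next_q2


print(clock_edge_blocking(1, 0, 0))     # (1, 1)
print(clock_edge_nonblocking(1, 0, 0))  # (1, 0)
```

With blocking assignments the new input reaches q2 in a single clock, so the intended second register stage is lost. With non-blocking assignments q2 receives the previous value of q1, which is the behavior expected from two flip-flops in series.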

Know whether you have prioritized or parallel conditions
If the conditions are mutually exclusive, then a case statement is better, because it is easier to read and it organizes the parallel states of the description. If multiple conditions can occur at the same time, use the “if” statement and prioritize the conditions using “else if” for each subsequent condition.
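As a rough behavioral sketch of the two styles (a Python model, not synthesizable code; the request list, state encoding and return labels are hypothetical):

```python
def priority_select(req):
    """Models an if / else-if chain: the first true condition wins."""
    if req[0]:
        return "A"
    elif req[1]:
        return "B"
    elif req[2]:
        return "C"
    return "idle"


def parallel_select(state):
    """Models a case statement: the branches are mutually exclusive."""
    branches = {0: "A", 1: "B", 2: "C"}
    return branches.get(state, "idle")  # .get() default plays the role of 'default:'


# Two requests are active at once; the chain resolves them by priority.
print(priority_select([False, True, True]))  # B
print(parallel_select(2))                    # C
```

In the first function the order of the tests matters, which corresponds to a priority structure. In the second, only one branch can match a given state, so the order of the branches is irrelevant.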

Completely specify all branches of all conditional statements
If you completely specify all possible combinations of ones and zeros for the different cases and you use the same select operator for all cases, DC will automatically recognize that the case statement is fully specified and parallel.

Initialize output of conditional statements prior to defining the statements
Be careful when selecting the value you initialize the output to. If there isn't a default state for that part of the design, then try to pick the “most popular” state to initialize the output to. That should help reduce extra switching (power) during operation.

Use high level constructs (case, if, always@) as much as possible
Synthesis works best with high level RTL constructs. Low level gates or Boolean level constructs (Verilog primitives) constrain DC.

Using good coding style and writing “safe” RTL code is not enough! Understand what you are implying and figure out in advance where the potential problems are. You should be able to manually synthesize in your head what you have described in your RTL description.

Sunday, June 15, 2008

RTL Design techniques - Pre-RTL Checklist.

Your success in IC design depends directly on your RTL code. There is a lot more that goes into a good RTL description than just writing with good coding style. Design for Test and Design for Synthesis are just a few examples of design goals that can be affected at the RTL. This post is all about RTL design issues. Code it correctly from the beginning and you won't need so many big fancy tools to solve your timing closure problems at the back end of the design cycle.

Many design issues that impact the speed and area of the design need to be resolved before you begin coding.

Communicate design issues with your team - Things to be worked out as a team

Naming convention for hierarchical blocks,
Naming convention for signals,
Active low or active high states for the signal

Does the specification define how the design should be partitioned?
Partitioning helps to break down your big design into smaller blocks and assign each small unit to different members of the team. Follow the specification's recommendation for partitioning.

What are the I/O requirements?

At the major functional block level, define the interface protocol as soon as possible. What bus interface protocol will be used: PCI, AHB or OCP? Get the specification for each bus and interface to the design before you begin coding. Make sure the function and timing of each one is clear. This will also enable you to create high level models of your design before you start coding the RTL.

What about the clocks in the design?
How many clocks will be required for the design? Where are the clocks for the chip coming from? Will they be internally generated? PLL? Divide-by circuits? Externally supplied clocks? You have to isolate your clock generation circuitry from the rest of the chip design, especially if it is analog based.

What other IPs are you using?
Does the design require any extra IP (Intellectual Property) to be integrated into it? RAMs? Cores? Buses? FIFOs? Then start with the interface to each IP block and define it.

Is it your expectation that you are pin-limited or gate-limited?
Being pin-limited means that you don't have enough I/O pads in your ASIC package to do what you really want to do. You might be able to double up on the functions of each pin, which would require multiplexing signals and would prevent any ideas of a unidirectional bus interface at the I/O pad level. But if you need all the signals to be active simultaneously, you won't be able to do that either. You'll have to split the design up. You should know this before you begin your RTL.

Being gate-limited means that the design has too much functionality for the die size chosen. You might have to cut out functionality to fit on the die. Or you can try to optimize your design for area, which means speed objectives might be tough to meet. It is hard to estimate whether you will be gate limited at the beginning of a project unless you have been through this design before.

Is it your expectation that you will be pushing the speed envelope of the technology?

How much functionality are you putting into your design?
At what speed will it be running?
What technology are you going to use to implement it?
Has it ever been done before?
What changes to the design are you willing to make to achieve the speed goal for your design? Pipelining or register re-timing?

In the next post, I'm going to post the rules that tend to cause the most common errors.

Wednesday, June 11, 2008

Fact about Johnson counter

The Johnson counter is made of a simple shift register with an inverted feedback. That is, if the complement output of a ring counter is fed back to the input instead of the true output, a Johnson counter results. The figure shows a 4-bit Johnson counter with 2*4=8 states.

A Johnson counter has 2n states, where "n" is the number of flip-flops. A normal binary counter has (2 power n) states for the same number of flops. The interesting point about the Johnson counter is its unused states. The formula for the number of unused states is (2 power n) - (2n). In this case, the number of unused states is (2 power 4) - (2*4) = 8 states.
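The state count can be checked with a small behavioral model. This is a minimal Python sketch, assuming the counter starts from the all-zeros state; the function name johnson_states is hypothetical.

```python
def johnson_states(n):
    """Return the state sequence of an n-bit Johnson counter, starting at all zeros."""
    state = (0,) * n
    seen = []
    while state not in seen:
        seen.append(state)
        # Shift by one position; the inverted last bit feeds the first stage.
        state = (1 - state[-1],) + state[:-1]
    return seen


n = 4
used = johnson_states(n)
unused = 2 ** n - len(used)
print(len(used), unused)  # 8 8
```

For n = 4 the sequence is 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001 and then back to 0000, which gives the 2n = 8 used states and leaves 8 of the 16 possible states unused.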

Tuesday, June 10, 2008

What is Pipelining?

Let me assume that I am going to build a car and I have all of the parts lying around at hand. Let us further assume that the main steps in the process are as follows:

Attach the wheel to the car,
Attach the engine,
Attach the seats,
Attach the body and
Paint everything.

Now let us assume that I require a specialist to perform each of these tasks. My five friends are playing cricket. My first friend comes, attaches a wheel and goes back to play cricket. Assume that he takes 10 minutes. On his return, my second friend comes, attaches the engine, goes back once his job is done, and so on. Once the first car has been completed, they start all over again. Obviously, this is a very inefficient scenario, as the whole process takes 50 minutes per car. Furthermore, during each of those 10-minute steps, only one man is working. It would be much more efficient to have 5 cars in the same place. In this case, as soon as my first friend completes his work on the first car, he can go to the second car to attach a wheel while the second friend attaches the engine to the first car. In this scenario, everyone will be working at all times.

PIPELINING

Let us assume that we have a design that can be implemented as a series of blocks of combinational logic. Let's assume that we have a chain of 3 combinational blocks, and that each block takes "t" nanoseconds to perform its task. In this case, it will take "3t" nanoseconds for a word of data to propagate through the function, starting with its arrival at the inputs of the first block and ending with its departure at the outputs of the 3rd block.

We wouldn't want to present new data to the inputs until we have stored the output results associated with the first word of data, which would limit us to one word every "3t" nanoseconds. The answer is to use a pipelined design technique in which the blocks of combinational logic are sandwiched between banks of registers.

All of the register banks are driven by a common clock. On each active clock edge, the registers feeding a block of logic are loaded with the result from the previous stage. These values then propagate through that block of logic until they arrive at its outputs, at which point they are ready to be loaded into the next set of registers on the next clock. In this case, as soon as the pipeline is fully loaded, a new word of data can be processed every "t" nanoseconds.
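The throughput gain can be estimated with a little arithmetic. The following Python sketch is an idealized model: it assumes every stage takes exactly "t" and ignores register setup and clock-to-output overhead. The function names are hypothetical.

```python
def unpipelined_time(words, stages, t):
    """Each word must pass through all stages before the next one starts."""
    return words * stages * t


def pipelined_time(words, stages, t):
    """The first word takes `stages` clocks; after that, one word completes per clock."""
    return (stages + words - 1) * t


# 10 words through 3 blocks, measured in units of t.
print(unpipelined_time(10, 3, 1))  # 30
print(pipelined_time(10, 3, 1))    # 12
```

The latency of any single word is still 3t, but once the pipeline is full the time per additional word drops from 3t to t. The same arithmetic applies to the car example: with 5 steps of 10 minutes each, the first car still takes 50 minutes, and every car after it is finished 10 minutes after the one before.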

Monday, June 9, 2008

Re-Timing

Re-timing is based on the concept of balancing out the positive and negative slacks throughout the design. In this context, positive slack is the amount of time by which a timing constraint is met, and negative slack is the amount of time by which it is missed.

For example, let us assume a pipelined design whose frequency is such that the maximum register-to-register delay is 15ps. Now, let us assume that we have a situation as shown in the figure.

The longest timing path in the first block of combinational logic is 10ps - positive slack of 5ps.

The longest timing path in the second block of combinational logic is 20ps - negative slack of 5ps. Once the initial path timing is calculated, combinational logic is moved across the register boundaries to steal time from paths with positive slack and donate it to paths with negative slack.
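The numbers in this example can be worked through in a few lines. This Python sketch assumes, for illustration only, that exactly 5ps worth of logic can be moved from the second block into the first; in a real design, logic moves in whole gates, so the balance is rarely this exact.

```python
CLOCK_PERIOD_PS = 15  # maximum register-to-register delay from the example


def slack(delay_ps):
    """Positive when the path meets timing, negative when it fails."""
    return CLOCK_PERIOD_PS - delay_ps


before = [10, 20]  # longest path in block 1 and block 2, in ps
moved_ps = 5       # assumed amount of logic delay moved across the register
after = [before[0] + moved_ps, before[1] - moved_ps]

print([slack(d) for d in before])  # [5, -5]
print([slack(d) for d in after])   # [0, 0]
```

After the move, both paths take 15ps and both meet the constraint with zero slack, while the total combinational delay of 30ps is unchanged.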

Saturday, June 7, 2008

Ten Tips to Improve an IC Design Team's Project Execution

Many IC design projects struggle with unexpected delays to their production plans. Intense time-to-market pressure demands an environment that improves both the predictability and the length of design timelines. In consideration of this continuous improvement aspiration, Jorvig Consulting has compiled ten vital tips for teams to consider as they pursue their quest to improve IC design execution.

This press release can be viewed from here.
