ASIC Verification: 2008

Monday, September 29, 2008

Writing test cases using sequences

The simplest way to write a test case is to redefine the behavior of the MAIN sequence by overriding the body() TCM. This is sufficient to create the test case, because the MAIN sequence is started automatically.

extend MAIN usb_sequence_s {
  body() @driver.clock is only {
    do usb_transaction_s keeping {
      .pid == SETUP;
    }; // End of do
  }; // End of body()
}; // End of MAIN

To create a simple test based on the sequence library, choose the subset of sequence kinds to be activated in that specific test and then set the weight of each kind using a keep soft select constraint.


Tuesday, September 23, 2008

Sequence Implementation

How to implement the various scenarios using the sequence struct?
  • The items and sub-sequences are created inside the pre-defined TCM called "body()" - used to define the behavior of the sequences - using the dedicated "do" action.
  • The body() TCM inside the MAIN sequence is launched automatically by the run() method inside the sequence driver.
  • The body() TCM of any sub-sequence is activated by the "do" action.
  • The body() TCM defines the duration of the sequences.
  • When "do"ing an item, you must emit the event driver.item_done to let the sequence complete the "do" action and inform the driver that the item was processed. Without emitting this event, the sequence cannot continue, and the driver cannot drive more items.
  • The do action can only be activated inside sequences.
How to create the sequence library?

Once you define the sequence struct, you can create various scenarios by creating sub-types of the sequence using the kind field.

1. Extend the sequence kind type with the new kind

extend usb_transaction_kind_t : [ SETUP];

2. Extend the new sequence sub-type of that kind with parameters and/or a body().

// SETUP Transaction
extend SETUP usb_sequence_s {
  body() @driver.clock is {
    // The "do" action waits until the driver's get_next_item() is called and
    // completes once driver.item_done is emitted. Hence the "do" action and
    // driver.get_next_item() work in tandem.
    do usb_transaction_s keeping {
      .pid == SETUP;
    }; // End of do
  }; // End of body()
}; // End of extend

// OUT Transaction
extend usb_transaction_kind_t : [OUT];
extend OUT usb_sequence_s {
  body() @driver.clock is {
    do usb_transaction_s keeping {
      .pid == OUT;
    }; // End of do
  }; // End of body()
}; // End of extend

// IN Transaction
extend usb_transaction_kind_t : [IN];
extend IN usb_sequence_s {
  body() @driver.clock is {
    do usb_transaction_s keeping {
      .pid == IN;
    }; // End of do
  }; // End of body()
}; // End of extend

In the next post, we will see how to write the test cases using sequences.

Monday, September 22, 2008

Sequences in Specman

What are sequences in Specman? What is the significance of sequences? This post explains how to implement sequences in Specman.

Introduction: Sequences let you define the streams of data items sent over the input protocol of the DUT. Before moving on to the details of sequences, let me give you some important definitions.

Item : A struct that represents the main inputs to the DUT - basically a USB packet or a CPU instruction.

Sequence : A struct that represents a stream of data items generated one after another according to some protocol; for example, in USB an OUT or SETUP packet is followed by a DATA packet.

Sequence driver : The SD acts as an agent between the sequences and the Bus Functional Model (BFM). Basically, an SD takes the generated data items and passes them on to the BFM, which in turn converts your high-level data items (packets or instructions) into low-level data (bits and bytes).

Basically, a TCM that resides in the BFM does the actual transmission of data items. The sequence driver and the BFM act as a pair: the sequence driver serves as the interface upward towards the sequences, and the BFM serves as the interface downward towards the DUT. Therefore the sequence driver only interacts with the BFM for the purpose of driving data items into the DUT.

How to use sequences?

1. Define the sequence item struct - For an item to be used with sequences, it must be declared like any_sequence_item so that it inherits the common sequence-item functionality.

type usb_pid_t : [SETUP = 0xD, OUT = 0x1, IN = 0x9];

// Create the sequence item
struct usb_transaction_s like any_sequence_item {
  dev_addr : uint (bits : 7);
  ep_num   : uint (bits : 4);
  pid      : usb_pid_t;
  data     : list of uint (bits : 8);
};

2. Define the sequence and its driver using the sequence statement

// Define the sequence
sequence usb_sequence_s using
  item           = usb_transaction_s,
  created_driver = usb_transaction_driver_u,  // Name of the generated sequence driver
  created_kind   = usb_transaction_kind_t;    // Enumerated type used for the sequence sub-types

3. Hook up the sequence driver to the environment

Have a BFM that knows how to drive the USB packet to the DUT.

unit usb_bfm {
  event usb_clk is rise ('usb_clk_30') @sim;

  drive_packet (packet : usb_transaction_s) @usb_clk is {
    ...
  }; // End of drive_packet TCM
}; // End of unit usb_bfm

Give the BFM a reference to the sequence driver, then instantiate both the BFM and the driver in the agent and bind them together:

extend usb_bfm {
  driver : usb_transaction_driver_u; // Reference to the sequence driver in the BFM
};

unit usb_agent {
  bfm : usb_bfm is instance;
};

extend usb_agent {
  driver : usb_transaction_driver_u is instance;
  keep bfm.driver == driver;
};

Drive the sequence driver's pre-defined clock event from the BFM clock:

extend usb_bfm {
  on usb_clk {
    emit driver.clock;
  };
};

Finally, have the BFM pull items from the driver, process them, and report completion through the driver's item_done event:

extend usb_bfm {
  execute_items() @usb_clk is {
    var next_item : usb_transaction_s;
    while (TRUE) {
      next_item = driver.get_next_item();
      drive_packet(next_item);
      emit driver.item_done;
    }; // End of while loop
  }; // End of execute_items TCM

  run() is also {
    start execute_items();
  }; // End of run() method
}; // End of usb_bfm

In the next post, we will see how to implement the sequence library and how to write the test cases using sequences.


Saturday, September 20, 2008

Increase the operational speed of the circuit

Numerous design techniques exist to increase the speed of a digital circuit. Many of these techniques can be applied automatically during synthesis. They usually involve a trade-off between area and speed, meaning that you pay with more area in order to achieve a higher speed.
  1. Use And-Or-Invert or Or-And-Invert gates wherever possible, since they are particularly economical in both area and speed.
  2. Feed the late-arriving signals in your design late into the combinatorial circuit to balance the total gate delay along each path of the combinatorial circuit. To know more about late-arriving signals, please go through it here.
  3. Use a maximum of 2 inputs on all combinatorial circuits in your design. For example, you can use two 2-input NAND gates and a 2-input NOR gate instead of a 4-input AND gate (see the sketch after this list).
  4. The bottom line is, if a Boolean function with more than 2 inputs is decomposed into several simpler gates, the result is more gates for the same function, but the total delay is reduced. According to the gate delay model, an N-input AND gate contains a branch with N transistors in series, resulting in an increased internal resistance of N*delta. Furthermore, the parasitic capacitance is also increased; therefore the internal delay of an N-input AND gate is N^2*delta. So if a NAND gate with 6 inputs is not decomposed, the internal delay will be roughly 0.7 ns as opposed to 0.42 ns with decomposition.
  5. Use Johnson counters instead of binary counters.
  6. An n-stage Johnson counter produces a set of outputs of length 2n, which can be decoded to give a count sequence. The advantage of using this counter is that, having no combinatorial logic between flip-flops, it can be run at the maximum speed permitted by setup and hold time constraints. The disadvantage of a Johnson counter is that, for a required count of m, it requires m/2 flip-flops, rather than log2(m) as required by a synchronous binary counter.
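
To make point 3 concrete, here is a small sketch (the module and signal names are mine, purely for illustration) of a 4-input AND function written both as one wide gate and as the 2-input NAND/NOR decomposition described above; the two forms are logically identical.

module and4_decomposed (
  input  a, b, c, d,
  output y_wide, y_tree
);
  assign y_wide = a & b & c & d;   // may map to a single, slower 4-input AND

  wire nab = ~(a & b);             // 2-input NAND
  wire ncd = ~(c & d);             // 2-input NAND
  assign y_tree = ~(nab | ncd);    // 2-input NOR: ~(~(a&b) | ~(c&d)) = a&b&c&d

endmodule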


Thursday, September 4, 2008

Verilog questions

1. Is this a valid, synthesizable, use of a for loop?

module for_loop ();

  reg [8:0] A, B;
  integer i;
  parameter N = 8;

  always @(B)
  begin
    for (i = 1; i <= N; i = i + 1)
      A[i-1] = B[i];
    A[N] = A[N-1];
  end

endmodule

2. Assuming the code above is synthesizable, which of the following continuous assignment statements would have the closest meaning?

A. assign A = B << 1;
B. assign A = B <<< 1;
C. assign A = B >> 1;
D. assign A = B >>> 1;

3. If the following logic is built exactly as described, which test vector sensitizes a stuck-at-0 fault at "e" and propagates it to the output "g"?

module fault_logic (a, b, c, d, e, f, g);
  input  a, b, c, d;
  output e, f, g;

  assign e = a & b;
  assign f = c ^ e;
  assign g = d | f;

endmodule

A. {a, b, c, d} = 4'b0010;
B. {a, b, c, d} = 4'b1100;
C. {a, b, c, d} = 4'b1111;
D. {a, b, c, d} = 4'b0101;
E. None of the above

4. Consider the following two test fixtures.

// Fixture A
parameter delay1 =
parameter delay2 =
initial
begin
  B = 1'b0;
  #20     A = 1'b1;
  #delay1 A = 1'b0;
  #delay2 B = 1'b1;
end

// Fixture B
initial
fork
  B = 1'b0;
  #20 A = 1'b1;
  #40 A = 1'b0;
  #60 B = 1'b1;
join

For these two fixtures to produce the same waveforms, delay1 and delay2 have to be set
as follows:

A. delay1 = 40; delay2 = 60;
B. delay1 = 30; delay2 = 20;
C. delay1 = 30; delay2 = 30;
D. delay1 = 20; delay2 = 20;
E. None of these are correct

5. In verification, most of the effort should be applied at the system (complete chip) level. Which of the following statements gives the best reason as to why?

A. Most of the bugs in a design are in the netlist wiring it together.
B. Most of the bugs in a design are due to poorly understood interactions between different modules.
C. This is the fastest way to verify the individual modules that make up the design.
D. Most of the bugs in a design occur because of poorly designed interfaces, e.g. buses.
E. None of the above are remotely a good reason.

6. Consider the following specify block:

specify
specparam A0spec = 1 : 2 : 3;
specparam A1spec = 2 : 3 : 4;
(a => b) = (A0spec, A1spec);
endspecify

This is defining the following:

A. Rising, falling and steady delay from input a to output b of 1, 2, and 3 ns respectively when a is 0, and 2, 3, and 4 ns when a is 1.
B. Minimum, typical and maximum delay from input a to output b of 1, 2, and 3 ns on a rising edge at B, and 2, 3 and 4 ns on a falling edge.
C. Non-blocking assignment of a to b with minimum, typical and maximum delay of 1, 2 and 3ns.
D. Setup time requirements for the flip-flop with output B.



Sunday, August 24, 2008

Power consumption

Here are some tips to predict power consumption early in the design cycle, i.e., at the RTL level.

Determine your design components' power consumption

Find which components' power consumption in your design is fixed by the specification and which components can be reduced through power-reduction techniques. For instance, I/O power may be fixed at the specification level, and memory power may also be largely fixed by the specification, but a memory can be powered down when not in use. If a large amount of power is consumed by the clock in your design, then you need clock-gating techniques.

Designers can do an RTL power analysis before the design is synthesized. This analysis can't be as accurate as a gate-level analysis, but it gives an overall idea of the potential power savings. For example, how much power could be saved if block "A" could be powered down 65 percent of the time, or if block "A" operated at 0.8 V rather than 1 V?
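
As a rough back-of-the-envelope illustration (using the standard approximation that dynamic power scales with the square of the supply voltage, and ignoring leakage): running block "A" at 0.8 V instead of 1 V scales its dynamic power by (0.8/1.0)^2 = 0.64, roughly a 36 percent saving, while powering the block down 65 percent of the time saves about 65 percent of its active power over that period.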

Accurate switching activity from your simulation

To ensure an accurate power estimate, you need to use the most accurate data you have available at any given point in the design flow and revise your estimate as new data becomes available. But getting accurate switching activity is a huge challenge.

If switching activity data is not available from simulation, designers should estimate the switching activity on the chip's primary inputs and apply that estimate within the power analysis tool. Most power analysis tools can propagate the switching activity data through both the combinatorial and sequential logic.

Thursday, July 24, 2008

Low power design

In today's chip industry, power consumption is the primary concern of hardware designers who implement wireless and mobile applications. Though EDA tools are emerging, many design decisions that influence power consumption are made at the system and architectural level, prior to writing RTL. The power consumption of a circuit can be classified as either dynamic or static. Dynamic power consumption is a function of switched capacitance and supply voltage. Static power consumption comprises the circuit-activity-independent leakage power. Leakage power depends on process technology parameters such as threshold voltage, supply voltage, circuit state, and temperature.

Methodology

Cells that do not perform a required function are turned off using sleep transistors. But instead of disabling just the clock signal, sleep transistors also disconnect cells from their power supply. Therefore, power gating reduces both dynamic and static power consumption. Power gating can be implemented in two different ways: fine or coarse grain.

FINE GRAIN power gating requires that each cell come with its own sleep transistor.


Advantage
  • Good timing control
Disadvantages
  • Increased area overhead
  • Less leakage control
  • Requires a standard-cell library with sleep transistors
COARSE-GRAIN power gating methodology is implemented using special sleep transistor cells. One sleep transistor cell is used to turn on and off a set of standard cells. The coarse-grained approach requires less area than fine-grain power gating due to the lower number of sleep transistors and less routing of enable signals for power gating. Fewer sleep transistors result in better leakage control.

Unlike fine-grain power gating, when the power is switched off in coarse-grain power gating, the power is disconnected from all the registers, resulting in loss of data. If the data is to be preserved when the power is disconnected, then we need to store it somewhere that is not power gated - this is done by a special register called a "retention" register. The key advantage of retention registers is that they are simple to use and are very quick to save and restore the state.

State Retention Power Gating

This technique allows the voltage supply to be reduced to zero for the majority of an SOC block's logic gates while maintaining the voltage supply for the state elements of that block. The state of the SOC is always saved in the sequential components. Using the SRPG technique, when in the inactive mode, power to the combinational logic is turned off while the sequential elements stay powered, thereby reducing the power consumption greatly when the application is in stop mode.

Wednesday, July 23, 2008

Gray Code Counter Implementation

A Gray code is an encoding of numbers such that adjacent numbers differ in a single digit. The term Gray code is often used to refer to a Binary Reflected Gray Code. We can implement a Gray code counter in different ways. Consider the following table carefully.

B : 000, 001, 010, 011, 100, 101, 110, 111
G: 000, 001, 011, 010, 110, 111, 101, 100

To convert a binary number d1,d2,..,d(n-1),dn to its corresponding Binary Reflected Gray Code, start at the right with the digit dn (the LSB). If d(n-1) is 1, replace dn by (1-dn); otherwise, leave it unchanged. Then proceed to d(n-1). Continue up to the first digit d1, which is kept the same. The resulting number g1,g2,..,g(n-1),gn is the Reflected Binary Gray Code.

The most common Gray code is one where the second half of the sequence is the mirror image of the first half, with only the MSB inverted. We illustrate the 3-bit binary Gray code as an example.

Binary to Gray code conversion can be achieved by

gray[2] = binary[2];
gray[1] = binary[2] ^ binary[1];
gray[0] = binary[1] ^ binary[0];

A simple verilog code to implement this function is given by

assign gray = (binary>> 1) ^ binary; // Right shift by 1 and EX-OR with binary.

module gray_cntr (
  clock_in,
  rst_n,
  enable_in,
  cnt_out
);

  // I/O Declarations
  input        clock_in, rst_n, enable_in;
  output [2:0] cnt_out;
  wire   [2:0] cnt_out;

  reg [2:0] cnt;

  // Free-running binary counter
  always @(posedge clock_in or negedge rst_n)
    if (!rst_n)
      cnt <= 3'b000;
    else if (enable_in)
      cnt <= cnt + 1'b1;

  // Binary-to-Gray conversion of the counter value
  assign cnt_out = { cnt[2], (^cnt[2:1]), (^cnt[1:0]) };

endmodule

Saturday, July 19, 2008

Mixed Signal Modeling

Designers today find themselves adding more and more analog and mixed-signal content to their creations. In the past, designers used different verification methodologies to verify designs that contain analog circuits. At the very highest levels of abstraction, system designers used Matlab to model systems that would be implemented with analog circuits. Designers have now started using Verilog-AMS (Analog Mixed Signal), which allows the designer to model analog circuits at different levels of abstraction. The AMS extensions to Verilog are a good idea, particularly for SOC design, but so far they have received limited use, because they are relatively new and require learning new syntax and semantics and the acquisition of new simulation tools.

Here are some simple examples that show how to write behavioral models for analog circuits.

RESISTORS

One of the simplest models that can be described by Verilog-A is a resistor. In general, a resistor is a relationship between voltage and current, as in f(V, I) = 0 where V represents the voltage across the resistor, I represents the current through the resistor, and f is an arbitrary function of two arguments.

The equation for a simple linear resistor is V = IR where R is the resistance.

`include "disciplines.vams"

This include defines the names electrical, V, and I, which are used in the model. It also defines other disciplines and natures.
module res(p,n);
inout p,n; // Positive and Negative terminals
electrical p,n;
The p and n ports are defined to be electrical, meaning the signals associated with the ports are expected to be voltage and current.

parameter real r=0 from [0:inf]; // R value is from 0 to infinity

analog
V(p,n) <+ r*I(p,n);
The analog keyword introduces an analog process. An analog process is used to describe continuous time behavior. Syntactically, it is the analog keyword followed by a statement that describes the relationship between signals. This relationship must be true at all times.
endmodule

TRIANGLE WAVE FORM GENERATION

module V_triangle_generator (out);
  output out;
  voltage out;

  parameter real period = 10n from [0:inf],
                 ampl   = 1;

  integer slope;
  real offset;

  analog begin
    @(timer(0, period)) begin
      slope  = +1;
      offset = $realtime;
      discontinuity(1);
    end

    @(timer(period/2, period)) begin
      slope  = -1;
      offset = $realtime;
      discontinuity(1);
    end

    V(out) <+ ampl * slope * (4*($realtime - offset)/period - 1);
  end
endmodule

Note that you can't compile this code with the ModelSim simulator. Synopsys' Discovery AMS, a mixed-signal simulator, allows designers to create entire designs with Accellera's Verilog-AMS language, launch all simulations from a single integrated control environment, and efficiently use parasitic data for post-layout analysis.

Verilog-AMS, a language standard approved by the Accellera EDA standards body, describes the behavior of analog and mixed-signal designs. The language is made up of three key parts: Verilog-D for digital designs, Verilog-A for analog, and mixed-signal extensions to specify domain-shifting algorithms.

You can download the Verilog-AMS language reference manual from here.



Monday, July 14, 2008

Serial to Parallel Data Conversion

A serial-to-parallel data conversion requires an n-bit shift register. Therefore, a serial-in/parallel-out shift register converts data from serial format to parallel format. If four data bits are shifted in by four clock pulses via a single wire at serial-in, the data becomes available simultaneously on the four outputs parallel_out[3] to parallel_out[0] after the fourth clock pulse.

A serial to parallel data conversion circuit is used for converting a serial word supplied by some domain "X" to a parallel word so as to allow for the processing of the parallel word by a processor. The "X" domain supplies to the interface circuit a 'ready' pulse signal. The interface circuit, in response to the 'ready' pulse signal, supplies an 'ack' pulse and a 'clock' signal to the "X", so as to allow the serial word from the "X" to be transferred to the interface circuit, which then converts the serial word to a parallel word. An enable pulse signal supplied to the interface circuit effects the transfer of the parallel word from the interface circuit to the processor.

module serial_2_parallel (
  clk_in,
  rst_n,
  ready_in,
  shift_enable,
  serial_in,
  ack_out,
  parallel_out
);

  // I/O declarations
  input clk_in;
  input rst_n;
  input ready_in;
  input shift_enable;
  input serial_in;

  output [3:0] parallel_out;
  reg    [3:0] parallel_out;
  output       ack_out;
  reg          ack_out;

  wire [3:0] parallel_wire;

  // A 4-bit shift register to convert serial to parallel
  always @(posedge clk_in or negedge rst_n)
  begin
    if (rst_n == 1'b0)
    begin
      parallel_out <= 4'b0;
      ack_out      <= 1'b0; // ack_out is initially 0
    end
    // shift_enable is driven from the testbench as 1 when ack_out is 1
    else if (shift_enable == 1'b1 && ready_in == 1'b1)
      parallel_out <= {serial_in, parallel_wire[3:1]};
    else
    begin
      parallel_out <= parallel_wire;
      ack_out      <= 1'b1;
    end
  end

  // Feed the registered value back as a 4-bit wire
  assign parallel_wire = parallel_out;

endmodule

Coverage driven Random Verification

Coverage-driven random verification methods are becoming recognized as one of the best ways to verify complex IC designs. Cadence Design Systems, announced that new technologies have been integrated into the Cadence® Incisive® Enterprise verification family that enable engineering teams to address increasingly complex chip design. Incisive technologies now offer support for the newly developed Open Verification Methodology (OVM), a new aspect-oriented generation engine, and the second generation of Cadence transaction-based acceleration (TBA) with native support of multiple test-bench languages and numerous productivity enhancements.

To understand how to take advantage of this solution, cdn (cadence designer network) talked to Mr. Apurva Kalia, VP of R&D.

You can read his interview here.

Monday, June 16, 2008

RTL Design techniques - Coding style

My focus has always been on what is good for synthesis, with little regard to the effect on simulation speed.

Create a block-level diagram before you begin your coding
Draw simple block diagrams of the functions of your design. These will also be helpful for documentation. Use these block diagrams while you code your design.

Always think of a fresher who will read your RTL
Start with the inputs to your design - on the left side of the block diagram - and describe the design's functionality from inputs to outputs. Don't try to be an ultra-efficient RTL coder. Please don't forget to put in comments. Have a comment "header" for each module, comment the functionality of each I/O, and use comments throughout the design to explain the "tricky" parts.

Hierarchy
At the top level of your chip there should be 4 or 5 blocks: I/O pads, clock generator, reset circuit, and the core design. They are kept in separate blocks because they might not all be synthesizable. Isolating them simplifies synthesis. Typically, the core design is hierarchical and organized by function.

Use separate always@ blocks for sequential logic and combinatorial logic
  1. It helps organize your RTL description
  2. There is a sequential optimization process in DC, which uses your coding style description of the sequential element to map it to the best sequential element in your technology library. When you combine sequential and combinatorial logic descriptions together, the tool can get confused and might not recognize the type of sequential element you are describing.
Use blocking for combinational and non-blocking for sequential
There is one good paper by Stuart Sutherland about the blocking and non-blocking assignments. This paper can be downloaded from here.
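
Below is a minimal sketch that combines both guidelines - the next-state logic in its own combinational always block with blocking assignments, and the flip-flops in a separate sequential always block with non-blocking assignments. The counter itself and all signal names are made up purely for illustration.

module count_en (
  input            clk,
  input            rst_n,
  input            load,
  input      [3:0] load_val,
  output reg [3:0] count
);
  reg [3:0] next_count;

  // Combinational block: blocking assignments, next-state logic only
  always @(*) begin
    if (load)
      next_count = load_val;
    else
      next_count = count + 1'b1;
  end

  // Sequential block: non-blocking assignments, flip-flops only
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      count <= 4'b0000;
    else
      count <= next_count;
  end
endmodule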

Know whether you have prioritized or parallel conditions
If the conditions are mutually exclusive, then a case statement is better, because it is easier to read and it organizes the parallel states of the description. If multiple conditions can occur at the same time, use the "if" statement and prioritize the conditions using "else if" for each subsequent condition.
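
A hedged sketch of the prioritized form (the request/grant names are invented for the example): "irq" wins over "dma_req", which wins over "cpu_req", so the else-if chain encodes the priority explicitly.

module req_mux (
  input            irq, dma_req, cpu_req,
  input      [7:0] irq_data, dma_data, cpu_data,
  output reg [7:0] bus_data
);
  always @(*) begin
    if (irq)
      bus_data = irq_data;    // highest priority
    else if (dma_req)
      bus_data = dma_data;
    else if (cpu_req)
      bus_data = cpu_data;
    else
      bus_data = 8'h00;       // default when no request is active
  end
endmodule

If the three requests were guaranteed to be mutually exclusive, the same selection would read better as a case statement on the request vector.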

Completely specify all branches of all conditional statements
If you completely specify all possible combinations of ones and zeros for the different cases and you use the same select operator for all cases, DC will automatically recognize that the case statement is fully specified and parallel.

Initialize output of conditional statements prior to defining the statements
Be careful selecting what value you initialize the output to. If there isn't a default state for that part of the design, then try to pick the "most popular" state to initialize the output to - that should help reduce extra switching (power) during operation.
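
A minimal sketch of the last two guidelines together (the decoder and its names are illustrative): the output is initialized first, and every value of the 2-bit select is listed, so the case statement is fully specified and no latch is inferred.

module one_hot_dec (
  input      [1:0] sel,
  output reg [3:0] grant
);
  always @(*) begin
    grant = 4'b0000;           // initialize to the idle / "most popular" value
    case (sel)
      2'b00: grant = 4'b0001;
      2'b01: grant = 4'b0010;
      2'b10: grant = 4'b0100;
      2'b11: grant = 4'b1000;
    endcase
  end
endmodule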

Use high level constructs (case, if, always@) as much as possible
Synthesis works best with high level RTL constructs. Low level gates or Boolean level constructs (verilog primitives) constrain DC.

Using good coding style and writing "safe" RTL code is not enough! Understand what you are implying and figure out in advance where the potential problems are. You should be able to manually synthesize in your head what you have described in your RTL description.

Sunday, June 15, 2008

RTL Design techniques - Pre-RTL Checklist.

Your success in IC design depends directly on your RTL code. There is a lot more that goes into a good RTL description than just writing with good coding style. Design for Test and Design for Synthesis are just a few examples of design goals that can be affected at the RTL. This post is all about RTL design issues. Code it correctly from the beginning and you won't need so many big fancy tools to solve your timing closure problems at the back end of the design cycle.

There are many design issues - which impact the speed and area of the design - that need to be resolved before you begin coding your design.

Communicate design issues with your team - Things to be worked out as a team
  • Naming convention for hierarchical blocks,
  • Naming convention for signals,
  • Active low or active high states for the signal
Does the specification define how the design should be partitioned?
Partitioning helps to break down your big design into smaller blocks and assign each small unit to different members of the team. Follow the specification's recommendation for partitioning.

What are the I/O requirements?
At the major functional block level, define the interface protocol as soon as possible. What bus interface protocol will be used? PCI, AHB or OCP. Get the specification for each bus and interface to the design before you begin coding. Make sure the function and timing of each one is clear. This will also enable you to create high level models of your design before you start coding the RTL.

What about the clocks in the design?
How many clocks will be required for the design? Where are the clocks for the chip coming from? Will they be internally generated? PLL? Divide by circuits? Externally supplied clocks? You have to isolate your clock generation circuitry from the rest of the chip design. Especially if it is analog based.

What other IPs are you using?
Does the design require any extra IP (Intellectual Property) to be integrated into it? RAMs? Cores? Buses? FIFOs? Then start with the interface to each IP block and define it.

Is it your expectation that you are pin-limited or gate limited?
Being pin-limited means that you don't have enough I/O pads in your ASIC package to do what you really want to do. You might be able to double up on the functions of each pin, which would require multiplexing signals and would prevent any ideas of a unidirectional bus interface at the I/O pad level. But if you need all the signals to be active simultaneously, you won't be able to do that either. You'll have to split the design up. You should know this before you begin your RTL.

Being gate-limited means that the design has too much functionality for the die size chosen. You might have to cut out functionality to fit on the die. Or you can try to optimize your design for area, which means speed objectives might be tough to meet. It is hard to estimate whether you will be gate limited at the beginning of a project unless you have been through this design before.

Is it your expectation that you will be pushing the speed envelope of the technology?
  • How much functionality are you putting into your design?
  • At what speed will it be running?
  • What technology are you going to use to implement it?
  • Has it ever been done before?
  • What changes to the design are you willing to make to achieve the speed goal for your design? Pipelining or Register re-timing.
In the next post, I'm going to post the rules that tend to cause the most common errors.

Wednesday, June 11, 2008

Fact about Johnson counter

The Johnson counter is made of a simple shift register with inverted feedback. That is, if the complement output of a ring counter is fed back to the input instead of the true output, a Johnson counter results. The Figure shows a 4-bit Johnson counter with 2*4 = 8 states.

A Johnson counter has 2n states, where "n" is the number of flip-flops, whereas a normal binary counter has 2^n states for the same number of flops. The interesting thing about the Johnson counter is its unused states. The formula for the number of unused states is 2^n - 2n. In this case, the number of unused states is 2^4 - (2*4) = 8.
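
A minimal Verilog sketch of the 4-bit Johnson counter described above (port names are mine): a shift register whose inverted MSB is fed back into the LSB, giving the 2n = 8 valid states.

module johnson_cntr (
  input            clk,
  input            rst_n,
  output reg [3:0] q
);
  always @(posedge clk or negedge rst_n)
    if (!rst_n)
      q <= 4'b0000;
    else
      q <= {q[2:0], ~q[3]};   // shift left, feed back the complement of the MSB
endmodule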

Tuesday, June 10, 2008

What is Pipelining?

Let me assume that I am going to build a car and I have all of the parts lying around at hand. Let us further assume that the main steps in the process are as follows:
  • Attach the wheel to the car,
  • Attach the engine,
  • Attach the seats,
  • Attach the body and
  • Paint everything.
Now let us assume that I require a specialist to perform each of these tasks. My five friends are playing cricket. My first friend comes, attaches a wheel, and goes back to play cricket. Assume that he takes 10 minutes. On his return, my second friend comes, attaches the engine, and goes back once his job is done, and so on. Once the first car has been completed, they start all over again. Obviously, this is a very inefficient scenario, as the whole process takes 50 minutes. Furthermore, for each of those 10 minutes, only one man is working. It would be much more efficient to have 5 cars in the same place. In this case, as soon as my first friend has finished his work on the first car, he can go to the second car to attach a wheel while the second friend attaches the engine on the first car. In this scenario, everyone will be working all the time.

PIPELINING

Let us assume that we have a design that can be implemented as a series of blocks of combinational logic, say a chain of 3 combinational blocks, and that each block takes "t" nanoseconds to perform its task. In this case, it will take "3t" nanoseconds for a word of data to propagate through the function, starting with its arrival at the inputs of the first block and ending with its departure at the outputs of the 3rd block.

We wouldn't want to present new data to the inputs until we have stored the output results associated with the first word of data. The answer is to use a pipelined design technique in which the blocks of combinational logic are sandwiched between banks of registers.

All of the register banks are driven by a common clock. On each active clock edge, the registers feeding a block of logic are loaded with the results from the previous stage. These values then propagate through that block of logic until they arrive at its outputs, at which point they are ready to be loaded into the next set of registers on the next clock. Once the pipeline is fully loaded, a new word of data can be processed every "t" nanoseconds.
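
Here is a minimal sketch of the idea in Verilog (the three combinational functions are arbitrary stand-ins, chosen only to give each stage something to do): register banks between the blocks let a new word enter on every clock once the pipeline is full.

module pipe3 (
  input            clk,
  input      [7:0] din,
  output reg [7:0] dout
);
  reg [7:0] s1, s2;              // pipeline registers between the blocks

  wire [7:0] f1 = din + 8'd1;    // combinational block 1, takes about "t" ns
  wire [7:0] f2 = s1 ^ 8'h55;    // combinational block 2
  wire [7:0] f3 = s2 << 1;       // combinational block 3

  always @(posedge clk) begin
    s1   <= f1;                  // stage-1 register bank
    s2   <= f2;                  // stage-2 register bank
    dout <= f3;                  // stage-3 register bank
  end
endmodule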

Monday, June 9, 2008

Re-Timing

Re-timing is based on the concept of balancing out the positive and negative slacks throughout the design. In this context, positive slack is the amount of time by which a timing condition is met, and negative slack is the amount of time by which it is missed.

For example, let us assume a pipe-lined design, whose frequency is such that the maximum register to register delay is 15ps. Now, let us assume that we have a situation as shown in Figure.


The longest timing path in the first block of combinational logic is 10ps - a positive slack of 5ps.
The longest timing path in the second block of combinational logic is 20ps - a negative slack of 5ps. Once the initial path timing is calculated, combinational logic is moved across the register boundaries so that paths with positive slack donate time to paths with negative slack.

Saturday, June 7, 2008

Ten Tips to Improve an IC Design Teams Project Execution

Many IC design projects frequently struggle with unexpected delays to their production plan. Intense time-to-market pressure demands an environment that prescribes improvement to both the predictability and the length of design timelines. In consideration of this continuous-improvement aspiration, Jorvig Consulting has compiled ten vital tips for teams to consider as they pursue their quest to improve IC design execution.

This press release can be viewed from here.

Saturday, May 31, 2008

VHDL vs Verilog

What is the reason that Verilog is usually considered better at low level modeling than VHDL? Why is VHDL usually considered better than Verilog for high level modeling?

Verilog has built-in types for gates and transistors, and can also handle true bidirectional signals (VHDL has none of these things).

VHDL allows users to define their own data types which allows users to extend the language. Also, support for libraries and packages lends itself to more complex models.

Wednesday, May 28, 2008

Parity Detector

N EX-NOR gates are connected in series such that the inputs (A0, A1, A2, ...) are given in the following way:

A0 and A1 are given to the first EX-NOR gate, then A2 and the output of the first EX-NOR are given to the second EX-NOR gate, and so on. The Nth EX-NOR gate's output is the final output. How does this circuit work? Explain in detail.

Solution:

If N is odd (the number of EX-NOR gates is odd), the circuit acts as an even parity detector, i.e. the output will be 1 if there is an even number of 1's in the inputs. It could also be called an odd parity generator, since with this additional 1 as the output the total number of 1's becomes odd.

If N is even (the number of EX-NOR gates is even), the circuit acts as an odd parity detector, i.e. the output will be 1 if there is an odd number of 1's in the inputs. It could also be called an even parity generator, since with this additional 1 as the output the total number of 1's becomes even.
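
A small sketch of the chain for N = 3 gates (four inputs a[0]..a[3] standing in for A0..A3): with an odd number of EX-NOR gates the output goes to 1 exactly when the inputs contain an even number of 1's.

module xnor_chain (
  input  [3:0] a,
  output       y
);
  wire s1 = ~(a[0] ^ a[1]);   // first EX-NOR
  wire s2 = ~(a[2] ^ s1);     // second EX-NOR
  assign y = ~(a[3] ^ s2);    // third (Nth) EX-NOR drives the final output
endmodule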

Monday, May 26, 2008

Clock Dividers

Dividing a clock by an even number always generates 50% duty cycle output. Sometimes it is necessary to generate a 50% duty cycle frequency even when the input clock is divided by an odd or non-integer number. In this post I am going to talk about how to divide a clock by an odd number.

The easiest way to create an odd divider with a 50% duty cycle is to generate two clocks at half the desired output frequency with a quadrature-phase relationship (constant 90° phase difference between the two clocks). You can then generate the output frequency by exclusive-ORing the two waveforms together. Because of the constant 90° phase offset, only one transition occurs at a time on the input of the exclusive-OR gate, effectively eliminating any glitches on the output waveform.

Let's see how it works by taking an example where REF_CLK is divided by 3.
  • Create a counter that increments on every rising edge of the input clock (REF_CLK) and resets to zero when it reaches (N-1), where N is an odd number (3, 5, 7 and so on).
  • Take two toggle flip-flops and generate their enables as follows: T-FF1 is enabled when the counter reaches 0, and T-FF2 is enabled when the counter reaches (N/2)+1.
  • The output of T-FF1 is triggered on the rising edge of REF_CLK and the output of T-FF2 is triggered on the falling edge of REF_CLK.
  • The divide-by-N clock is derived by simply XORing the outputs of the two T-FFs.

The above Figure shows the timing diagram for the above steps.
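
A hedged Verilog sketch of these steps for N = 3 (module and signal names are mine): a mod-3 counter, one toggle flop clocked on the rising edge, one on the falling edge, and an XOR of the two toggle outputs producing the divide-by-3, 50% duty-cycle clock.

module div3_50 (
  input  ref_clk,
  input  rst_n,
  output clk_out
);
  localparam N = 3;
  reg [1:0] cnt;
  reg t1, t2;

  // Mod-N counter incremented on every rising edge of REF_CLK
  always @(posedge ref_clk or negedge rst_n)
    if (!rst_n)
      cnt <= 2'd0;
    else if (cnt == N-1)
      cnt <= 2'd0;
    else
      cnt <= cnt + 2'd1;

  // T-FF1: toggles on the rising edge when the counter is 0
  always @(posedge ref_clk or negedge rst_n)
    if (!rst_n)
      t1 <= 1'b0;
    else if (cnt == 0)
      t1 <= ~t1;

  // T-FF2: toggles on the falling edge when the counter reaches (N/2)+1
  always @(negedge ref_clk or negedge rst_n)
    if (!rst_n)
      t2 <= 1'b0;
    else if (cnt == (N/2)+1)
      t2 <= ~t2;

  assign clk_out = t1 ^ t2;   // XOR of the two quadrature waveforms
endmodule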


Tuesday, May 20, 2008

Asynchronous and Synchronous Reset

ASYNCHRONOUS RESET

A fully asynchronous reset is one that both asserts and de-asserts a flip-flop asynchronously. Here, asynchronous reset refers to the situation where the reset net is tied to the asynchronous reset pin of the flip-flop. Additionally, the reset assertion and de-assertion is performed without any knowledge of the clock. This type of reset is very common but is very dangerous if the module boundary represents the FPGA boundary.

The biggest problem with the asynchronous reset circuit described above is that it will work most of the time. However, if the edge of the reset de-assertion is too close to the clock edge and violates the reset recovery time, the output of the flip-flop can go metastable. The reset recovery time is a type of setup timing condition on a flip-flop that defines the minimum amount of time between the de-assertion of reset and the next rising clock edge, as shown in the Figure.


It is important to note that reset recovery time violations only occur on the de-assertion of reset and not the assertion. Therefore, fully asynchronous resets are not recommended.

SYNCHRONOUS RESET

The most obvious solution to the problem introduced in the preceding section is to fully synchronize the reset signal as you would any asynchronous signal.

The advantage of this topology is that the reset presented to all functional flip-flops is fully synchronous to the clock and will always meet the reset recovery time. The interesting thing about this reset topology is actually not the de-assertion of reset for recovery time, but rather the assertion. In the previous section it was noted that the assertion of reset is not of interest, but that is true only for asynchronous resets and not necessarily for synchronous resets. Consider the scenario illustrated in the Figure: if the clock is running sufficiently slowly, the reset is not captured, because no rising clock edge occurs while the reset signal is asserted. The result is that the flip-flops within this domain are never reset.

Fully synchronous resets may fail to capture the reset signal itself (failure of assertion) depending on the nature of the clock.

For this reason, fully synchronous resets are not recommended unless the capture of the reset signal (reset assertion) can be guaranteed by design.

Asynchronous Assertion, Synchronous De-assertion

A third approach that captures the best of both techniques is a method that asserts all resets asynchronously but de-asserts them synchronously.
In the Figure, the registers in the reset circuit are asynchronously reset via the external signal, and all functional registers are reset at the same time. This occurs asynchronously with the clock, which does not need to be running at the time of the reset. When the external reset de-asserts, the clock local to that domain must toggle twice before the functional registers are taken out of reset. Note that the functional registers are taken out of reset only when the clock begins to toggle, and this is done synchronously.

A reset circuit that asserts asynchronously and de-asserts synchronously generally provides a more reliable reset than fully synchronous or fully asynchronous resets.


The code for this synchronizer is shown below.

module reset_sync (
  output reg rst_sync,   // synchronized (active-low) reset for the functional flops
  input      clk,
  input      rst_n       // external asynchronous reset, active low
);

  reg R1;

  // Assert asynchronously, de-assert synchronously: both flops are cleared
  // immediately when rst_n asserts, then 1's are shifted in, so rst_sync
  // releases only after two rising clock edges.
  always @(posedge clk or negedge rst_n)
    if (!rst_n)
    begin
      R1       <= 1'b0;
      rst_sync <= 1'b0;
    end
    else
    begin
      R1       <= 1'b1;
      rst_sync <= R1;
    end

endmodule

Saturday, May 17, 2008

Developing Test Plan

What is the difference between Test Specification and a Test Plan?

Test Specification – A detailed summary of what scenarios will be tested, how they will be tested, how often they will be tested, and so on. An example for a given feature: if the USB 2.0 hub receives a token packet followed by a data packet with a payload of 64 bytes for the bulk endpoint, it has to send the acknowledge back to the host controller within the time specified by the USB 2.0 specification.

Test Plan - A collection of all test specifications for a given area. The Test Plan contains a high-level overview of what is tested and what is tested by others for the given feature area.

If you ask a tester on another team what is the difference between the two, you might receive different answers. In addition, I use the terms interchangeably all the time at work, so if you see me using the term test plan, think of test specification.

Parts of a Test Specification


A Test Specification should consist of the following parts:


  • History / Revision - Who created the test spec? Who were the developers and Program Managers (Usability Engineers, Documentation Writers, etc) at the time when the test specification was created? When was it created? When was the last time it was updated? What were the major changes at the time of the last update?
  • Feature Description – A brief description of what area is being tested.
  • What is tested? – A quick overview of what scenarios are tested, so people looking through this specification know that they are at the correct place.
  • What is not tested? - Are there any areas being covered by different people or different test specs? If so, include a pointer to these test specifications.
  • "Nightly" Test Cases – A list of the test cases and high-level description of what is tested each night or whenever a new release becomes available.
  • Breakout of Major Test Areas - This section is the most interesting part of the test specification where test plan writers arrange test cases according to what they are testing.

Setting Test Case Priority


A Test Specification may have a couple of hundred test cases, depending on how the test cases were defined, how large the feature area is, and so forth. It is important to be able to query for the most important test cases (nightly), the next most important test cases (weekly), the next most important test cases (full test pass) and so forth. A sample prioritization for test cases may look like:


  • Highest priority (Nightly) – Must run whenever a new release is available
  • Second highest priority (Weekly) – Other major functionality tests run once every three or four releases
  • Lower priority – Run once every major coding milestone