**( Placement Example with Known Optimal wirelength )**

Director : Prof. Jason Cong

Authors : Chin-Chih Chang and Min Xie

Copyright© 2002-2003* the Regents of University of California*

*Motivation*

Placement is an important step in the overall IC design process in DSM technologies, as it defines the on-chip interconnects, which have become the bottleneck in determining circuit performance. The development in placement algorithms is slow but steady. Various algorithms have been proposed in the past 30 years,including min-cut methods, iterative methods, and analytical methods. However, in terms of wirelength reduction, the rate of improvement is only 5-10% every 2-3 years since the 1980s. This prompts us to think whether there remains any room for improvement for circuit placement (at least in terms of wirelength minimization).

According to ITRS¨01, the maximum transistor number per chip will be over 1.6 billion, with a frequency of 28.7 GHz by the year 2016. Such design complexity will pose significant challenge to existing placement algorithms. There is a need to quantify the optimality and scalability of state-of-the-art placement engines.

However, most existing works only compared with known heuristics; they did not tell how close the solutions were to the optimal. The study in [1] attempted to quantify the suboptimality of placement algorithms in terms of chip area by ＾stitching￣ small designs to form large designs. Although the circuits in [1] did not have known optimal solutions, upper bounds of the optimal solutions were derived and compared with the solutions given by several VLSI heuristics.

*Circuit Description*

The PEKO suite is developed at UCLA VLSI CAD LAB. The placement examples are constructed such that their optimal wirelengths are known. The module numbers and netlist profiles are extracted from ISPD98 [2]. To mimic real designs and to make the legalization process easier, we inserted 15% white space into the examples by expanding one dimension of the chip. A detailed description about the algorithm to generate these examples is given in [5]. We generated four suites of PEKO examples, named from Suite1 to Suite4. The examples are given in both GSRC BookShelf and LEF/DEF format and can be downloaded here.

When generating Suite1, we remove all the connections to pads with the concern that such connections may give hint about where to place each module. Suite2 are generated by scaling the profiles by a factor of 10. Table 1 gives the description of Suite1.

Table 1. Characteristics of Suite1

circuit |
#cell |
#net |
#row |
row utilization |
optimal wirelength |

Peko01 |
12506 |
13865 |
113 |
85% |
8.14E+05 |

Peko02 |
19342 |
19325 |
140 |
85% |
1.26E+06 |

Peko03 |
22853 |
27118 |
152 |
85% |
1.50E+06 |

Peko04 |
27220 |
31683 |
166 |
85% |
1.75E+06 |

Peko05 |
28146 |
27777 |
169 |
85% |
1.91E+06 |

Peko06 |
32332 |
34660 |
181 |
85% |
2.06E+06 |

Peko07 |
45639 |
47830 |
215 |
85% |
2.88E+06 |

Peko08 |
51023 |
50227 |
227 |
85% |
3.14E+06 |

Peko09 |
53110 |
60617 |
231 |
85% |
3.64E+06 |

Peko10 |
68685 |
74452 |
263 |
85% |
4.73E+06 |

Peko11 |
70152 |
81048 |
266 |
85% |
4.71E+06 |

Peko12 |
70439 |
76603 |
266 |
85% |
5.00E+06 |

Peko13 |
83709 |
99176 |
290 |
85% |
5.87E+06 |

Peko14 |
147088 |
152255 |
385 |
85% |
9.01E+06 |

Peko15 |
161187 |
186225 |
402 |
85% |
1.15E+07 |

Peko16 |
182980 |
189544 |
429 |
85% |
1.25E+07 |

Peko17 |
184752 |
188838 |
431 |
85% |
1.34E+07 |

Peko18 |
210341 |
201648 |
460 |
85% |
1.32E+07 |

Using similar methods, we generated two suites that allow connections to pads and name them Suite3 and Suite4 respectively. Suite4 is generated by scaling the profiles by a factor of 10. Fixed modules are used in these examples to emulated pads. Table 2 gives the description of Suite3.

Table 2. Characteristics of Suite3

circuit |
#cell |
#net |
#row |
Row utilization |
optimal wirelength |

Peko01 |
12506 |
14111 |
113 |
85% |
8.22E+05 |

Peko02 |
19342 |
19584 |
140 |
85% |
1.27E+06 |

Peko03 |
22853 |
27401 |
152 |
85% |
1.51E+06 |

Peko04 |
27220 |
31970 |
166 |
85% |
1.76E+06 |

Peko05 |
28146 |
28446 |
169 |
85% |
1.95E+06 |

Peko06 |
32332 |
34826 |
181 |
85% |
2.07E+06 |

Peko07 |
45639 |
48117 |
215 |
85% |
2.89E+06 |

Peko08 |
51023 |
50513 |
227 |
85% |
3.15E+06 |

Peko09 |
53110 |
60902 |
231 |
85% |
3.65E+06 |

Peko10 |
68685 |
75196 |
263 |
85% |
4.75E+06 |

Peko11 |
70152 |
81454 |
266 |
85% |
4.72E+06 |

Peko12 |
70439 |
77240 |
266 |
85% |
5.02E+06 |

Peko13 |
83709 |
99666 |
290 |
85% |
5.89E+06 |

Peko14 |
147088 |
152772 |
385 |
85% |
9.03E+06 |

Peko15 |
161187 |
186608 |
402 |
85% |
1.16E+07 |

Peko16 |
182980 |
190048 |
429 |
85% |
1.25E+07 |

Peko17 |
184750 |
189581 |
431 |
85% |
1.35E+07 |

Peko18 |
210341 |
201920 |
460 |
85% |
1.32E+07 |

*Experimental Result*

We experimented with four state-of-the-art placers, including:

， Dragon. Dragon is based on a multilevel framework. It uses hMetis to derive an initial partition result on the circuits, then undergoes a series of refinement stages doing bin-based swapping with simulated annealing [3]. We used Dragon v.2.20.

， Capo. Capo is built on a multilevel partitioner. It aims to enhance the routability with such techniques as tolerance computation and block splitting heuristics [4]. We used Capo v.8.0.

， mPL. mPL is also based on a multilevel framework.It uses non-linear programming to handle the non-overlapping constraints on the coarsest level, and uses Goto based relaxation in subsequent refinement stages. We used mPL v.1.2. Compared with mPL v.1.0, mPL v.1.2 uses an additional V cycle and adopts distance based clustering in the second V cycle [5].

， QPlace. QPlace is the placement engine used in Silicon Ensemble v.5.3 of Cadence Inc. The version we used is QPlace v.5.1.55.

We set an upper limit of 24 hours to the runtime of a placer. Processes running over 24 hours will be terminated. The experiments are performed on the following two machines:

， Sun Blade workstation 750 MHz running SunOs 5.8 with 4GB of memory. Experiments with Dragon and mPL are performed on this machine.

， Pentium IV 1.4 GHz running Windows 2000 with 2GB of memory. Experiment with Capo is performed on this machine.

We were given the latest version of Capo v.8.5 for Linux and experimented on a Pentium IV 2.2GHz running Redhat v.8.0. The results are included in the two spreadsheets below.

Fig. 1. and Fig. 2. gives the experimental results. Here are the spreadsheets of the experimental data with Suite1 and Suite2.

Figure
1. Solution Quality vs Cell Number

Figure
2. Runtime vs Cell Number

The experimental results
indicate that the wirelengths produced by these tools are 1.66 to 2.53 times
the optimal in the worst cases, and are 1.46 to 2.38 times the
optimal on the average. As for scalability, the average solution
quality of each tool shows deterioration by
an additional 4% to 25% when the problem size increases by a factor of 10.

**Our experiment is not intended
to compare the four placers. Rather, it is for a gap analysis to see how much
room remains for improvement in placement algorithms. Please do not interpret
the results here as a ranking of the placers. **

**PEKO has obvious limitations:
all the nets in the optimal solutions are local nets, i.e., they only connect
cells in contiguous areas, which may not be true on real circuits. Therefore,
achieving good results on PEKO does not guarantee good performance on real
circuits. Also, it is not encouraged to tune the placer in order to beat these
examples.**

*Reference*

2. C. J.
Alpert, "The ISPD98 Circuit Benchmark Suite," *Proc. International
Symposium on Physical Design*, pp. 85-90, 1998.

3. M.
Wang, X. Yang and M. Sarrafzadeh, "Dragon2000: Standard-cell Placement
Tool for Large Industry Circuits," *Proc.* *International
Conference on Computer-Aided Design*, pp. 260-264, 2000.

6.
C. C. Chang, J. Cong, M. Xie, "Optimality and Scalability
Study of Existing Placement Algorithms," To appear in *Proc.* *Asia
South-Pacific Design Automation Conference,* 2003.

Please direct your questions to xie@cs.ucla.edu. Thanks.