(Timingdriven Placement Example with Known
Optimal delay)
Director : Prof.
Jason Cong
Authors : Michail
Romesis and Min Xie
Copyright© 20022003 the Regents
of University of California
Motivation
The placement problem has been studied
extensively in the past 30 years. However, a recent study of wirelengthdriven
placement [1] shows that existing placement solutions are surprisingly far from
optimal. Using a set of constructed placement examples that match many
industrial circuit characteristics with known optimal wirelength (PEKO), the study shows
that the results of leading placement tools from both industry and academia are
70% to 150% away from the optimal solutions on those examples. An extension of
PEKO was presented in [2], where new examples called PEKU (Placement
Examples with Known Upper bounds) were created by inserting a certain
percentage of nonlocal nets into a PEKO circuit. By relaxing the optimality
constraint on a subset of connections, PEKU more accurately emulates real
circuits in terms of wirelength distribution. Experiments showed that for PEKU
benchmarks, stateoftheart placers can be far away from the upper bound. In
the extreme case, where each circuit consists of global connections only (GPEKU benchmarks),
existing tools can be 41% to 102% away in the worst case. These studies
generated great interest in both academia and industry.
However, wirelength is not the sole objective
in circuit placement. In the era of DSM technology, an important goal of
placement is performance (delay) optimization. There is a strong need to extend
the optimality study to timingdriven placement algorithms.
Circuit Description
The TPEKO suite is developed at UCLA VLSI CAD LAB. A total of 20
synthetic circuits are constructed from MCNC benchmarks such that their optimal
delays are known under a simplified linear delay model, i.e., the delay of each
gate is a constant d_{g},_{ }the delay between two gates
A and B is d_{i}(A,B) = dist(A,
B)*d_{u}.. Here, dist(A,B) is the
Manhattan distance between A and B, d_{u} is a
constant. TPEKO is given in the format specified in [3] for FPGA placement. We
chose this format for its flexibility to specify our delay model and cell
library. Table 1 gives the characteristic of TPEKO suite. TPEKO can be
downloaded here.
Table 1.
Characteristics of the TPEKO suite. Column "Orig" gives the initial
circuit from which the synthetic circuits are derived. M is the number
of longest paths in each synthetic circuit. M = 0 corresponds to the
original circuit. Column "Opt" gives the optimal delay for each
circuit. It is the same for circuits derived from the same initial placement
for M ³ 1.
Ckt 
Orig 
Opt 
M
= 0 
M
= 1 
M
= 3 
M
= 5 

CLB 
PI 
PO 
FF 
NET 
CLB 
PI 
PO 
FF 
NET 
CLB 
PI 
PO 
FF 
NET 
CLB 
PI 
PO 
FF 
NET 

TPeko01 
tseng 
88 
1047 
51 
122 
385 
1098 
1056 
51 
105 
371 
1107 
1059 
51 
85 
341 
1128 
1060 
51 
81 
326 
1162 
TPeko02 
ex5p 
108 
1064 
8 
63 
0 
1072 
1067 
8 
57 
5 
1075 
1068 
8 
56 
6 
1102 
1068 
8 
50 
32 
1127 
TPeko03 
apex4 
106 
1261 
9 
19 
0 
1271 
1275 
9 
22 
25 
1284 
1287 
9 
17 
24 
1310 
1287 
9 
18 
38 
1347 
TPeko04 
dsip 
116 
1370 
228 
197 
224 
1598 
1424 
228 
192 
226 
1652 
1537 
228 
189 
228 
1789 
1653 
228 
189 
230 
1932 
TPeko05 
misex3 
92 
1397 
14 
14 
0 
1411 
1415 
14 
34 
40 
1429 
1427 
14 
15 
19 
1461 
1431 
14 
11 
23 
1496 
TPeko06 
diffeq 
84 
1497 
63 
39 
377 
1560 
1502 
63 
26 
366 
1565 
1511 
63 
19 
354 
1596 
1512 
63 
18 
343 
1626 
TPeko07 
alu4 
100 
1522 
14 
8 
0 
1536 
1535 
14 
8 
2 
1549 
1546 
14 
8 
6 
1579 
1561 
14 
8 
11 
1626 
TPeko08 
des 
132 
1591 
256 
245 
0 
1847 
1604 
255 
206 
10 
1859 
1686 
254 
194 
16 
1971 
1772 
254 
191 
17 
2077 
TPeko09 
bigkey 
124 
1707 
228 
197 
224 
1935 
1766 
228 
197 
226 
1994 
1793 
224 
188 
239 
2036 
1801 
223 
166 
227 
2075 
TPeko10 
seq 
122 
1750 
41 
35 
0 
1791 
1754 
41 
40 
18 
1795 
1756 
41 
38 
27 
1814 
1756 
41 
28 
24 
1848 
TPeko11 
apex2 
128 
1878 
38 
3 
0 
1916 
1894 
38 
8 
13 
1932 
1912 
38 
5 
17 
1977 
1913 
38 
8 
25 
2002 
TPeko12 
s298 
166 
1931 
3 
6 
8 
1934 
1934 
3 
6 
11 
1937 
1934 
3 
6 
15 
1986 
1934 
3 
6 
42 
1988 
TPeko13 
frisc 
170 
3556 
19 
116 
886 
3575 
3568 
19 
113 
896 
3587 
3576 
19 
109 
893 
3628 
3578 
19 
105 
876 
3648 
TPeko14 
elliptic 
146 
3604 
130 
114 
1122 
3734 
3616 
130 
81 
1099 
3746 
3628 
130 
68 
1054 
3789 
3638 
130 
63 
1027 
3819 
TPeko15 
spla 
184 
3690 
16 
46 
0 
3706 
3698 
16 
50 
14 
3714 
3706 
16 
47 
25 
3773 
3706 
16 
53 
35 
3773 
TPeko16 
pdc 
240 
4575 
16 
40 
0 
4591 
4595 
16 
55 
30 
4611 
4607 
16 
48 
35 
4667 
4607 
16 
39 
27 
4674 
TPeko17 
ex1010 
290 
4598 
10 
10 
0 
4608 
4610 
10 
21 
22 
4620 
4612 
10 
10 
14 
4673 
4612 
10 
10 
21 
4673 
TPeko18 
S38417 
164 
6406 
28 
106 
1463 
6434 
6417 
28 
96 
1477 
6445 
6434 
28 
94 
1485 
6501 
6441 
28 
88 
1435 
6520 
TPeko19 
S38584.1 
164 
6435 
37 
304 
1260 
6484 
6457 
37 
262 
1250 
6494 
6476 
37 
236 
1236 
6552 
6487 
37 
233 
1211 
6575 
TPeko20 
clma 
328 
8382 
61 
82 
33 
8444 
8393 
60 
60 
128 
8453 
8402 
60 
57 
117 
8513 
8408 
60 
53 
120 
8519 
We
also constructed 17 synthetic examples for Xilinx Virtex architecture in XDL
format [7]. These circuits are grouped as TPEKOX and can be downloaded here.
Table 2 gives the characteristic of these examples.
Table 2.
Characteristics of the TPEKOX suite. The column "Orig" gives the
MCNC benchmark from which each synthetic circuit is derived. Depending on the
circuit size, the chip and packaging we chose for each circuit is different.
Circuit 
Orig 
chip/package 
IOB 
Slice 
Net 

TPeko01x 
ex5p 
xcv50/bg256 
73 
573 
1130 

TPeko02x 
tseng 
xcv200/fg456 
176 
582 
1182 

TPeko03x 
apex4 
xcv50/bg256 
33 
669 
1302 

TPeko04x 
misex3 
xcv50/bg256 
31 
760 
1415 

TPeko05x 
alu4 
xcv50/bg256 
27 
766 
1537 

Tpeko06x 
diffeq 
Xcv50/bg256 
110 
766 
1563 

TPeko07x 
dsip 
xcv600/fg680 
435 
832 
1690 

TPeko08x 
des 
xcv600/fg680 
504 
863 
1932 

TPeko09x 
seq 
xcv100/bg256 
79 
941 
1832 

TPeko10x 
apex2 
xcv100/bg256 
49 
968 
1940 

TPeko11x 
s298 
xcv100/bg256 
16 
1199 
1964 

TPeko12x 
frisc 
xcv200/fg456 
152 
1824 
3562 

TPeko13x 
spla 
xcv200/fg456 
62 
1961 
3731 

TPeko14x 
elliptic 
xcv200/fg456 
248 
1985 
3766 

TPeko15x 
pdc 
xcv200/fg456 
59 
2350 
4593 

TPeko16x 
ex1010 
xcv200/fg456 
29 
2350 
4614 

TPeko17x 
clma 
xcv600/fg680 
130 
4704 
8463 
Finally,
we constructed 16 synthetic examples for the Altera Stratix architecture [9]. These circuits are grouped
as the TPEKOA and can be downloaded here.
Table 3 gives the characteristic of these examples.
Table 3. Characteristics of the TPEKOA
suite. The column "Orig" gives the MCNC benchmark from which each
synthetic circuit is derived.
Circuit 
Orig 
Chip 
IOs 
LEs 
Nets 

TPeko01a 
apex4 
EP1S10B672C6 
31 
1503 
1514 

TPeko02a 
misex3 
EP1S10B672C6 
30 
1628 
1643 

TPeko03a 
diffeq 
EP1S10B672C6 
104 
1663 
1727 

TPeko04a 
alu4 
EP1S10B672C6 
24 
1753 
1768 

TPeko05a 
seq 
EP1S10B672C6 
78 
1986 
2028 

Tpeko06a 
apex2 
EP1S10B672C6 
43 
2144 
2183 

TPeko07a 
s298 
EP1S10B672C6 
11 
2277 
2281 

TPeko08a 
frisc 
EP1S10B672C6 
137 
3837 
3857 

TPeko09a 
elliptic 
EP1S10B672C6 
246 
3790 
3921 

TPeko10a 
spla 
EP1S10B672C6 
64 
3091 
4008 

TPeko11a 
pdc 
EP1S10B672C6 
58 
4906 
4923 

TPeko12a 
ex1010 
EP1S10B672C6 
22 
4914 
4925 

TPeko13a 
s38417 
EP1S10B672C6 
136 
6577 
6606 

TPeko14a 
tseng 
EP1S10B672C6 
175 
1198 
1250 

TPeko15a 
ex5p 
EP1S10B672C6 
73 
1350 
1359 

TPeko16a 
clma 
EP1S10B672C6 
146 
8639 
8702 
Experimental Result
We experimented with two
stateoftheart FPGA placers on TPEKO, including:
· VPR. VPR
is a well known FPGA placement and routing package widely used for FPGA
architecture evaluation [3,5]. Its optimization engine is based on simulated
annealing. It combines connectionbased and pathbased timinganalysis. The
cost function it uses trades off between wirelength and critical path delay. We
used VPR v.4.3 downloaded from [4] in our experiment.
· PATH. PATH
is the latest FPGA placement algorithm which presents a significant enhancement
to VPR in timing optimization [6]. It takes into consideration the path sharing
effect. PATH introduces a new net weighting algorithm based on the concept of
pathcounting. We used PATH v.1.0 in our experiment.
We modified the delay
computation in each algorithm, so that the interconnect delay is always linear
with regard to the Manhattan distance. This change, in effect, makes our study of
these algorithms independent of the FPGA architecture and their routing
procedure. In our experiment, the tradeoff parameter of wirelength vs. delay is
0.5, as suggested by [5]. Changing the value of this parameter to favor the
critical path delay minimization did not seem to improve the final results.
Table 4. Experimental results on TPEKO
for VPR and PATH. The average difference between each algorithm's result and
the optimal solution is listed. For completeness, the best results for every
circuit are also included.
Circuit 
M
= 1 
M
= 3 
M
= 5 

VPR 
PATH 
VPR 
PATH 
VPR 
PATH 

Avg 
Best 
Avg 
Best 
Avg 
Best 
Avg 
Best 
Avg 
Best 
Avg 
Best 

TPeko01 
4% 
0% 
7% 
0% 
21% 
17% 
17% 
15% 
35% 
27% 
25% 
20% 
TPeko02 
6% 
1% 
4% 
1% 
23% 
15% 
17% 
13% 
38% 
31% 
16% 
14% 
TPeko03 
12% 
4% 
6% 
0% 
30% 
23% 
12% 
8% 
27% 
24% 
15% 
10% 
TPeko04 
7% 
0% 
0% 
0% 
10% 
7% 
5% 
3% 
18% 
14% 
15% 
9% 
TPeko05 
22% 
7% 
10% 
0% 
34% 
25% 
15% 
5% 
34% 
24% 
16% 
11% 
TPeko06 
10% 
5% 
5% 
1% 
26% 
20% 
19% 
8% 
34% 
20% 
17% 
14% 
TPeko07 
22% 
10% 
4% 
0% 
25% 
20% 
11% 
7% 
33% 
27% 
17% 
13% 
TPeko08 
53% 
44% 
18% 
7% 
52% 
47% 
30% 
24% 
40% 
20% 
33% 
27% 
TPeko09 
9% 
2% 
0% 
0% 
19% 
14% 
24% 
15% 
29% 
28% 
21% 
18% 
TPeko10 
11% 
6% 
7% 
1% 
27% 
25% 
18% 
12% 
33% 
29% 
15% 
13% 
TPeko11 
15% 
10% 
8% 
0% 
26% 
19% 
10% 
6% 
40% 
33% 
15% 
12% 
TPeko12 
11% 
4% 
4% 
1% 
40% 
16% 
13% 
11% 
39% 
15% 
16% 
14% 
TPeko13 
31% 
25% 
15% 
8% 
39% 
22% 
35% 
19% 
35% 
29% 
32% 
22% 
TPeko14 
12% 
8% 
2% 
1% 
26% 
21% 
21% 
12% 
32% 
26% 
31% 
25% 
TPeko15 
15% 
11% 
7% 
1% 
25% 
22% 
17% 
10% 
36% 
33% 
23% 
18% 
TPeko16 
17% 
7% 
13% 
3% 
61% 
55% 
14% 
12% 
36% 
25% 
19% 
13% 
TPeko17 
24% 
17% 
10% 
2% 
24% 
15% 
17% 
12% 
30% 
21% 
26% 
21% 
TPeko18 
32% 
15% 
19% 
5% 
33% 
15% 
30% 
20% 
29% 
22% 
30% 
16% 
TPeko19 
17% 
13% 
34% 
26% 
46% 
25% 
45% 
38% 
48% 
41% 
41% 
37% 
TPeko20 
32% 
25% 
24% 
14% 
49% 
37% 
25% 
19% 
47% 
30% 
38% 
31% 
Avg. 
18% 
11% 
10% 
4% 
32% 
23% 
20% 
13% 
35% 
26% 
23% 
18% 
Table
4. gives the experimental results. The average
difference between each algorithm's result and the optimal solution is listed.
For completeness, the best results for every circuit are also reported. From
the results, we can make the following observations:
· For M = 1,
the delay produced by the algorithms is from 10% to 18% longer than the optima
of TPEKO on average, and from 34% to 53% longer in the worst case.
· The
solution quality of both algorithms deteriorates as M increases. For M = 5, the
gap between their solutions and the optima is from 23% to 35% on average, and
from 41% to 48% in the worst case.
We experimented with Xilinx placement and
routing tool, PAR on TPEKOX. The version we experimented is Release 5.1.03i
PAR F.26. First, we let PAR do placement without routing and compared the delay
with that of the constructed solutions. Then we let PAR do placement followed
by routing. In the latter case, we used PAR to do routing on our constructed
solutions and quoted the delay reported by its timing analysis tool. The delay
on our constructed solutions served as upper bounds to the optimal delay for
the synthetic circuits. To guarantee that PAR can find the minimum possible
delay in this experiment, we set a loose delay constraint at the beginning and
gradually tighten it until PAR can no longer find a solution satisfying this
constraint. Table 5 gives the experimental results on these circuits.
Table 5. Experimental results on TPEKOX
with PAR. The first few columns give the circuit characteristics. The upper
bound of the optimal delay by our construction is given in the column
"UB", the result by PAR is given in the column "PAR". The
delays are given in nanoseconds.
Circuit 
w/o routing 
w routing 

UB (ns) 
PAR (ns) 
diff 
UB (ns) 
PAR (ns) 
diff 

TPeko01x 
70.0 
75.2 
7.5% 
76.8 
80.4 
4.7% 
TPeko02x 
78.2 
80.3 
2.7% 
90.6 
90.5 
0.1% 
TPeko03x 
62.7 
65.8 
4.9% 
64.7 
67.5 
4.3% 
TPeko04x 
55.9 
62.5 
11.9% 
54.7 
56.9 
4.1% 
TPeko05x 
52.2 
57.4 
10.0% 
49.0 
53.5 
9.2% 
Tpeko06x 
59.2 
61.2 
3.5% 
55.9 
58.3 
4.2% 
TPeko07x 
78.2 
83.5 
6.7% 
90.7 
95.0 
4.7% 
TPeko08x 
78.2 
83.0 
6.1% 
90.2 
91.1 
1.1% 
TPeko09x 
64.6 
68.1 
5.5% 
68.3 
69.9 
2.4% 
TPeko10x 
60.7 
66.0 
8.6% 
61.7 
64.4 
4.3% 
TPeko11x 
64.8 
67.2 
3.7% 
65.1 
67.9 
4.3% 
TPeko12x 
57.9 
63.8 
10.1% 
57.4 
61.3 
6.8% 
TPeko13x 
61.8 
66.1 
7.0% 
62.8 
65.6 
4.5% 
TPeko14x 
62.7 
71.3 
13.7% 
65.0 
69.4 
6.8% 
TPeko15x 
53.2 
61.7 
15.9% 
50.5 
53.5 
6.0% 
TPeko16x 
55.2 
63.6 
15.1% 
53.1 
55.6 
4.8% 
TPeko17x 
64.3 
69.8 
8.6% 
68.5 
67.3 
1.7% 
Avg. 


8.3% 


4.1% 
From the results, we can make the following
observation:
· On
average, the delay generated by PAR is 8.3% worse than our constructed
solutions without routing, and 4.1% after routing.
Compared with our experiment with
VPR and PATH, the divergence here is much smaller, especially after routing. In
fact, for some cases, the result by PAR is better than our constructed
solution.
One possible reason is that the delay between two elements on a Virtex chip is not monotone with regard to their Manhattan distance. It depends heavily on the routing path chosen for each net.
We also performed
experiments with the Altera place and route tool, Quartus on TPEKOA. The
version we experimented with is Quartus II v.3.0. First, Quartus places and
routes the circuits given the information of the “optimal” placement obtained
by the TPEKO algorithm. The delay reported by the tool serves as an upper bound
of the optimal solution for the circuit. In the second experiment Quartus
places and routes the circuits without any hints. By comparing the delays
reported by the tool in both experiments, we can estimate the performance of
the tool. Table 6 reports the experimental results on the TPEKOA suite.
Table 6. Experimental results on TPEKOA with
Quartus. The upper bound of the optimal delay by our construction is given in
the column “UB”, the result by Quartus is given in the column “Quartus”. The delays are given
in nanoseconds.
Circuit 
UB 
Quartus 
diff 

TPeko01a 
18.5 
19.5 
5.1% 

TPeko02a 
17.7 
18.4 
4.1% 

TPeko03a 
12.8 
13.0 
1.8% 

TPeko04a 
17.9 
18.0 
0.6% 

TPeko05a 
17.7 
17.8 
0.5% 

Tpeko06a 
20.2 
21.3 
5.4% 

TPeko07a 
26.7 
28.0 
4.9% 

TPeko08a 
21.2 
22.3 
5.5% 

TPeko09a 
14.9 
15.1 
1.2% 

TPeko10a 
23.2 
24.4 
4.9% 

TPeko11a 
24.5 
25.3 
3.2% 

TPeko12a 
23.6 
25.0 
6.0% 

TPeko13a 
14.0 
13.9 
0.5% 

TPeko14a 
13.0 
13.0 
0.0% 

TPeko15a 
21.8 
22.3 
2.0% 

TPeko16a 
20.0 
20.6 
2.9% 

Avg. 


3.0% 
On average, the delay generated by Quartus is
3.0% worse than our constructed solutions. The conclusions we can draw are the
same as for the Xilinx experiments. The divergence from the upper bound is much
smaller than the academic tools. The hierarchical nature of the Stratix devices
and the nonmonotonicity of the delays between the elements on the Stratix chip
in regards to their distance probably contribute to the high quality of the
performance of the tool for the TPEKOA examples.
Our
experiment is not intended to compare the placers. Rather, it is for a gap
analysis to see how much room remains for improvement in timingdriven
placement algorithms. Please do not interpret the results here as a ranking of
the placers.
TPEKO,
TPEKOX and TPEKOA have obvious limitations: the nodes along the longest paths
are placed adjacent to each other, which may not be true on real circuits.
Therefore, achieving good results on TPEKO, TPEKOX and TPEKOA does not
guarantee good performance on real circuits. Also, it is not encouraged to tune
the placer in order to beat these examples.
Reference
[1] C. Chang, J. Cong, and M.
Xie, "Optimality and scalability study of existing placement
algorithms," in Proc. Asia South Pacific Design Automation Conference, pp.
621  627, 2003.
[2] J. Cong, M. Romesis, and M.
Xie, "Optimality, scalability and stability study of partitioning and
placement algorithms," in Proc. International Symposium on Physical
Design, pp. 88  94, 2003.
[3] V. Betz, J. Rose, and A.
Marquardt, Architecture and CAD for DeepSubmicron FPGAs. Kluwer Academic
Publishers, 1999.
[4] http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html.
[5] A. Marquardt, V. Betz, and
J. Rose, "Timingdriven placement for FPGAs," in Proc. International
Symposium on Field Programmable Gate Arrays, pp. 203 213, ACM Press, 2000.
[6] T. Kong, "A novel net
weighting algorithm for timingdriven placement," in Proc. Intl. Conf. on
ComputerAided Design, pp. 172  176, 2002.
[7] Xilinx Inc., Virtex 2.5V
FPGA Complete Data Sheet (all four Modules).
[8] J. Cong, M. Romesis and M.
Xie, "Optimality and stability study of timingdriven placement
algorithms," to appear in Proc. International Conference on ComputerAided
Design, 2003.
[9] Altera, Stratix Device Family
Data Sheet, http://www.altera.com/literature/litstx.jsp
Please direct your questions to xie@cs.ucla.edu. Thanks.