TPEKO Suite

(Timing-driven Placement Example with Known Optimal delay)

Director : Prof. Jason Cong

Authors : Michail Romesis and Min Xie

Copyright© 2002-2003 the Regents of University of California


 

Motivation

The placement problem has been studied extensively in the past 30 years. However, a recent study of wirelength-driven placement [1] shows that existing placement solutions are surprisingly far from optimal. Using a set of constructed placement examples that match many industrial circuit characteristics with known optimal wirelength (PEKO), the study shows that the results of leading placement tools from both industry and academia are 70% to 150% away from the optimal solutions on those examples. An extension of PEKO was presented in [2], where new examples called PEKU (Placement Examples with Known Upper bounds) were created by inserting a certain percentage of non­local nets into a PEKO circuit. By relaxing the optimality constraint on a subset of connections, PEKU more accurately emulates real circuits in terms of wirelength distribution. Experiments showed that for PEKU benchmarks, state­of­the­art placers can be far away from the upper bound. In the extreme case, where each circuit consists of global connections only (G-PEKU benchmarks), existing tools can be 41% to 102% away in the worst case. These studies generated great interest in both academia and industry.

However, wirelength is not the sole objective in circuit placement. In the era of DSM technology, an important goal of placement is performance (delay) optimization. There is a strong need to extend the optimality study to timing­driven placement algorithms.

Circuit Description

The TPEKO suite is developed at UCLA VLSI CAD LAB. A total of 20 synthetic circuits are constructed from MCNC benchmarks such that their optimal delays are known under a simplified linear delay model, i.e., the delay of each gate is a constant dg, the delay between two gates A and B is di(A,B) = dist(A, B)*du.. Here, dist(A,B) is the Manhattan distance between A and B, du is a constant. TPEKO is given in the format specified in [3] for FPGA placement. We chose this format for its flexibility to specify our delay model and cell library. Table 1 gives the characteristic of TPEKO suite. TPEKO can be downloaded here.

Table 1. Characteristics of the TPEKO suite. Column "Orig" gives the initial circuit from which the synthetic circuits are derived. M is the number of longest paths in each synthetic circuit. M = 0 corresponds to the original circuit. Column "Opt" gives the optimal delay for each circuit. It is the same for circuits derived from the same initial placement for M ³ 1.

Ckt

Orig

Opt

M = 0

M = 1

M = 3

M = 5

CLB

PI

PO

FF

NET

CLB

PI

PO

FF

NET

CLB

PI

PO

FF

NET

CLB

PI

PO

FF

NET

TPeko01

tseng

88

1047

51

122

385

1098

1056

51

105

371

1107

1059

51

85

341

1128

1060

51

81

326

1162

TPeko02

ex5p

108

1064

8

63

0

1072

1067

8

57

5

1075

1068

8

56

6

1102

1068

8

50

32

1127

TPeko03

apex4

106

1261

9

19

0

1271

1275

9

22

25

1284

1287

9

17

24

1310

1287

9

18

38

1347

TPeko04

dsip

116

1370

228

197

224

1598

1424

228

192

226

1652

1537

228

189

228

1789

1653

228

189

230

1932

TPeko05

misex3

92

1397

14

14

0

1411

1415

14

34

40

1429

1427

14

15

19

1461

1431

14

11

23

1496

TPeko06

diffeq

84

1497

63

39

377

1560

1502

63

26

366

1565

1511

63

19

354

1596

1512

63

18

343

1626

TPeko07

alu4

100

1522

14

8

0

1536

1535

14

8

2

1549

1546

14

8

6

1579

1561

14

8

11

1626

TPeko08

des

132

1591

256

245

0

1847

1604

255

206

10

1859

1686

254

194

16

1971

1772

254

191

17

2077

TPeko09

bigkey

124

1707

228

197

224

1935

1766

228

197

226

1994

1793

224

188

239

2036

1801

223

166

227

2075

TPeko10

seq

122

1750

41

35

0

1791

1754

41

40

18

1795

1756

41

38

27

1814

1756

41

28

24

1848

TPeko11

apex2

128

1878

38

3

0

1916

1894

38

8

13

1932

1912

38

5

17

1977

1913

38

8

25

2002

TPeko12

s298

166

1931

3

6

8

1934

1934

3

6

11

1937

1934

3

6

15

1986

1934

3

6

42

1988

TPeko13

frisc

170

3556

19

116

886

3575

3568

19

113

896

3587

3576

19

109

893

3628

3578

19

105

876

3648

TPeko14

elliptic

146

3604

130

114

1122

3734

3616

130

81

1099

3746

3628

130

68

1054

3789

3638

130

63

1027

3819

TPeko15

spla

184

3690

16

46

0

3706

3698

16

50

14

3714

3706

16

47

25

3773

3706

16

53

35

3773

TPeko16

pdc

240

4575

16

40

0

4591

4595

16

55

30

4611

4607

16

48

35

4667

4607

16

39

27

4674

TPeko17

ex1010

290

4598

10

10

0

4608

4610

10

21

22

4620

4612

10

10

14

4673

4612

10

10

21

4673

TPeko18

S38417

164

6406

28

106

1463

6434

6417

28

96

1477

6445

6434

28

94

1485

6501

6441

28

88

1435

6520

TPeko19

S38584.1

164

6435

37

304

1260

6484

6457

37

262

1250

6494

6476

37

236

1236

6552

6487

37

233

1211

6575

TPeko20

clma

328

8382

61

82

33

8444

8393

60

60

128

8453

8402

60

57

117

8513

8408

60

53

120

8519

 

We also constructed 17 synthetic examples for Xilinx Virtex architecture in XDL format [7]. These circuits are grouped as TPEKO-X and can be downloaded here. Table 2 gives the characteristic of these examples.

 

Table 2. Characteristics of the TPEKO-X suite. The column "Orig" gives the MCNC benchmark from which each synthetic circuit is derived. Depending on the circuit size, the chip and packaging we chose for each circuit is different.

Circuit

Orig

chip/package

IOB

Slice

Net

TPeko01-x

ex5p

xcv50/bg256

73

573

1130

TPeko02-x

tseng

xcv200/fg456

176

582

1182

TPeko03-x

apex4

xcv50/bg256

33

669

1302

TPeko04-x

misex3

xcv50/bg256

31

760

1415

TPeko05-x

alu4

xcv50/bg256

27

766

1537

Tpeko06-x

diffeq

Xcv50/bg256

110

766

1563

TPeko07-x

dsip

xcv600/fg680

435

832

1690

TPeko08-x

des

xcv600/fg680

504

863

1932

TPeko09-x

seq

xcv100/bg256

79

941

1832

TPeko10-x

apex2

xcv100/bg256

49

968

1940

TPeko11-x

s298

xcv100/bg256

16

1199

1964

TPeko12-x

frisc

xcv200/fg456

152

1824

3562

TPeko13-x

spla

xcv200/fg456

62

1961

3731

TPeko14-x

elliptic

xcv200/fg456

248

1985

3766

TPeko15-x

pdc

xcv200/fg456

59

2350

4593

TPeko16-x

ex1010

xcv200/fg456

29

2350

4614

TPeko17-x

clma

xcv600/fg680

130

4704

8463

 

Finally, we constructed 16 synthetic examples for the Altera Stratix  architecture [9]. These circuits are grouped as the TPEKO-A and can be downloaded here. Table 3 gives the characteristic of these examples.

 

 

Table 3. Characteristics of the TPEKO-A suite. The column "Orig" gives the MCNC benchmark from which each synthetic circuit is derived.

 

Circuit

Orig

Chip

IOs

LEs

Nets

TPeko01-a

apex4

EP1S10B672C6

31

1503

1514

TPeko02-a

misex3

EP1S10B672C6

30

1628

1643

TPeko03-a

diffeq

EP1S10B672C6

104

1663

1727

TPeko04-a

alu4

EP1S10B672C6

24

1753

1768

TPeko05-a

seq

EP1S10B672C6

78

1986

2028

Tpeko06-a

apex2

EP1S10B672C6

43

2144

2183

TPeko07-a

s298

EP1S10B672C6

11

2277

2281

TPeko08-a

frisc

EP1S10B672C6

137

3837

3857

TPeko09-a

elliptic

EP1S10B672C6

246

3790

3921

TPeko10-a

spla

EP1S10B672C6

64

3091

4008

TPeko11-a

pdc

EP1S10B672C6

58

4906

4923

TPeko12-a

ex1010

EP1S10B672C6

22

4914

4925

TPeko13-a

s38417

EP1S10B672C6

136

6577

6606

TPeko14-a

tseng

EP1S10B672C6

175

1198

1250

TPeko15-a

ex5p

EP1S10B672C6

73

1350

1359

TPeko16-a

clma

EP1S10B672C6

146

8639

8702

 

Experimental Result

We experimented with two state-of-the-art FPGA placers on TPEKO, including:

·  VPR. VPR is a well known FPGA placement and routing package widely used for FPGA architecture evaluation [3,5]. Its optimization engine is based on simulated annealing. It combines connection­based and path­based timing­analysis. The cost function it uses trades off between wirelength and critical path delay. We used VPR v.4.3 downloaded from [4] in our experiment.

·  PATH. PATH is the latest FPGA placement algorithm which presents a significant enhancement to VPR in timing optimization [6]. It takes into consideration the path sharing effect. PATH introduces a new net weighting algorithm based on the concept of path­counting. We used PATH v.1.0 in our experiment.

We modified the delay computation in each algorithm, so that the interconnect delay is always linear with regard to the Manhattan distance. This change, in effect, makes our study of these algorithms independent of the FPGA architecture and their routing procedure. In our experiment, the tradeoff parameter of wirelength vs. delay is 0.5, as suggested by [5]. Changing the value of this parameter to favor the critical path delay minimization did not seem to improve the final results.

Table 4. Experimental results on TPEKO for VPR and PATH. The average difference between each algorithm's result and the optimal solution is listed. For completeness, the best results for every circuit are also included.

Circuit

M = 1

M = 3

M = 5

VPR

PATH

VPR

PATH

VPR

PATH

Avg

Best

Avg

Best

Avg

Best

Avg

Best

Avg

Best

Avg

Best

TPeko01

4%

0%

7%

0%

21%

17%

17%

15%

35%

27%

25%

20%

TPeko02

6%

1%

4%

1%

23%

15%

17%

13%

38%

31%

16%

14%

TPeko03

12%

4%

6%

0%

30%

23%

12%

8%

27%

24%

15%

10%

TPeko04

7%

0%

0%

0%

10%

7%

5%

3%

18%

14%

15%

9%

TPeko05

22%

7%

10%

0%

34%

25%

15%

5%

34%

24%

16%

11%

TPeko06

10%

5%

5%

1%

26%

20%

19%

8%

34%

20%

17%

14%

TPeko07

22%

10%

4%

0%

25%

20%

11%

7%

33%

27%

17%

13%

TPeko08

53%

44%

18%

7%

52%

47%

30%

24%

40%

20%

33%

27%

TPeko09

9%

2%

0%

0%

19%

14%

24%

15%

29%

28%

21%

18%

TPeko10

11%

6%

7%

1%

27%

25%

18%

12%

33%

29%

15%

13%

TPeko11

15%

10%

8%

0%

26%

19%

10%

6%

40%

33%

15%

12%

TPeko12

11%

4%

4%

1%

40%

16%

13%

11%

39%

15%

16%

14%

TPeko13

31%

25%

15%

8%

39%

22%

35%

19%

35%

29%

32%

22%

TPeko14

12%

8%

2%

1%

26%

21%

21%

12%

32%

26%

31%

25%

TPeko15

15%

11%

7%

1%

25%

22%

17%

10%

36%

33%

23%

18%

TPeko16

17%

7%

13%

3%

61%

55%

14%

12%

36%

25%

19%

13%

TPeko17

24%

17%

10%

2%

24%

15%

17%

12%

30%

21%

26%

21%

TPeko18

32%

15%

19%

5%

33%

15%

30%

20%

29%

22%

30%

16%

TPeko19

17%

13%

34%

26%

46%

25%

45%

38%

48%

41%

41%

37%

TPeko20

32%

25%

24%

14%

49%

37%

25%

19%

47%

30%

38%

31%

Avg.

18%

11%

10%

4%

32%

23%

20%

13%

35%

26%

23%

18%

 

Table 4. gives the experimental results. The average difference between each algorithm's result and the optimal solution is listed. For completeness, the best results for every circuit are also reported. From the results, we can make the following observations:

·  For M = 1, the delay produced by the algorithms is from 10% to 18% longer than the optima of T­PEKO on average, and from 34% to 53% longer in the worst case.

·  The solution quality of both algorithms deteriorates as M increases. For M = 5, the gap between their solutions and the optima is from 23% to 35% on average, and from 41% to 48% in the worst case.

We experimented with Xilinx placement and routing tool, PAR on TPEKO-X. The version we experimented is Release 5.1.03i ­ PAR F.26. First, we let PAR do placement without routing and compared the delay with that of the constructed solutions. Then we let PAR do placement followed by routing. In the latter case, we used PAR to do routing on our constructed solutions and quoted the delay reported by its timing analysis tool. The delay on our constructed solutions served as upper bounds to the optimal delay for the synthetic circuits. To guarantee that PAR can find the minimum possible delay in this experiment, we set a loose delay constraint at the beginning and gradually tighten it until PAR can no longer find a solution satisfying this constraint. Table 5 gives the experimental results on these circuits.

Table 5. Experimental results on TPEKO-X with PAR. The first few columns give the circuit characteristics. The upper bound of the optimal delay by our construction is given in the column "UB", the result by PAR is given in the column "PAR". The delays are given in nano­seconds.

Circuit

w/o routing

w routing

UB (ns)

 PAR (ns)

diff

UB (ns)

 PAR (ns)

diff

TPeko01-x

70.0

75.2

7.5%

76.8

80.4

4.7%

TPeko02-x

78.2

80.3

2.7%

90.6

90.5

-0.1%

TPeko03-x

62.7

65.8

4.9%

64.7

67.5

4.3%

TPeko04-x

55.9

62.5

11.9%

54.7

56.9

4.1%

TPeko05-x

52.2

57.4

10.0%

49.0

53.5

9.2%

Tpeko06-x

59.2

61.2

3.5%

55.9

58.3

4.2%

TPeko07-x

78.2

83.5

6.7%

90.7

95.0

4.7%

TPeko08-x

78.2

83.0

6.1%

90.2

91.1

1.1%

TPeko09-x

64.6

68.1

5.5%

68.3

69.9

2.4%

TPeko10-x

60.7

66.0

8.6%

61.7

64.4

4.3%

TPeko11-x

64.8

67.2

3.7%

65.1

67.9

4.3%

TPeko12-x

57.9

63.8

10.1%

57.4

61.3

6.8%

TPeko13-x

61.8

66.1

7.0%

62.8

65.6

4.5%

TPeko14-x

62.7

71.3

13.7%

65.0

69.4

6.8%

TPeko15-x

53.2

61.7

15.9%

50.5

53.5

6.0%

TPeko16-x

55.2

63.6

15.1%

53.1

55.6

4.8%

TPeko17-x

64.3

69.8

8.6%

68.5

67.3

-1.7%

Avg.

 

 

8.3%

 

 

4.1%

 

From the results, we can make the following observation:

·  On average, the delay generated by PAR is 8.3% worse than our constructed solutions without routing, and 4.1% after routing.

Compared with our experiment with VPR and PATH, the divergence here is much smaller, especially after routing. In fact, for some cases, the result by PAR is better than our constructed solution.

One possible reason is that the delay between two elements on a Virtex chip is not monotone with regard to their Manhattan distance. It depends heavily on the routing path chosen for each net.

 

We also performed experiments with the Altera place and route tool, Quartus on TPEKO-A. The version we experimented with is Quartus II v.3.0. First, Quartus places and routes the circuits given the information of the “optimal” placement obtained by the TPEKO algorithm. The delay reported by the tool serves as an upper bound of the optimal solution for the circuit. In the second experiment Quartus places and routes the circuits without any hints. By comparing the delays reported by the tool in both experiments, we can estimate the performance of the tool. Table 6 reports the experimental results on the TPEKO-A suite.

 

Table 6. Experimental results on TPEKO-A with Quartus. The upper bound of the optimal delay by our construction is given in the column “UB”, the result by Quartus is given in the column “Quartus”. The delays are given in nano­seconds.

 

Circuit

UB

Quartus

diff

TPeko01-a

18.5

19.5

5.1%

TPeko02-a

17.7

18.4

4.1%

TPeko03-a

12.8

13.0

1.8%

TPeko04-a

17.9

18.0

0.6%

TPeko05-a

17.7

17.8

0.5%

Tpeko06-a

20.2

21.3

5.4%

TPeko07-a

26.7

28.0

4.9%

TPeko08-a

21.2

22.3

5.5%

TPeko09-a

14.9

15.1

1.2%

TPeko10-a

23.2

24.4

4.9%

TPeko11-a

24.5

25.3

3.2%

TPeko12-a

23.6

25.0

6.0%

TPeko13-a

14.0

13.9

-0.5%

TPeko14-a

13.0

13.0

0.0%

TPeko15-a

21.8

22.3

2.0%

TPeko16-a

20.0

20.6

2.9%

Avg.

 

 

3.0%

 

On average, the delay generated by Quartus is 3.0% worse than our constructed solutions. The conclusions we can draw are the same as for the Xilinx experiments. The divergence from the upper bound is much smaller than the academic tools. The hierarchical nature of the Stratix devices and the non-monotonicity of the delays between the elements on the Stratix chip in regards to their distance probably contribute to the high quality of the performance of the tool for the TPEKO-A examples.

Our experiment is not intended to compare the placers. Rather, it is for a gap analysis to see how much room remains for improvement in timing-driven placement algorithms. Please do not interpret the results here as a ranking of the placers.

TPEKO, TPEKO-X and TPEKO-A have obvious limitations: the nodes along the longest paths are placed adjacent to each other, which may not be true on real circuits. Therefore, achieving good results on TPEKO, TPEKO-X and TPEKO-A does not guarantee good performance on real circuits. Also, it is not encouraged to tune the placer in order to beat these examples.

Reference

[1] C. Chang, J. Cong, and M. Xie, "Optimality and scalability study of existing placement algorithms," in Proc. Asia South Pacific Design Automation Conference, pp. 621 -- 627, 2003.

[2] J. Cong, M. Romesis, and M. Xie, "Optimality, scalability and stability study of partitioning and placement algorithms," in Proc. International Symposium on Physical Design, pp. 88 -- 94, 2003.

[3] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep­Submicron FPGAs. Kluwer Academic Publishers, 1999.

[4] http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html.

[5] A. Marquardt, V. Betz, and J. Rose, "Timing­driven placement for FPGAs," in Proc. International Symposium on Field Programmable Gate Arrays, pp. 203-- 213, ACM Press, 2000.

[6] T. Kong, "A novel net weighting algorithm for timing­driven placement," in Proc. Intl. Conf. on Computer­Aided Design, pp. 172 -- 176, 2002.

[7] Xilinx Inc., Virtex 2.5V FPGA Complete Data Sheet (all four Modules).

[8] J. Cong, M. Romesis and M. Xie, "Optimality and stability study of timing-driven placement algorithms," to appear in Proc. International Conference on Computer-Aided Design, 2003.

[9] Altera, Stratix Device Family Data Sheet, http://www.altera.com/literature/lit-stx.jsp


Please direct your questions to xie@cs.ucla.edu. Thanks.