In this paper, we study the effect of wiresizing with driver sizing on performance and power optimization. Our study showed that wiresizing is an effective method to reduce interconnect delay. Optimal wiresizing solutions require additional routing area but do not increase the power dissipation significantly. We propose to combine the driver sizing and wiresizing techniques as a low-power solution to meet the high-speed interconnect requirement. Experimental results have shown that together with cascaded drivers, our wiresizing solution consumes less power while meeting the delay requirement when compared to the conventional methods of using cascaded drivers with uniform-width interconnection.
In this paper, we study the simultaneous driver and wire sizing (SDWS) problem under two objective functions: (i) delay minimization only, or (ii) combined delay and power dissipation minimization. We present general formulations of the SDWS problem under these two objectives based on the distributed Elmore delay model with consideration of both capacitive power dissipation and short-circuit power dissipation. We show several interesting properties of the optimal SDWS solutions under the two objectives, including an important result (DS/WS Relation Theorem) which reveals the relationship between driver sizing and optimal wire sizing. These results lead to polynomial time algorithms for computing the lower and upper bounds of optimal SDWS solutions under the two objectives, and efficient algorithms for computing optimal SDWS solutions under the two objectives. We have implemented these algorithms and compared them with existing design methods for driver sizing only or independent driver and wire sizing. Accurate SPICE simulation shows that our methods reduce the delay by up to 12%--49% and power dissipation by 26%--63% compared with existing design methods.
In this paper, we present a new clock routing algorithm which minimizes total wirelength under any given path-length skew bound. The algorithm constructs a bounded-skew tree (BST) in two steps: (i) a bottom-up phase to construct a binary tree of shortest-distance feasible regions which represent the loci of possible placements of clock entry points, and (ii) a top-down phase to determine the exact locations of clock entry points. Experimental results show that our clock routing algorithm, named BST/DME, can produce a set of routing solutions with skew and wirelength trade-off.
Routing of signal nets of VLSI circuits is a typical interconnection problem. The objective is to connect a set, or net, of terminals on a chip with minimum wirelength such that there exists a path between any two terminals. The problem may be solved sub-optimally by a minimum spanning tree (MST). Optimal interconnection can be obtained if the problem is formulated as the Steiner problem in which wires may meet at newly added terminals or Steiner points. The optimal interconnection is known as the Steiner minimal tree (SMT).
The Steiner problem in Euclidean and rectilinear routing models has been well studied. There are two difficult issues: the first is to establish the Steiner ratio, which is defined to be the lowest ratio of the length of a SMT to the length of a minimum spanning tree (MST) among all problem instances. It measures how closely the MST approximates the SMT. The Euclidean and rectilinear Steiner ratios are $\sqrt{3}/2$ and $2/3$, respectively. The other issue concerns with developing ``good'' heuristics to solve the problem which is known to be NP-complete. The performance ratio of an algorithm is the lowest ratio of the length of a SMT to the length of the Steiner tree produced by the algorithm among all problem instances. The challenge is to invent an algorithm that has a performance ratio strictly higher than the Steiner ratio. Most of the MST-based heuristics have performance ratio arbitrarily close to the Steiner ratio.
In this thesis, we focus on a new geometric formulation for the Steiner problem in the application to VLSI circuit design. Traditionally, wiring in VLSI systems is restricted to the horizontal and vertical orientations, i.e. the rectilinear routing model. We introduce a new routing model called the octilinear routing model. The new model allows two additional diagonal ($\pm45^\circ$) wiring orientations on top of the rectilinear orientations. Recently, the model has been used in channel routing. Results have shown that overall routing area and wiring length required is generally lower than that for the traditional rectilinear routing model. This motivates us to look at the octilinear Steiner problem.
In this thesis, we extend various important properties of SMT from the rectilinear model to the octilinear model. We show that a Steiner point in a SMT has three or four neighbors. Valid configurations of the edges incident on a vertex in the SMT are also given. These properties allow us to solve the 3-point octilinear Steiner problem. We also establish that for the 3-point Steiner problem, the Steiner ratio is $(\sqrt{2}+1)/2\sqrt{2} \approx 0.854$. Partial results of the 4-point problem show that it is likely that the general $n$-point Steiner problem has the same Steiner ratio. This would imply that an octilinear MST is a better approximation to octilinear SMT than a rectilinear MST is to rectilinear SMT. We also generalize the Hanan's Theorem and show that for any problem instance, there exists a SMT such that all Steiner points in the SMT lie on the generalized Hanan's grid.
We propose a heuristic algorithm for the general $n$-point Steiner problem. When the problem size is less than ten, the algorithm produces Steiner tree which has an average of about 2.8--4\% improvement over the cost of MST. For problems with more points, the average improvement hovers over the 4\% mark. A possible explanation is that an octilinear MST is a very close approximation to octilinear SMT, thereby leaving little room for improvement.
We study the minimum-cost bounded-skew routing tree problem under the Elmore delay model. We present two approaches to construct bounded-skew routing trees: (i) the Boundary Merging and Embedding (BME) method which utilizes merging points that are restricted to the boundaries of merging regions, and (ii) the Interior Merging and Embedding (IME) algorithm which employs a sampling strategy and dynamic programming to consider merging points that are interior to, rather than on the boundary of, the merging regions. Our new algorithms allow accurate control of Elmore delay skew, and show the utility of merging points inside merging regions.
In this paper, we study the simultaneous buffer and wire sizing (SBWS) problem for delay and power dissipation minimization. We prove the BS/WS relation for optimal SBWS solutions. This relation leads to a polynomial time algorithm for computing the lower and upper bounds of the optimal SBWS solutions, which enables an efficient optimal algorithm for computing optimal SBWS solutions. We have applied the SBWS algorithms to the clock nets in a spread spectrum IF transceiver chip and HSPICE simulations show that our algorithms can reduce skew and power by a factor of $3.5X$ and $1.6X$, respectively, when compared to the manual layout of the clock nets in the original chip.
This paper presents a comprehensive survey of existing techniques for interconnect optimization during the VLSI physical design process, with emphasis on recent studies on interconnect design and optimization for high-performance VLSI circuit design under the deep submicron fabrication technologies. First, we present a number of interconnect delay models and driver/gate delay models of various degrees of accuracy and efficiency which are most useful to guide the circuit design and interconnect optimization process. Then, we classify the existing work on optimization of VLSI interconnect into the following three categories and discuss the results in each category in detail: (i) topology optimization for high-performance interconnects, including the algorithms for total wirelength minimization, critical pathlength minimization, and delay minimization; (ii) device and interconnect sizing, including techniques for efficient driver, gate, and transistor sizing, optimal wiresizing, and simultaneous topology construction, buffer insertion, buffer and wire sizing; (iii) high-performance clock routing, including abstract clock net topology generation and embedding, planar clock routing, buffer and wire sizing for clock nets, non-tree clock routing, and clock schedule optimization. For each method, we discuss its effectiveness, its advantages and limitations, as well as its computational efficiency. We group the related techniques according to either their optimization techniques or optimization objectives so that the reader can easily compare the quality and efficiency of different solutions.
The driving force behind the rapid growth of the VLSI technology has been the constant reduction of the feature size of VLSI devices (i.e. the minimum transistor size). The feature size decreased from about 2um in 1985, to about 1um in 1990, and to 0.35-0.5um today (1996). The prediction is that it will be reduced to about 0.18um in Year 2001. Such continual miniaturization of VLSI devices has strong impact on the VLSI technology in several ways. First, the device density on integrated circuits grows quadratically with the rate of decrease in the feature size. As a result, the total number of transistors on a single VLSI chip has increased from less than 500,000 in 1985 to over 10 million today. The prediction is that it will reach 64 million in Year 2001. Second, as the devices (transistors) operate at a higher speed, the interconnect delay becomes much more significant. In many systems designed today, as much as 50% to 70% of clock cycle are consumed by interconnect delays. This percentage will continue to rise as the feature size decreases further. Third, a significant and increasing amount of overall power is spent on charging and discharging interconnects, as the ratio between the total interconnect capacitance and the total device capacitance is increasing, and the total short-circuit and static power on device continues to be a secondary factor compared to the total capacitive power. Given these trends, the layout design has become increasingly important for performance and power optimization in deep submicron technologies as it finally determines the lengths and widths of interconnects in the design. We would also like to point out that although significant progress has been made on efficient estimation techniques of device switching activities at higher levels of the design hierarchy (see Chapters 4.3 and 4.4), the estimation of device loading capacitance prior to layout remains an unsolved problem, which considerably limits the accuracy of existing power estimation techniques at higher levels.
In this chapter, we discuss layout optimization for low-power designs. We start with power and delay models of interconnects and gates for layout optimization in the submicron technology. Then, we present various power optimization techniques at the layout level, including device layout optimization, interconnect layout optimization, simultaneous device and interconnect layout optimization, and clock layout optimization. For each topic, we present the problem formulation, describe one or several typical optimization methods in detail, and then provide a broad survey of the related techniques. A comprehensive reference list of existing works on layout optimization for low-power designs is provided at the end of the chapter.
Interconnect has become the dominating factor in determining circuit performance and reliability in deep submicron designs. In this embedded tutorial, we first discuss the trends and challenges of interconnect design as the technology feature size rapidly decreases towards below 0.1 micron. Then, we present commonly used interconnect models and a set of interconnect design and optimization techniques for improving interconnect performance and reliability. Finally, we present comparisons of different optimization techniques in terms of their efficiency and optimization results, and show the impact of these optimization techniques on interconnect performance in each technology generation from the 0.35um to 0.07um projected in the National Technology Roadmap for Semiconductors.
This paper presents an efficient approach to perform global interconnect sizing and spacing (GISS) for multiple nets to minimize interconnect delays with consideration of coupling capacitance, in addition to area and fringing capacitances. We introduce the formulation of symmetric and asymmetric wire sizing and spacing. We prove two important results on the symmetric and asymmetric effective-fringing properties which lead to a very effective bound computation algorithm to compute the upper and lower bounds of the optimal wire sizing and spacing solution for all nets under consideration. Our experiments show that in most cases the upper and lower bounds meet quickly after a few iterations and we actually obtain the optimal solution. To our knowledge, this is the first in-depth study of global wire sizing and spacing for multiple nets with consideration of coupling capacitance. Experimental results show that our GISS solutions lead to substantial delay reduction than existing single net wire-sizing solutions without consideration of coupling capacitance.
In this paper, we study the interconnect layout optimization problem under a higher-order RLC model to optimize not just delay, but also waveform for RLC circuits with non-monotone signal response. We propose a unified approach that considers topology optimization, wire-sizing optimization, and waveform optimization simultaneously. Our algorithm considers a large class of routing topologies, ranging from shortest-path Steiner trees to bounded-radius Steiner trees and Steiner routings. We construct a set of required-arrival-time Steiner trees or RATS-trees, providing a smooth trade-off among signal delay, wave-form, and routing area. Using a new incremental moment computation algorithm, we interleave topology construction with moment computation to facilitate accurate delay calculation and evaluation of wave-form quality. Experimental results show that our algorithm is able to construct a set of topologies providing a smooth trade-off among signal delay, signal settling time, voltage overshoot, and routing cost.
We study the minimum-cost bounded-skew routing tree problem under the pathlength (linear) and Elmore delay models. This problem captures several engineering tradeoffs in the design of routing topologies with controlled skew. Our bounded-skew routing algorithm, called the BST/DME algorithm, extends the DME algorithm for exact zero-skew trees via the concept of a merging region. For a prescribed topology, BST/DME constructs a bounded-skew tree (BST) in two phases: (i) a bottom-up phase to construct a binary tree of merging regions which represent the loci of possible embedding points of the internal nodes, and (ii) a top-down phase to determine the exact locations of the internal nodes. We present two approaches to construct the merging regions: (i) the Boundary Merging and Embedding (BME) method which utilizes merging points that are restricted to the boundaries of merging regions, and (ii) the Interior Merging and Embedding (IME) algorithm which employs a sampling strategy and dynamic programming to consider merging points that are interior to, as well as on the boundary of, the merging regions. When the topology is not prescribed , we propose a new Greedy-BST/DME algorithm which combines the merging region computation with topology generation. The Greedy-BST/DME algorithm very closely matches the best known heuristics for the zero-skew case, and for the unbounded-skew case (i.e, the Steiner minimal tree problem). Experimental results show that our BST algorithms can produce a set of routing solutions with smooth skew and wirelength trade-off.