Compared to ASICs, FPGAs are generally not perceived as power efficient because they use a larger number of transistors to provide programmability on the chip. The goal of this research is to reduce the power consumption of FPGAs without sacrificing much performance or incurring a larger chip area. Here is a high-level illustration of our research.
FPGA Architecture Evaluation for Low Power
We develop a flexible FPGA architecture evaluation framework, named fpgaEVA-LP, for power efficiency analysis of LUT-based FPGA architectures. Our work makes several contributions: (i) We develop a mixed-level FPGA power model that combines switch-level models for interconnects and macromodels for LUTs; (ii) We develop a tool that automatically generates a back-annotated gate-level netlist with post-layout extracted capacitances and delays; (iii) We develop a cycle-accurate power simulator based on our power model. It carries out gate-level simulation under a real delay model and is able to capture glitch power; (iv) Using fpgaEVA-LP, we study the power efficiency of FPGAs in 0.10um technology under various settings of architecture parameters, such as LUT sizes, cluster sizes and wire segmentation schemes, and reach several important conclusions. We also present the detailed distribution of power consumption among different FPGA components and shed light on the potential opportunities for power optimization in future FPGA designs (e.g., 0.10um technology).
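To illustrate why simulation under a real delay model matters for glitch power, the following is a minimal sketch, not the fpgaEVA-LP simulator itself. It assumes the standard switching-energy relation of 0.5 * C * Vdd^2 per transition; the function names, the waveform representation and the numeric values are hypothetical.

```python
from typing import Dict, List


def count_transitions(waveform: List[int]) -> int:
    """Count value changes in a time-ordered list of logic values."""
    return sum(1 for a, b in zip(waveform, waveform[1:]) if a != b)


def average_switching_power(
    waveforms: Dict[str, List[int]],
    cap_farads: Dict[str, float],
    vdd: float,
    clock_hz: float,
    num_cycles: int,
) -> float:
    """Average switching power in watts over num_cycles clock cycles.

    Each transition on a net with capacitance C is assumed to cost
    0.5 * C * Vdd^2 joules (an assumption of this sketch).
    """
    energy = sum(
        0.5 * cap_farads[net] * vdd ** 2 * count_transitions(wave)
        for net, wave in waveforms.items()
    )
    # Simulated time is num_cycles / clock_hz seconds.
    return energy * clock_hz / num_cycles


# Hypothetical net observed over two clock cycles.
caps = {"out": 10e-15}
zero_delay = {"out": [0, 1, 1]}    # one functional transition
real_delay = {"out": [0, 1, 0, 1]}  # same final value, plus a glitch

p_zero = average_switching_power(zero_delay, caps, 1.0, 100e6, 2)
p_real = average_switching_power(real_delay, caps, 1.0, 100e6, 2)
```

In this toy case the real-delay waveform has three transitions instead of one, so the estimated switching power is three times higher; a zero-delay simulation would miss that difference.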
FPGA Circuit Design For Low Power
Traditional FPGAs use a uniform supply voltage Vdd and a uniform threshold voltage Vt. We propose to use pre-defined dual-Vdd and dual-Vt fabrics to reduce FPGA power. We design FPGA circuits with dual-Vdd/dual-Vt to effectively reduce both dynamic power and leakage power, and define dual-Vdd/dual-Vt FPGA fabrics based on the profiling of benchmark circuits. We further develop CAD algorithms, including power-sensitivity based voltage assignment and simulated-annealing based placement, to leverage such fabrics. Compared to the conventional fabric using uniform Vdd/Vt at the same target clock frequency, our new fabric using dual Vt achieves 9% to 20% power reduction. However, the pre-defined FPGA fabric using both dual Vdd and dual Vt achieves only 2% extra power reduction on average. This is because the pre-designed dual-Vdd layout pattern introduces a non-negligible performance penalty. Therefore, programmability of the supply voltage is needed to achieve significant power savings for dual-Vdd FPGAs. To the best of our knowledge, this is the first in-depth study on applying both dual-Vdd and dual-Vt to FPGAs considering circuits, fabrics and CAD algorithms.
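The idea of sensitivity-driven voltage assignment can be sketched as a simple greedy loop. This is only an illustration under strong assumptions, not the algorithm from our paper: it ignores level converters and the fixed layout pattern of a pre-defined fabric, treats "power sensitivity" as the power saved by moving one block to low Vdd, and uses hypothetical names and numbers throughout.

```python
from typing import Dict, List, Set


def longest_path(fanins: Dict[str, List[str]], delay: Dict[str, float]) -> float:
    """Critical path delay of a DAG whose keys are listed in topological order."""
    arrival: Dict[str, float] = {}
    for node, ins in fanins.items():
        arrival[node] = delay[node] + max((arrival[i] for i in ins), default=0.0)
    return max(arrival.values())


def assign_low_vdd(
    fanins: Dict[str, List[str]],
    delay_high: Dict[str, float],
    delay_low: Dict[str, float],
    power_saving: Dict[str, float],
    target: float,
) -> Set[str]:
    """Greedily move blocks to low Vdd, largest power saving first,
    keeping the critical path delay within the target."""
    delay = dict(delay_high)
    low: Set[str] = set()
    for node in sorted(fanins, key=lambda n: power_saving[n], reverse=True):
        delay[node] = delay_low[node]          # tentatively slow the block down
        if longest_path(fanins, delay) <= target:
            low.add(node)                      # timing still met: keep low Vdd
        else:
            delay[node] = delay_high[node]     # revert to high Vdd
    return low


# Hypothetical three-block circuit: a and b both feed c.
fanins = {"a": [], "b": [], "c": ["a", "b"]}
delay_high = {"a": 2.0, "b": 1.0, "c": 1.0}
delay_low = {"a": 3.0, "b": 2.0, "c": 2.0}
saving = {"a": 5.0, "b": 3.0, "c": 4.0}
low_blocks = assign_low_vdd(fanins, delay_high, delay_low, saving, target=3.0)
```

Here only block b sits on a non-critical path with enough slack, so it is the only block moved to low Vdd.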
FPGA Logic Synthesis for Low Power
One of the popular design techniques for power reduction is to lower the supply voltage, which results in a quadratic reduction of power dissipation. However, the major drawback is the negative impact on chip performance. A multiple-supply-voltage design, in which the supply voltage is reduced only on non-critical paths, can save power without sacrificing performance. In our work we develop a low-power FPGA mapping algorithm, named DVmap, that considers delay and power optimization across two supply voltages. We do not add the constraint that the low-Vdd and the high-Vdd LUTs have to be clustered separately, since the FPGA architecture can program the voltages of the built-in LUTs and converters as needed. We use the cut-enumeration technique to produce all the possible ways of mapping a LUT rooted at a node. We then generate different sets of power and delay solutions for each possible way, based on the various voltage-changing scenarios. After the timing constraint is determined, the non-critical paths are relaxed in order to accommodate low-Vdd LUTs and reduce power. Because level converters are involved and add delay, we use two types of required times during mapping to guarantee that the final mapping delay is still optimal. To show the efficiency of our algorithm, we first design a single-Vdd mapping algorithm, which uses a cost function similar to that in DVmap and relaxes the non-critical paths based on cost estimation to achieve better power results. The single-Vdd mapper, named SVmap, shows advantages over the latest published low-power mapping algorithms. We then show that our dual-Vdd mapping algorithm DVmap improves on SVmap by up to 11.6% in power savings.
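The cut-enumeration step mentioned above can be sketched as follows. This is a minimal, generic K-feasible cut enumeration, not the DVmap implementation: it does not generate the per-cut power and delay solutions or handle voltages, and the netlist representation and names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set


def enumerate_cuts(
    fanins: Dict[str, List[str]], k: int
) -> Dict[str, Set[FrozenSet[str]]]:
    """Return all K-feasible cuts of every node.

    fanins maps each node to its fanin list; keys must be in topological
    order and primary inputs map to an empty list. Each cut is the set of
    nodes that would become the inputs of a K-input LUT rooted at the node.
    """
    cuts: Dict[str, Set[FrozenSet[str]]] = {}
    for node, ins in fanins.items():
        # The trivial cut {node} lets fanouts treat this node as a LUT input.
        node_cuts: Set[FrozenSet[str]] = {frozenset([node])}
        if ins:
            # Combine one cut from each fanin and keep the merges that fit.
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


# Hypothetical network: x = f(a, b), y = g(x, c).
network = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["x", "c"]}
cuts_k3 = enumerate_cuts(network, 3)
```

With K = 3, node y has the non-trivial cuts {x, c} and {a, b, c}: the first maps y and x to separate LUTs, the second absorbs x into the LUT rooted at y. A mapper would then attach cost (here, power and delay) to each such choice.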
Another work in this area is on FPGA circuit clustering. We present a delay-optimal FPGA clustering algorithm targeting low power. We assume that the configurable logic blocks of the FPGA can be programmed to use either a high supply voltage (high-Vdd) or a low supply voltage (low-Vdd). We carry out the clustering procedure with the guarantee that the delay of the circuit under the general delay model is optimal and, at the same time, logic blocks on the non-critical paths can be driven by low-Vdd to save power. We also explore a set of dual-Vdd combinations to find the ratio between low-Vdd and high-Vdd that achieves the largest power reduction. Experimental results show that our clustering algorithm achieves power savings of up to 20.3% compared to the clustering result for an FPGA with a single high-Vdd. To our knowledge, this is the first work on dual-Vdd clustering for FPGA architectures.
FPGA High Level Synthesis for Low Power
We introduce an RT-level power estimator that provides directions for effective power reduction. In our FPGA power evaluation work, we showed that interconnect power is the dominant source of power consumption (more than 70% of the total power) in deep sub-micron (0.1um or lower) FPGAs. Consequently, we design our power estimation and optimization techniques around the total wire capacitance and the number of buffers in the routing channels. In order to estimate the amount of wire used before layout, we adopt a recently published wire length estimation technique to obtain the total number of interconnects of various lengths across the chip. To calculate switching activities for the RT-level design, we implement an efficient switching activity calculator using control data flow graph (CDFG) simulation, so that we perform simulation just once, before binding and scheduling, and then compute switching activities for any binding and scheduling solution without repeating the simulation. Our RT-level power estimation has a 16.2% average error compared to the power data reported by a cycle-accurate power simulator after placement and routing. Based on this RT-level power estimator, we design a simulated annealing-based algorithm for simultaneous binding and scheduling for power minimization. This algorithm is able to explore a large solution space, considering multiple constraining factors for functional unit and register binding, connection allocation, and scheduling. For each binding solution, the wire length and total power (including static power) are estimated and used as the optimization cost to guide the annealing process. Experimental results show that, on average, our solution halves the number of logic elements required to realize the design on an FPGA and improves power by 35.8% compared to the results obtained through the Synopsys Behavioral Compiler.
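The "simulate once, evaluate many bindings" idea can be illustrated with a small sketch. It assumes that a single CDFG simulation has recorded the output value of every operation in every iteration, and that the switching activity of a functional unit is the Hamming distance between the values it produces in consecutive steps. The data layout, names and values are hypothetical, and the actual calculator may differ.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two values differ."""
    return bin(a ^ b).count("1")


def unit_toggles(op_values: Dict[str, List[int]], ops_in_order: List[str]) -> int:
    """Output bit toggles of one functional unit.

    op_values[op][i] is the value op produced in iteration i of the single
    recorded simulation; ops_in_order lists the operations bound to this
    unit in scheduled order.
    """
    n_iter = len(op_values[ops_in_order[0]])
    sequence = [op_values[op][i] for i in range(n_iter) for op in ops_in_order]
    return sum(hamming(x, y) for x, y in zip(sequence, sequence[1:]))


def binding_toggles(op_values: Dict[str, List[int]], binding: List[List[str]]) -> int:
    """Total toggles over all functional units of a binding solution."""
    return sum(unit_toggles(op_values, ops) for ops in binding)


# Values recorded once from a hypothetical two-iteration simulation.
recorded = {
    "add1": [0b0011, 0b0101],
    "add2": [0b0011, 0b0100],
    "add3": [0b1100, 0b1010],
}
binding_a = [["add1", "add2"], ["add3"]]  # similar values share a unit
binding_b = [["add1", "add3"], ["add2"]]  # dissimilar values share a unit

toggles_a = binding_toggles(recorded, binding_a)  # 5
toggles_b = binding_toggles(recorded, binding_b)  # 13
```

Both bindings are scored from the same recorded values, so a search procedure such as simulated annealing can evaluate many candidate solutions without re-running the simulation.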
We also develop effective algorithms to reduce the number and sizes of the multiplexers generated during high-level synthesis. First, we formulate the register binding problem for MUX reduction as the problem of calculating the minimum-weighted cofamily of a partially ordered set (POSET). We derive several theorems to guide the procedure for obtaining the minimum-weighted cofamily by calculating the minimum-cost flow in a flow network. The cost is defined by the number of MUXes and the connection scenarios for different binding solutions. Second, we study port assignment for further reduction of MUX connections. We apply a simple operation named operand swapping to improve an initial random port assignment. Experimental results show that, overall, our algorithm reduces MUX connections by 7% compared to a bipartite matching-based algorithm. For large designs, our algorithm is 10% better. Port assignment achieves 84% of the upper-bound reduction value. Placement and routing results show that our solution improves area, delay and power by up to 6%, 8%, and 6%, respectively, for FPGA designs.
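Operand swapping for port assignment can be sketched as a simple local search. The sketch below assumes a two-input functional unit whose bound operations are all commutative, and it measures MUX connections as the number of distinct source registers feeding each input port; both the cost model and the names are assumptions of this illustration rather than the published algorithm.

```python
from typing import List, Tuple

Assignment = List[Tuple[str, str]]  # (source on left port, source on right port)


def mux_connections(assign: Assignment) -> int:
    """Distinct sources on the left port plus distinct sources on the right port."""
    left = {a for a, _ in assign}
    right = {b for _, b in assign}
    return len(left) + len(right)


def improve_by_swapping(assign: Assignment) -> Assignment:
    """Swap the operands of one operation at a time, keeping a swap only if
    it strictly reduces the number of MUX connections; repeat until no
    single swap helps."""
    assign = list(assign)
    improved = True
    while improved:
        improved = False
        for i, (a, b) in enumerate(assign):
            before = mux_connections(assign)
            assign[i] = (b, a)
            if mux_connections(assign) < before:
                improved = True
            else:
                assign[i] = (a, b)  # undo a swap that does not help
    return assign


# Hypothetical initial port assignment for three operations on one unit.
initial = [("r1", "r2"), ("r2", "r1"), ("r1", "r3")]
final = improve_by_swapping(initial)
```

For this input the cost drops from 5 connections to 3: after swapping, the right port is fed only by r1 and needs no multiplexer at all. Since each accepted swap strictly lowers the cost, the loop terminates, although it may stop in a local minimum.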
On-going Work
For FPGAs with dual supply voltages, we develop an algorithm that performs simultaneous functional unit binding and multi-voltage assignment. It is a polynomial-time optimal algorithm that assigns low-Vdd to as many operations as possible under the resource and time constraints and, at the same time, minimizes total switching activity through functional unit binding. Our algorithm shows consistent improvement over a design flow that separates voltage assignment from functional unit binding. We also change the initial scheduling to examine power-latency and power-area tradeoff scenarios and provide power optimization solutions under different latency and area requirements. Experimental results show that we can achieve a 21% power reduction when both the latency and resource bounds are tight. When the latency bound is relaxed by 75% and the resource bound is relaxed by about 28%, the power reduction is more than 55% compared to the synthesis results for the single high-Vdd case with the same amount of relaxation.
We are also considering power gating and clock gating for low-power FPGA designs.
Publications
- F. Li, D. Chen, L. He, and J. Cong, "Architecture Evaluation for Power-Efficient FPGAs," ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, California, pp. 175 - 184, February 2003.
- F. Li, Y. Lin, L. He, and J. Cong, "Low-power FPGA using Pre-Defined Dual-Vdd/Dual-Vt Fabrics," Proceedings of the ACM International Symposium on Field-Programmable Gate Arrays, February 2004.
- D. Chen, J. Cong, F. Li, and L. He, "Low-Power Technology Mapping for FPGA Architectures with Dual Supply Voltages," Proceedings of the ACM International Symposium on Field-Programmable Gate Arrays, February 2004.
- D. Chen and J. Cong, "Delay Optimal Low-Power Circuit Clustering for FPGAs with Dual Supply Voltages," International Symposium on Low Power Electronics and Design, Aug. 2004, to appear.
- D. Chen, J. Cong, and Y. Fan, "Low-Power High-Level Synthesis for FPGA Architectures," International Symposium on Low Power Electronics and Design, Seoul, Korea, pp. 134 - 139, Aug. 2003.
- D. Chen and J. Cong, "Register Binding and Port Assignment for Multiplexer Optimization," Proceedings of the Asia Pacific Design Automation Conference, January 2004.
- D. Chen, J. Cong, and J. Xu, "Optimal Module and Voltage Assignment for Low Power," Asia Pacific Design Automation Conference, Shanghai, China, January 2005, to appear.
Sponsors