FPGA Technology Mapping with Retiming


We have done the following works in this area.
    1. Efficient FPGA Mapping with Retiming for Clock Period Minimization
     
    2. FPGA Synthesis with Retiming for Clock Period Minimization
     
    3. FPGA Mapping with Retiming for Sequential Circuits with Given
       Initial States

Current Work:

    1. Efficient FPGA Mapping with Area-Depth Tradeoff
     
    2. FPGA Mapping with Retiming for Real Designs with:
       initial states
       (asynchronous) clock enables
       multiple clocks
     
    3. FPGA Mapping with Retiming with Interconnect Delay
 
Summary of Finished Work:
 
    Conventional FPGA mapping algorithms are designed for combinational circuits only. For sequential circuits, one needs to cut off all the flipflops and maps each combinational subcircuit separately and then connects the mapping solutions with the original flipflops to form a mapping solution. This kind of approaches usually cannot lead to the optimal solutions for sequential circuits under retiming. Retiming is a powerful technique for clock period minimization by repositioning flipflops. With retiming, we can enlarge the mapping solution space significantly and potentially achieve much better results.
    In usual case, retiming and FPGA mapping are performed separately. However, the serious problem for this separated approach is that retiming cannot change the circuit structure and its results are limited by mapping. On the otherhand, the current mapping algorithms do not know how good a mapping solution will be after retiming. It is expected that we can achieve the best possible results by combining retiming and mapping. Along this line, our first work is an efficient FPGA mapping with retiming algorithm for clock period minimization. It constitutes a significant improvement on runtime over the previous work on optimal FPGA mapping with retiming proposed by Prof. Pan and Liu. With guaranteed optimality of the results, our algorithm, named TurboMap, is over 1000 times faster than the algorithm by Pan and Liu. To further reduce the clock period, we combine the BDD-based functional decomposition with mapping and retiming to form a new algorithm, named TurboSYN. TurboSYN can reduce the clock period of optimal mapping with retiming by half in average. Comparing the results by FlowSYN (depth minimal mapping with resynthesis) followed by separated retiming, TurbSYN can also achieve results with 41% smaller clock periods.
    For sequential circuits with given initial states, our recent result is an optimal FPGA mapping with forward retiming algorithm with guaranteed new initial state computation. The motivation of our work due to the fact that retiming will change the initial state of the original circuit and it is NP-hard to decide if there exists an equivalent initial state for a retiming and if there does exist one, how to compute an equivalent one. Since the new initial state for forward retiming (always moving flipflops from the inputs of a gate to the outputs) can be computed efficiently in linear time, we separate a general retiming into forward retiming and backward time (always moving flipflops from the outputs of a gate to the inputs) and combine the mapping with forward retiming. The backward retiming can be performed as a preprocessing to push the flipflops to the primary inputs with the only consideration of whether an equivalent initial state can be computed efficiently. In fact, just performing mapping with forward retiming by our algorithm, named TurboMap-frt, we can reduce the clock period by 17% comparing with the conventional approach of depth minimal mapping for each combinational subcircuits followed by separated forward retiming. On the otherhand, the clock period by TurboMap-frt is only 3% larger than that by TurboMap of optimal mapping with general retiming, while many of results by TurboMap cannot compute an equivalent initial state in reasonable amount of time.

 
Brief Description of Current Work:
 

    1. Efficient FPGA Mapping with Area-Depth Tradeoff and for Heterogeneous FPGAs
     
      Most of the LUT-based FPGA mapping algorithms are for homogeneous FPGAswith LUTs of uniform size and for either area or depth minimization. Although there are some algorithms considered the area-depth tradeoff, the capability of reducing area by relaxing depth is limited and most of all, they cannot guarantee to reduce the area by relaxing depth. We plan to propose new algorithms with much better area-depth tradeoff capability and for heterogeneous FPGAs with LUTs of different sizes.
    2. FPGA Mapping with Retiming for Real Designs with:
         initial states
         (asynchronous) clock enables
         multiple clocks
      We have proposed a good heuristic for mapping with retiming for designs with given initial states. However, the problem of mapping with retiming for designs with initial states and clock enables and multiple clocks remain open. We plan to investigate the problem.
    3. FPGA Mapping with Retiming with Interconnect Delay
      Most the existing performance-driven FPGA mapping algorithms assume unit-delay model. As the FPGA technology goes into deep-submicron, the interconnect delay becomes more and more important. We plan to investigate the problem by considering either more accurate delay model or combining mapping with placement.