Digital system design incorporates components ranging from general-purpose processors (GPPs) to application-specific integrated circuits (ASICs). GPPs can flexibly implement diverse applications, but at the cost of high execution time, energy, and area. ASICs, on the other hand, deliver high performance with low energy and area, but are typically custom-designed for one or two specific applications. Reconfigurable computing frameworks occupy the middle ground, combining the flexibility of GPPs with much of the efficiency of ASICs. Field-programmable gate arrays (FPGAs) have emerged as attractive reconfigurable computing platforms, with the flexibility to map a variety of applications at high speed with low energy and area. An FPGA integrates spatially distributed memory arrays with programmable routing resources: logic functions are realized inside the memory blocks as lookup tables (LUTs), while the interconnects handle communication between blocks. However, the programmable interconnects dominate both energy and area, and with newer technology generations they are not scaling as well as logic gates. A reconfigurable framework that minimizes its reliance on programmable interconnects is therefore expected to keep improving performance, energy, and area as technology continues to advance.
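To make the LUT idea concrete, the following minimal sketch (not from the paper; the helper names are our own) shows how a logic function is realized purely as a stored truth table: the input bits are packed into a memory address, and the "computation" is a single read.

```python
# Sketch of how an FPGA lookup table (LUT) realizes a logic function:
# the truth table is stored in a small memory, and the inputs form the
# read address. No gates specific to the function are needed.

def make_lut(truth_table):
    """Return a function that evaluates `truth_table` by address lookup."""
    def lut(*inputs):
        # Pack the input bits into a memory address (input 0 = LSB).
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return truth_table[addr]
    return lut

# Example: a 3-input majority function stored as an 8-entry truth table.
majority = make_lut([0, 0, 0, 1, 0, 1, 1, 1])
assert majority(1, 1, 0) == 1
assert majority(1, 0, 0) == 0
```

Reprogramming the device amounts to rewriting the stored tables; in a real FPGA, many such LUTs are wired together through the programmable interconnect, which is where the energy and area cost concentrates.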
This work proposes a novel reconfigurable computing framework for hardware acceleration, referred to as the MAlleable Hardware Accelerator (MAHA). It uses a spatio-temporal computing model aimed at improving the energy efficiency of diverse algorithmic tasks. In each processing element (PE), the main computation is performed by a memory block that stores both data and LUTs. PEs are spatially distributed and communicate with each other through interconnects, while operations inside each PE execute cycle by cycle in a temporal fashion. This temporal execution significantly reduces the demand for programmable interconnects compared to a fully spatial reconfigurable architecture, and hence improves energy efficiency. The scalability of such a framework is also expected to exceed that of fully spatial architectures, since it drastically reduces the need for programmable interconnect: data can be read and operated on locally inside each PE. Moreover, such a memory-centric computing platform offers a great opportunity to mitigate the off-chip bandwidth requirement between memory and computing engines in a conventional computing architecture.
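The spatio-temporal model can be illustrated with a small behavioral sketch. This is our own simplified assumption of a PE, not the paper's actual microarchitecture: one memory block holds both operand data and LUT configurations, and a schedule of micro-operations is applied one per cycle, each resolving to a local lookup.

```python
# Hypothetical sketch of temporal execution inside a single MAHA-style PE.
# Class and field names are illustrative assumptions. One memory block
# holds both the resident data and the LUT truth tables; operations are
# executed sequentially, one lookup per cycle, with no spatial routing.

class ProcessingElement:
    def __init__(self, lut_tables, data):
        self.luts = dict(lut_tables)  # function tables in the memory block
        self.regs = dict(data)        # data words in the same memory block

    def step(self, op):
        """Execute one micro-operation in one cycle: a local LUT lookup."""
        lut, src_a, src_b, dst = op
        a, b = self.regs[src_a], self.regs[src_b]
        self.regs[dst] = self.luts[lut][(a << 1) | b]  # address = packed bits

    def run(self, schedule):
        for op in schedule:           # temporal, cycle-by-cycle execution
            self.step(op)
        return self.regs

# Example: compute (x AND y) XOR z with two sequential lookups.
AND = [0, 0, 0, 1]  # 2-input truth tables indexed by (a << 1) | b
XOR = [0, 1, 1, 0]
pe = ProcessingElement({"and": AND, "xor": XOR}, {"x": 1, "y": 1, "z": 1})
regs = pe.run([("and", "x", "y", "t"), ("xor", "t", "z", "out")])
assert regs["out"] == 0  # (1 AND 1) XOR 1 = 0
```

A fully spatial fabric would instantiate both LUTs at once and route a wire between them; here the intermediate value simply stays in the PE's memory between cycles, which is the source of the interconnect savings described above.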