Overcoming the Limitations of Accelerator-Centric Architectures with Memoization-Driven Specialization

Speaker

Adi Fuchs

Details

Abstract

Modern computer systems are undergoing a paradigm shift away from general-purpose processors towards special-purpose processors, also known as "accelerators." The main driver of the accelerator revolution is the slowdown and approaching end of CMOS transistor scaling, which is causing general-purpose processor capabilities to stagnate. Once CMOS scaling ends, the only way to improve processor capabilities will be to improve hardware efficiency. While general-purpose processors are designed to run all user applications, such as an internet browser or a word processor, accelerators specialize in a given application (or applications) and can therefore significantly improve hardware efficiency.

Unfortunately, since accelerators are implemented using CMOS transistors, they are bound by the same CMOS scaling rules. Therefore, once transistors stop improving, accelerator-based specialization will approach its optimum and yield diminishing returns, ultimately hitting what we term "the accelerator wall." A second limitation of accelerators stems from their long development and deployment times: because hardware support for new computation patterns lags behind, accelerators are unfit for emerging or rapidly changing applications.

In this thesis, we present the "accelerator wall" and show how to evaluate the returns, the diminishing returns, and the limits of specialized accelerator chips for popular applications. We tackle the limits of accelerators using a technique called "memoization." In memoization, tables store previously computed results, and when a recurring computation input is encountered, the result is fetched from the table rather than redundantly recomputed by the processor. We construct memoization-driven special-purpose architectures that use scaling memories implemented using non-CMOS materials. These architectures are therefore future-proof, in the sense that they are less prone to the limits imposed by the end of CMOS scaling. We also use memoization-driven specialization to extend general-purpose processors and build specialized "accelerator-less" architectures. Our accelerator-less architectures achieve near-accelerator efficiency without accelerators and therefore avoid the long development times that accelerators require.
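
As a rough software analogy for the memoization idea described in the abstract, the sketch below keeps a lookup table keyed by the computation's inputs and returns the stored result when an input recurs. It is only an illustration, not the hardware design presented in the thesis: the names memoize, expensive_kernel, and stats are hypothetical, and a real memoization-driven architecture would use a bounded, hardware-managed table rather than an unbounded Python dictionary.

```python
def memoize(fn):
    """Wrap fn with a lookup table keyed by its (hashable) arguments."""
    table = {}                          # hypothetical, unbounded memoization table
    stats = {"hits": 0, "misses": 0}    # counters for illustration only

    def wrapper(*args):
        if args in table:
            # Recurring input: fetch the stored result instead of recomputing.
            stats["hits"] += 1
            return table[args]
        # New input: compute once and store the result for later reuse.
        stats["misses"] += 1
        result = fn(*args)
        table[args] = result
        return result

    wrapper.stats = stats
    return wrapper


@memoize
def expensive_kernel(x):
    """Stand-in for a costly computation (an assumption, not from the thesis)."""
    return sum(i * i for i in range(x))


first = expensive_kernel(1000)    # computed and stored
second = expensive_kernel(1000)   # fetched from the table
```

After the two calls, expensive_kernel.stats records one miss and one hit: the second call is served from the table without rerunning the computation.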

Sponsor

Prof. Wentzlaff