In-Memory Compute with Off-the-Shelf DRAMs and Efficient On-Chip Data Supply for Heterogeneous SoCs

Date
Aug 21, 2024, 10:00 am - 11:30 am
Location
Equad B327

Speaker

Details

Event Description

In-memory computing has long been promised as a solution to the “Memory Wall” problem. Unfortunately, performing computations with memory resources has either relied on emerging memory technologies that are not readily available today or required additional circuits to be added to RAM (Random Access Memory) arrays. So far, the competitive and low-margin nature of the RAM industry has made commercial RAM manufacturers resist adding any logic to their existing designs. In this thesis, we demonstrate methods of in-memory compute with off-the-shelf DRAM (Dynamic Random Access Memory) chips without any hardware modification, making in-memory compute more realistic and ready to use.

We found that specially timed DRAM command sequences lead to behaviors in DRAM arrays that are undocumented yet constructive and stable. We studied and characterized those behaviors with a customized DRAM controller and unmodified DRAM modules from major DRAM vendors. We propose a DRAM command sequence that can open multiple DRAM rows at the same time, thereby enabling bit-line charge sharing. With this charge sharing, we implement intra-subarray row copy and majority-of-three operations. These primitive operations are then used to develop an architecture for arbitrary, massively parallel, bit-serial computation with off-the-shelf DRAM. In addition, further command sequences are proposed to store fractional values in a DRAM cell. Using fractional value storage, we can enable more modules to perform the in-memory majority operation, increase the stability of the existing in-memory majority operation, and build a state-of-the-art DRAM-based PUF (Physical Unclonable Function) with unmodified DRAM.
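
To illustrate how a majority-of-three primitive can support bit-serial computation, the following is a minimal software sketch, not the thesis's DRAM command sequences. Each Python integer stands in for one DRAM row, with one bit per column. The names `ROW_BITS`, `maj3`, `row_not`, `to_rows`, `from_rows`, and `bit_serial_add` are hypothetical, and the availability of a complemented row (`row_not`) is an assumption made for this example rather than something stated in the abstract.

```python
ROW_BITS = 8                      # hypothetical row width; real DRAM rows are far wider
MASK = (1 << ROW_BITS) - 1        # a row with every bit set to 1


def maj3(a, b, c):
    """Column-wise majority of three rows."""
    return (a & b) | (b & c) | (a & c)


def row_and(a, b):
    return maj3(a, b, 0)          # majority with an all-zeros row acts as AND


def row_or(a, b):
    return maj3(a, b, MASK)       # majority with an all-ones row acts as OR


def row_not(a):
    # Assumption: a complemented copy of the row is available.
    return ~a & MASK


def to_rows(values, nbits):
    """Transpose: row i holds bit i (LSB first) of every column's operand."""
    rows = []
    for i in range(nbits):
        row = 0
        for col, v in enumerate(values):
            row |= ((v >> i) & 1) << col
        rows.append(row)
    return rows


def from_rows(rows, ncols):
    """Inverse of to_rows: rebuild one integer per column."""
    return [sum(((row >> col) & 1) << i for i, row in enumerate(rows))
            for col in range(ncols)]


def bit_serial_add(x_rows, y_rows):
    """Add all columns in parallel, one bit position per step."""
    carry = 0
    out = []
    for a, b in zip(x_rows, y_rows):
        new_carry = maj3(a, b, carry)
        # Sum bit built only from majority and complement.
        s = maj3(row_not(new_carry), carry, maj3(a, b, row_not(carry)))
        out.append(s)
        carry = new_carry
    out.append(carry)             # final carry-out row
    return out


xs = [3, 5, 7, 15]
ys = [1, 6, 9, 15]
sums = from_rows(bit_serial_add(to_rows(xs, 4), to_rows(ys, 4)), len(xs))
```

Here every column computes its own sum independently, so the number of steps depends on the operand bit width rather than on how many operands are stored across the row.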

The second focus of this thesis is on managing data movement within a heterogeneous System-on-Chip (SoC). We present mechanisms for configuring and accessing on-chip devices, and for coherent data access from accelerators, in the DECADES chip, which is a silicon prototype of an accelerator-rich heterogeneous many-core architecture. The physical design considerations, design choices, and testing results of the DECADES chip are described as well.

Adviser: David Wentzlaff