The AI Memory Bottleneck, and How It’s Being Remedied

Don Basile
4 min read · Aug 6, 2021

In 1945, a Hungarian-born mathematician named John von Neumann provided a blueprint for computer design (the “von Neumann architecture,” as it came to be known) that has largely been followed ever since. Twenty years later, an American engineer named Gordon Moore observed that the number of transistors on a chip, and with it the speed and capability of computers, would double roughly every two years. That way of thinking (“Moore’s Law,” as it is known) has likewise endured.

Recent developments have, however, led to renovations of that architecture and a rewriting of that law. The issue, in a nutshell, is that processors can perform calculations faster than data can be fetched from memory, which hinders overall performance. It is often described as a memory bottleneck, or on occasion the “von Neumann bottleneck,” and is best compared to highway traffic approaching the scene of an accident: every motorist must slow down and merge into a single lane.

The primary culprit is artificial intelligence (AI), which requires moving voluminous amounts of data. It is a technology that is already reshaping a wide variety of industries, and whose growth curve was described as a “hockey stick” (i.e., a sharp upward turn) by Ajit Manocha, president and CEO of SEMI (Semiconductor Equipment and Materials International), during a 2019 gathering of the Chinese American Semiconductor Association.

It has often been said that AI will be one of the driving forces behind the fourth Industrial Revolution (a.k.a. Industry 4.0), but Manocha put numbers to the claim at that meeting, saying that the market for AI semiconductors, then at $4 billion, would mushroom to $70 billion by 2025. Contributing to that, he added, is the rise of such AI-dependent technologies as the Internet of Things (IoT) and 4G/5G. He cited studies projecting between 500 billion and one trillion connected devices by 2030.

All these devices will produce a staggering amount of data. By 2025, it is predicted, more than 175 zettabytes of data will be produced around the world, over four times as much as in 2019. (One zettabyte is the equivalent of one trillion gigabytes.)

That’s a lot of traffic, a lot of data jockeying for space in the pipeline. And Moore’s Law no longer appears to keep pace. Steven Woo, an executive at the technology company Rambus, underscored that point while speaking at the same 2019 symposium that featured Manocha. Woo noted that since 2012, the computing power consumed by the largest AI models had been doubling not every two years, as Moore originally projected, nor even every 18 months (a later refinement), but every 3.5 months.
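To see how far apart those rates are, here is a quick back-of-the-envelope calculation in Python. The two-year comparison window is an illustrative choice; the doubling periods are the ones quoted above.

```python
# Compare the two doubling rates cited above over the same two-year window.
# The window length is an illustrative assumption; the doubling periods
# (24 months for Moore's Law, 3.5 months for AI compute demand) come from
# the figures quoted in this article.

window_months = 24

moores_law_factor = 2 ** (window_months / 24)    # doubling every two years
ai_compute_factor = 2 ** (window_months / 3.5)   # doubling every 3.5 months

print(f"Moore's Law growth over two years:     ~{moores_law_factor:.0f}x")
print(f"3.5-month doubling over two years:    ~{ai_compute_factor:.0f}x")
# Roughly 2x versus more than 100x over the same period.
```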

That has meant remodeling von Neumann’s architecture, which features the CPU (performing calculations and directing the movement of information), memory (storing data and instructions) and the I/O interface (exchanging data with external devices).
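As a rough illustration of why that layout becomes a bottleneck, consider the toy Python estimate below. The bandwidth and throughput figures are illustrative assumptions, not measurements of any particular chip.

```python
# Toy estimate of the von Neumann bottleneck described above.
# All hardware figures here are illustrative assumptions.

bytes_to_move = 1e9           # 1 GB of data the CPU must read from memory
memory_bandwidth = 50e9       # assume 50 GB/s between memory and the CPU
operations_needed = 2e9       # assume 2 billion arithmetic operations on that data
compute_throughput = 1e12     # assume 1 trillion operations per second of compute

transfer_time = bytes_to_move / memory_bandwidth        # time spent moving data
compute_time = operations_needed / compute_throughput   # time spent calculating

print(f"Moving the data:    {transfer_time * 1000:.0f} ms")
print(f"Crunching the data: {compute_time * 1000:.0f} ms")
# 20 ms of data movement versus 2 ms of arithmetic: the processor spends
# most of its time waiting on memory, not computing.
```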

Particularly interesting was a magnetic memory device developed in a joint study completed in early 2020 by researchers from Northwestern University and the University of Messina in Italy. The device is built from an antiferromagnetic material (AFM), specifically platinum manganese, which retains stored data not by means of a constant electric current (as is typically the case), but because the spin of the electrons in the material produces the magnetic state that stores the data.

The device, which according to Northwestern researchers is the smallest of its kind, can be used in concert with existing semiconductors, meaning it is scalable and “much closer to practical applications,” as lead researcher Pedram Khalili told Tech Xplore.

“This is a big deal for industry,” Khalili added, “as there is a strong demand today for technologies and materials to extend the scaling and performance of MRAM (i.e., magnetic random-access memory) and increase the return on the huge investment that industry has already made in this technology to bring it to manufacturing.”

Also in 2020, Rice University researchers introduced an architecture they labeled TIMELY (i.e., “Time-domain, In-Memory Execution, Locality”), which departs from von Neumann’s approach by using a method known as processing in-memory (PIM). As the name implies, computation is carried out inside the memory arrays themselves, so far less data has to shuttle back and forth, which greatly improves efficiency.
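A conceptual sketch, not a description of the TIMELY design itself, shows where the savings come from: when a reduction happens inside the memory array, only the small result has to cross the memory bus. The sizes below are arbitrary assumptions chosen to make the contrast visible.

```python
# Conceptual sketch of processing in-memory (PIM), not Rice's TIMELY design.
# Sizes are arbitrary assumptions chosen only to illustrate the contrast.

operand_bytes = 64 * 1024 * 1024   # a 64 MB array stored in memory
result_bytes = 8                   # e.g., a single accumulated sum

# Conventional flow: the entire array is shipped across the bus to the CPU.
conventional_traffic = operand_bytes

# PIM-style flow: the reduction is performed inside the memory array,
# so only the small result travels across the bus.
pim_traffic = result_bytes

print(f"Conventional bus traffic: {conventional_traffic:,} bytes")
print(f"PIM bus traffic:          {pim_traffic:,} bytes")
# Less data in motion means less time and energy lost to the bottleneck.
```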

Previous solutions had focused on bringing memory closer to AI processors. One that has shown particular promise is HBM (High Bandwidth Memory), which stacks memory close to the GPU, delivering far greater bandwidth and speeding up the processing of AI applications.

Also gaining traction are methods like on-chip memory (though its capacity is limited) and GDDR (graphics double-data rate memory). The latter is designed specifically for graphics cards, game consoles and the like.

The bottom line is that many methods are being put to use, and the search for an optimal solution to the memory bottleneck is ongoing. As Yingyan Lin, director of Rice University’s Efficient and Intelligent Computing Lab, said in a ScienceDaily report, “There are no one-for-all answers.”

But there are answers. That much is certain. And more of them will be needed in the years ahead, as data continues to explode, devices multiply and the use of AI increases. This is an ongoing process, with new solutions forever coming to light.


Don Basile

Don Basile: tech innovator, venture capitalist, entrepreneur, advisor. Goal: To work with great teams to help great startups thrive. http://donbasile.com