A couple of months back, Open Silicon announced an HBM2 IP subsystem. Those of us who don’t deal on a daily basis with – what to call it? High intensity memory? – might be vaguely aware that some standards of some sort exist, but are used by other people. Which means we keep but a vague impression of what it’s all about: stacks of memory or some such craziness.
But there’s more to the story than that, and the announcement was something of a kick in the butt for me to try to ferret out what was going on with these 3D chunks of memory. Because there’s not just one way of doing it – there are two major approaches, with two different goals. One is High-Bandwidth Memory (HBM) (for you static-sticians out there, no, it’s not “human body model”); the other is the Hybrid Memory Cube (HMC). And, given that the announcement was about HBM2, there would appear to be more than one generation underway. So what’s the deal here?
Both of these “standards” join a variety of other standards that are part of this evolution. I put “standards” in quotes because only the HBM approach has been standardized by an independent body, JEDEC. There’s a “standard” for HMC as well, but it’s from the HMC Consortium – not independent of the technology. This gets to the fuzziness of the word “standard” that we looked at recently.
That said, they join DDR – the basic DRAM interface – on its 4th generation; GDDR – a DDR variant dedicated to memory for graphics – on its 5th generation; and the Wide-IO standard, which defines an interface so wide that it would be impractical to build with conventional wiring, but which can work wonders with a stack of memory chips connected by microbumps and through-silicon vias (TSVs).
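Why does a very wide interface want microbumps and TSVs? Peak bandwidth is simply bus width times per-pin data rate, so a wide, relatively slow bus can match or beat a narrow, fast one, but only if you can afford an enormous number of connections. Here’s a minimal sketch using commonly cited figures that are assumptions, not numbers from this article: a 64-bit DDR4-3200 channel, and HBM2’s 1024-bit interface at about 2 Gbps per pin.

```python
def peak_bandwidth_gbytes(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width x per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Assumed, commonly cited figures (not taken from the article):
# a 64-bit DDR4-3200 channel runs each data pin at 3.2 Gbps,
# while an HBM2 stack uses a 1024-bit interface at about 2 Gbps per pin.
ddr4_channel = peak_bandwidth_gbytes(64, 3.2)
hbm2_stack = peak_bandwidth_gbytes(1024, 2.0)

print(f"DDR4-3200 channel: {ddr4_channel:.1f} GB/s")
print(f"HBM2 stack:        {hbm2_stack:.1f} GB/s")

# How many DDR4-style channels it would take to match one wide stack
channels_needed = hbm2_stack / ddr4_channel
print(f"DDR4 channels to match: {channels_needed:.0f}")
```

Under these assumptions, one stack delivers roughly ten DDR4 channels’ worth of bandwidth, but it needs more than a thousand data connections to do it. That is the part that gets impractical with ordinary package wiring.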
Strictly from the point of view of trying to understand the family relationships between these, I’ve sketched something of a family tree below based on my interwebs ferreting. In short, the HBM side of things is targeted largely at improving graphics processing, while HMC is intended more for servers in data centers, to oversimplify.
Let’s start with a big high-level difference. As a graphics-oriented standard, HBM has an architecture that can include a processor (often a GPU) in the same package, providing a tightly-coupled high-speed processing unit. HMC devices, on the other hand, are intended to be connected up to a processor (or “host”) with the ability to “chain” multiple memories for higher capacity. More on those bits in a moment.
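The chaining idea is easy to caricature in code. What follows is a toy model, not the HMC protocol. Cubes hang off the host one after another, capacity adds up linearly, and each cube further down the chain costs an extra pass-through hop. The eight-cube limit comes from the HMC discussion below. The `CubeChain` class, the 4 GB per-cube capacity, and the 10 ns per-hop penalty are hypothetical values picked purely for illustration.

```python
from dataclasses import dataclass

MAX_CUBES = 8  # chain limit quoted in the article


@dataclass
class CubeChain:
    """Toy model of HMC cubes daisy-chained off a single host.

    cube_capacity_gb and hop_latency_ns are hypothetical parameters
    chosen for illustration, not values from the HMC specification.
    """
    n_cubes: int
    cube_capacity_gb: float = 4.0
    hop_latency_ns: float = 10.0

    def __post_init__(self) -> None:
        if not 1 <= self.n_cubes <= MAX_CUBES:
            raise ValueError(f"chain must have 1..{MAX_CUBES} cubes")

    def total_capacity_gb(self) -> float:
        # Capacity grows linearly with every cube added to the chain.
        return self.n_cubes * self.cube_capacity_gb

    def extra_latency_ns(self, cube_index: int) -> float:
        # Cube 0 attaches directly to the host; each later cube adds
        # one pass-through hop in this simplified model.
        if not 0 <= cube_index < self.n_cubes:
            raise IndexError("no such cube in chain")
        return cube_index * self.hop_latency_ns


chain = CubeChain(n_cubes=4)
print(chain.total_capacity_gb())   # 16.0
print(chain.extra_latency_ns(3))   # 30.0
```

The trade-off the model is meant to show is that chaining buys capacity at the cost of distance. That is the opposite bargain from HBM’s tightly coupled, in-package arrangement.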
That said, they’re both 2.5D implementations, with memory stacks and logic co-located on a single interposer, although the preferred HMC configuration is a single 3D stack with the logic die on the bottom. The HMC stack is often referred to as a “cube.”
Another difference reflects the backers and suppliers: HMC has pretty much only Micron as a supplier these days (apparently, HP and Microsoft were originally part of the deal, but they’ve backed out). HBM is a Hynix/AMD/Nvidia thing, with primary suppliers SK Hynix and Samsung.
Both share an architectural notion that separates the controller logic from the actual bit cells. Putting the logic on a separate chip lets it be built on a standard CMOS logic process, without all the special memory process steps. But HBM further splits that logic into “device” logic and “host” logic, the difference being largely one of abstraction level. Host logic interprets commands sent by a processor; device logic largely does the work that DDR logic performs, except that it’s no longer on the same die as the memory.
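To make the host/device split concrete, here’s a hypothetical sketch. A host layer accepts an abstract read or write at an address, decodes it into bank, row, and column, and hands a DDR-style command sequence (ACTIVATE, READ or WRITE, PRECHARGE) to a device layer. The class names, the address mapping, and the simple close-page policy are all illustrative assumptions. They are not taken from the HBM specification or from Open Silicon’s design.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DeviceLogic:
    """Hypothetical device layer: issues DDR-style commands to the bit cells."""
    log: List[tuple] = field(default_factory=list)

    def execute(self, commands: List[tuple]) -> None:
        # A real device layer would also enforce DRAM timing constraints;
        # this sketch just records the command stream in order.
        self.log.extend(commands)


@dataclass
class HostLogic:
    """Hypothetical host layer: turns processor requests into device commands."""
    device: DeviceLogic
    banks: int = 16
    columns: int = 64

    def decode(self, address: int) -> Tuple[int, int, int]:
        # Illustrative address map: column bits lowest, then bank, then row.
        column = address % self.columns
        bank = (address // self.columns) % self.banks
        row = address // (self.columns * self.banks)
        return bank, row, column

    def request(self, op: str, address: int) -> None:
        if op not in ("READ", "WRITE"):
            raise ValueError("op must be READ or WRITE")
        bank, row, column = self.decode(address)
        # Simple close-page policy: open the row, access it, close the bank.
        self.device.execute([
            ("ACTIVATE", bank, row),
            (op, bank, column),
            ("PRECHARGE", bank),
        ])


device = DeviceLogic()
host = HostLogic(device)
host.request("READ", 5000)
print(device.log)  # [('ACTIVATE', 14, 4), ('READ', 14, 8), ('PRECHARGE', 14)]
```

The processor only ever sees `request()`. Everything below that line, including the command vocabulary, belongs to the layer that lives next to the memory, which is the point of the split.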
Part of the reasoning behind this is that the memory stack itself will be made by a memory company, while the combination with a GPU or CPU might be done by a different company. So this separates the logic that is more intimate with the memory from the higher-level logic that drives it.
The Open Silicon solution is slightly different in that they don’t include the CPU itself – just the host logic. But they’re selling IP, not a chip outright, so, presumably you add the CPU to their IP on your SoC. (Or, more likely, the other way around…)
(Image courtesy Open Silicon)
Then there’s the distinction between interfaces. An HBM stack forms a complete, self-sufficient unit, with memory tightly connected inside the package. This is referred to as “near memory.” These units aren’t intended to be combined to get more memory.
HMC, on the other hand, supports the notion of “far memory”: chains of memory cubes, up to a maximum of 8, that are effectively networked together and attached to the CPU. Instead of the DDR-based interface that the HBM stack uses, HMC has a packet-based interface that runs over some number of links (2, 4, or 8). Each link has 8 or 16 serdes lanes, and each lane can run at 10 to 30 Gbps. The largest, fastest devices boast 480 GBps, although, after accounting for overhead, this is more like 320 GBps of actual payload data.
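The headline number falls out of simple multiplication, assuming that the 480 GBps figure counts both directions of four full-width (16-lane) links running at the top 30 Gbps lane rate. The overhead side is sketched with a deliberately crude framing model, which is also an assumption: every packet pays one 16-byte FLIT for its header and tail. Real efficiency also depends on request packets, flow control, and the traffic mix. So this doesn’t derive the 320 GBps figure; it only shows why small transfers pay proportionally more.

```python
def raw_bandwidth_gbytes(links: int, lanes_per_link: int, gbps_per_lane: float,
                         both_directions: bool = True) -> float:
    """Aggregate serdes bandwidth in GB/s (8 bits per byte, no protocol overhead)."""
    one_way = links * lanes_per_link * gbps_per_lane / 8
    return one_way * 2 if both_directions else one_way


# Assumed configuration: 4 full-width links at the top 30 Gbps lane rate.
raw = raw_bandwidth_gbytes(links=4, lanes_per_link=16, gbps_per_lane=30.0)
print(f"raw: {raw:.0f} GB/s")                                  # raw: 480 GB/s
print(f"payload share implied by 320 GB/s: {320 / raw:.0%}")   # 67%

# Toy framing model (assumption): header + tail cost one 16-byte FLIT per packet.
FLIT_BYTES = 16


def payload_fraction(payload_bytes: int) -> float:
    """Fraction of a packet's bytes that carry data under the toy framing model."""
    return payload_bytes / (payload_bytes + FLIT_BYTES)


for size in (32, 64, 128):
    print(f"{size:>3}-byte payload: {payload_fraction(size):.0%} data")
```

Larger transfers amortize the fixed framing cost, which is one reason quoted payload rates sit well below the raw serdes number.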
What’s refreshing here is that, unlike so many of the IoT “standards” that compete with each other, these two don’t. There’s no reason why the success of one has to come at the expense of the other. They have different sweet spots, so, as long as both sweet spots survive, both configurations can survive.
Open Silicon has no specific horse in this race, or, perhaps better said, they have both horses in the race. The announcement was about their HBM2 subsystem, but they also have HMC IP, providing both ASIC and FPGA versions of the HMC controller.
They’ve put together a handy chart summarizing the differences between HMC (Gen3) and HBM2, reproduced below. Yeah, we didn’t get into the differences between generations; in general, they add speed or variations on links and lanes and such. And there are a number of parameters that we haven’t covered as part of this exceedingly brief overview; further delving is left as an exercise for the reader.
(Source: Open Silicon)