We were waiting to see what a different roster, including SK Hynix and Synopsys, would have to say about HBM in the latest Open Silicon webinar. This event focused on HBM bandwidth; a packaging session on 2.5D interposers was promised for a future webinar.
For their part, the SK Hynix story on HBM is unchanged. GDDR5X parts (from Micron) are just hitting the market, fitting the external memory model the industry is used to. Breaking that model with HBM2 and multi-die packaging can, with some attention to detail, blow GDDR5X away in theoretical bandwidth.
The packaging advantage in performance, with shorter trace runs, lower capacitance, and lower bus drive currents, is fairly obvious. As the technology matures and people get comfortable with processes, test, and yields (all of which translate to cost), HBM2 should pull away from the pack. What isn’t so obvious is the importance of the HBM memory controller IP, and of the verification IP used to test designs pre-silicon.
Open Silicon’s presenter, Dhananjay Wagh, spent most of his time on differences in HBM controller IP, going so far as to imply that Open Silicon is one of the few firms that has it right in a cohesive controller + PHY + I/O offering. His talking points centered on efficiency factors: Wagh suggested typical DDR data bus efficiency tops out around 70 to 75%.
Going after one of the DDR shortcomings, Open Silicon’s HBM controller IP implements a semi-independent row and column command interface, allowing simultaneous activate/precharge and read/write commands. The architecture isolates row operations from column operations and, in effect, opens multiple banks for a pool of commands, then sorts out which commands are best sent where for latency fairness. (Apologies for the fuzzy screenshot from the webinar.)
The point of this chart is that running the command sequence in order incurs bank turnarounds that kill data bus efficiency: drivers have to switch direction and wait out a settling time before data can move the other way. Reordering the commands reduces the number of row commands and leaves only a single turnaround.
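To see why reordering matters, here is a toy sketch of the turnaround count before and after grouping reads and writes. This is an illustration of the general idea, not Open Silicon’s actual scheduling algorithm; the command sequence and the one-turnaround result are hypothetical.

```python
# Toy model of data bus turnarounds. A "turnaround" is any point where
# consecutive data-bus operations switch direction (read -> write or
# write -> read), forcing drivers to reverse and settle.

def count_turnarounds(ops):
    """Count direction switches in a sequence of 'R'/'W' operations."""
    return sum(1 for a, b in zip(ops, ops[1:]) if a != b)

def reorder(ops):
    """Stable-sort all reads ahead of all writes, leaving at most one turnaround."""
    return sorted(ops, key=lambda op: op != 'R')

in_order = ['R', 'W', 'R', 'W', 'R', 'W']
print(count_turnarounds(in_order))           # 5 turnarounds executed in order
print(count_turnarounds(reorder(in_order)))  # 1 turnaround after reordering
```

A real controller must also honor bank timing and fairness constraints while reordering, which is exactly the complexity the webinar argues the controller IP has to get right.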
HBM’s pseudo-channel architecture is set up nicely for optimizing traffic when driven by a programmable controller. The controller can aim addresses at idle banks in different channels, which finish before the next request for that bank arrives. Wagh offered two simulation examples: a video buffer with 4 channels running sequential addressing, achieving 34 ns latency and 91% efficiency, and a network packet buffer running in 8 channels with random addressing, holding latency to no worse than 49 ns (under a 50/50 write/read load) at 82% efficiency.
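The bank-hiding effect is easy to model: if requests are spread round-robin across enough channels, each bank finishes its previous access before its next request lands. The sketch below uses made-up timing numbers (issue interval, bank busy time), not HBM2 values or the webinar’s simulation parameters.

```python
# Toy model of spreading requests across channels so bank busy time is
# hidden. Timing parameters are illustrative only.

def max_latency(num_requests, num_channels, issue_interval, bank_busy):
    """Round-robin requests across channels; each channel's bank must
    finish its previous access before starting the next. Returns the
    worst-case request latency (finish time minus arrival time)."""
    free_at = [0.0] * num_channels  # time each channel's bank is next free
    worst = 0.0
    for i in range(num_requests):
        ch = i % num_channels
        arrive = i * issue_interval
        start = max(arrive, free_at[ch])   # wait if the bank is still busy
        free_at[ch] = start + bank_busy
        worst = max(worst, free_at[ch] - arrive)
    return worst

print(max_latency(32, 1, 2.0, 10.0))  # one channel: requests queue up, latency grows
print(max_latency(32, 8, 2.0, 10.0))  # eight channels: busy time fully hidden, 10.0
```

With 8 channels, a given bank sees a new request only every 16 ns here, comfortably more than its 10 ns busy time, so latency stays flat; with one channel the queue grows without bound.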
Synopsys has extended its verification IP into the HBM realm with its 2016.06 release, allowing customized sequences to be run against the HBM controller for both functional coverage and traffic tests. The HBM VIP provides both trace files and debug port support, and testbench runs can be brought over into Synopsys Verdi for further debug.
The registration page at Open Silicon now leads to the recorded event:
HBM – Breaking Through “The Memory Wall”
Saying “HBM offers more bandwidth” raises the efficiency question. I can see how a less robust memory controller could waste a lot of time; I’d give this event a view if you have questions or concerns there. The Open Silicon HBM controller IP is available in 16FF and 55 nm today, with plans for 28 nm and 14 nm. We look forward to the rest of their story on HBM packaging and turnkey ASIC strategies at a later date.