The Vast Landscape of Memory Bandwidth: From FPM DRAM to CPU Cache


By Ozy · March 6, 2026 · 9 min read
Tags: hardware, memory, cpu-architecture, deep-dive

Memory bandwidth spans an almost incomprehensible range - from 0.17 GB/s on legacy FPM DRAM to over 10,000 GB/s inside a modern CPU's L1 cache. This report provides a comprehensive tour through every major tier of the memory hierarchy as of March 2026, covering legacy pre-DDR memory, DDR1-DDR5 in consumer multi-channel configurations, Apple's unified memory architecture across all M1-M5 variants, the humble Raspberry Pi, server-class memory platforms, NVIDIA's AI GPU lineup, HBM's history from AMD Fiji to HBM4, consumer GPU memory, and finally the humbling speed of on-die CPU caches.


Pre-DDR Legacy Memory

Before DDR SDRAM arrived, PC memory evolved through several generations, each roughly doubling the bandwidth of its predecessor.

| Memory Type | Bus Speed | Bus Width | Peak Bandwidth |
| --- | --- | --- | --- |
| FPM DRAM | ~22 MHz | 64-bit | ~0.17 GB/s |
| EDO DRAM | ~33 MHz | 64-bit | ~0.27 GB/s |
| PC66 SDRAM | 66 MHz | 64-bit | ~0.53 GB/s |
| PC100 SDRAM | 100 MHz | 64-bit | ~0.80 GB/s |
| PC133 SDRAM | 133 MHz | 64-bit | ~1.07 GB/s |
| RDRAM (dual-ch) | 400 MHz | 2x16-bit | ~3.2 GB/s |

FPM (Fast Page Mode) RAM could cycle at 15-25 MHz in real systems, while EDO DRAM pushed up to 33-50 MHz. SDRAM was the breakthrough that synchronized memory with the system bus, running at 66-133 MHz and nearly doubling EDO's performance. Intel briefly pushed RDRAM (Rambus) for Pentium 4, which achieved higher bandwidth through a narrow but fast serial interface, but its high cost and latency kept it from widespread adoption.[1][2][3][4]
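The table values follow from simple arithmetic: for single-data-rate parts, peak bandwidth is just clock rate times bus width. A quick sketch, using values from the table:

```python
def sdr_bandwidth_gbs(clock_mhz: float, bus_bits: int = 64) -> float:
    """Peak bandwidth of single-data-rate memory: one bus-wide transfer per clock."""
    return clock_mhz * 1e6 * (bus_bits / 8) / 1e9

# PC100 SDRAM: 100 MHz x 8 bytes = 0.80 GB/s, matching the table
pc100 = sdr_bandwidth_gbs(100)
# PC133 SDRAM: ~1.07 GB/s
pc133 = sdr_bandwidth_gbs(133)
```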


DDR1-DDR5: Single, Dual, and Quad Channel

[Figure: DDR Memory Evolution]

The DDR (Double Data Rate) family has dominated system memory for over two decades. Each generation roughly doubles per-pin data rates. The key insight for workstations is that multi-channel configurations multiply bandwidth linearly.[5]

Per-Channel Theoretical Bandwidth

| Generation | Common Speed | MT/s | 1-Ch (GB/s) | 2-Ch (GB/s) | 4-Ch (GB/s) |
| --- | --- | --- | --- | --- | --- |
| DDR1 | DDR-400 | 400 | 3.2 | 6.4 | 12.8 |
| DDR2 | DDR2-800 | 800 | 6.4 | 12.8 | 25.6 |
| DDR3 | DDR3-1600 | 1,600 | 12.8 | 25.6 | 51.2 |
| DDR3 | DDR3-1866 | 1,866 | 14.9 | 29.8 | 59.6 |
| DDR4 | DDR4-2133 | 2,133 | 17.0 | 34.0 | 68.0 |
| DDR4 | DDR4-3200 | 3,200 | 25.6 | 51.2 | 102.4 |
| DDR5 | DDR5-4800 | 4,800 | 38.4 | 76.8 | 153.6 |
| DDR5 | DDR5-5600 | 5,600 | 44.8 | 89.6 | 179.2 |
| DDR5 | DDR5-6400 | 6,400 | 51.2 | 102.4 | 204.8 |

All values are theoretical peaks calculated as: Bus Width (64 bits = 8 bytes) x Transfer Rate (MT/s) x Number of Channels. A mainstream desktop with dual-channel DDR5-5600 achieves up to ~89.6 GB/s - the theoretical max for Intel's i9-14900K. Workstation platforms using quad-channel DDR5 (e.g., Intel HEDT, Threadripper) can push past 200 GB/s. Real-world tests on DDR5 quad-channel showed approximately 62.7 GB/s read bandwidth vs. 31.6 GB/s in dual channel - nearly a perfect 2x scaling.[6][7][8]
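The formula above translates directly into code; a minimal sketch:

```python
def ddr_bandwidth_gbs(mt_s: float, channels: int = 1, bus_bits: int = 64) -> float:
    # Peak GB/s = transfers per second x bytes per transfer x number of channels
    return mt_s * 1e6 * (bus_bits / 8) * channels / 1e9

dual_5600 = ddr_bandwidth_gbs(5600, channels=2)   # 89.6 GB/s (i9-14900K max)
quad_6400 = ddr_bandwidth_gbs(6400, channels=4)   # 204.8 GB/s (workstation)
```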


Apple Silicon: Why Unified Memory is King (M1-M5)

[Figure: Apple Silicon Memory Architecture]

Apple's unified memory architecture is the single best argument for why APU-style designs are the future. By packaging LPDDR memory directly onto the SoC, Apple eliminates the bus bottleneck between CPU, GPU, and Neural Engine - every component shares the same high-bandwidth memory pool with zero-copy overhead.

Complete Apple Silicon Memory Bandwidth Table

| Chip | Year | Max RAM | Bandwidth (GB/s) |
| --- | --- | --- | --- |
| M1 | 2020 | 16 GB | 68.25 |
| M1 Pro | 2021 | 32 GB | 200 |
| M1 Max | 2021 | 64 GB | 400 |
| M1 Ultra | 2022 | 128 GB | 800 |
| M2 | 2022 | 24 GB | 100 |
| M2 Pro | 2023 | 32 GB | 200 |
| M2 Max | 2023 | 96 GB | 400 |
| M2 Ultra | 2023 | 192 GB | 800 |
| M3 | 2023 | 24 GB | 100 |
| M3 Pro | 2023 | 36 GB | 150 |
| M3 Max (40-core GPU) | 2023 | 128 GB | 400 |
| M3 Ultra | 2025 | 512 GB | 819 |
| M4 | 2024 | 32 GB | 120 |
| M4 Pro | 2024 | 64 GB | 273 |
| M4 Max (40-core GPU) | 2024 | 128 GB | 546 |
| M4 Ultra (projected) | - | 256 GB | ~1,092 |
| M5 | 2025 | 32 GB | 154 |
| M5 Pro | 2026 | 64 GB | 307 |
| M5 Max (40-core GPU) | 2026 | 128 GB | 614 |
| M5 Ultra (projected) | - | 256 GB | ~1,228 |

The base M5 uses LPDDR5X at 9600 MT/s, delivering 153.6 GB/s - a nearly 30% increase over M4. The M5 Pro doubles this to 307 GB/s, while the top M5 Max hits 614 GB/s. The projected M5 Ultra (two M5 Max dies via UltraFusion) would deliver approximately 1,228 GB/s - exceeding many server-class memory configurations.[9][10][11][12][13][14][15]
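These figures fall out of the LPDDR5X data rate and the SoC's memory bus width. The bus widths below are assumptions inferred from the published bandwidth numbers, not official Apple specs:

```python
def lpddr_bandwidth_gbs(mt_s: float, bus_bits: int) -> float:
    # Peak GB/s = transfers per second x bus width in bytes
    return mt_s * 1e6 * (bus_bits / 8) / 1e9

# Assumed widths: base M5 ~128-bit, M5 Max ~512-bit, both at LPDDR5X-9600
m5_base = lpddr_bandwidth_gbs(9600, 128)   # 153.6 GB/s
m5_max  = lpddr_bandwidth_gbs(9600, 512)   # 614.4 GB/s
```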

Why This Matters for AI/ML

The M4 Max's 546 GB/s already provides "4x the bandwidth of the latest AI PC chip". For local LLM inference, memory bandwidth is the primary bottleneck. An M5 Max with 128 GB at 614 GB/s can serve a ~70B parameter model at usable token rates - something no discrete-GPU consumer system with DDR5 main memory can match, because a discrete GPU must copy data across PCIe. Unified memory eliminates this copy entirely.[16][17]
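The bottleneck claim can be made concrete: generating each token streams essentially the full weight set from memory, so bandwidth divided by model size gives a rough upper bound on tokens per second. A sketch with illustrative numbers (4-bit quantization assumed):

```python
def max_tokens_per_sec(bandwidth_gbs: float, params_billion: float,
                       bytes_per_param: float) -> float:
    """Rough ceiling: every generated token reads all model weights once."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gbs / model_gb

# 70B parameters at 4 bits (0.5 bytes/param) = 35 GB of weights
# M5 Max at 614 GB/s: ~17.5 tokens/s ceiling (real-world rates are lower)
ceiling = max_tokens_per_sec(614, 70, 0.5)
```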


Raspberry Pi 1-5: The Humble End of the Spectrum

The Raspberry Pi family illustrates how single-board computers have evolved from barely usable memory bandwidth to respectable performance.

| Model | Year | RAM | Bus Width | Speed | Theoretical BW |
| --- | --- | --- | --- | --- | --- |
| Pi 1 | 2012 | 512 MB LPDDR2 | 32-bit | ~400 MHz | ~3.2 GB/s |
| Pi 2 | 2015 | 1 GB LPDDR2 | 32-bit | ~400 MHz | ~3.2 GB/s |
| Pi 3 | 2016 | 1 GB LPDDR2 | 32-bit | ~450 MHz | ~3.6 GB/s |
| Pi 4 | 2019 | 1-8 GB LPDDR4-3200 | 32-bit | 3200 MT/s | ~12.8 GB/s |
| Pi 5 | 2023 | 2-16 GB LPDDR4X-4267 | 32-bit | 4267 MT/s | ~17 GB/s |

The Pi 1 used the BCM2835 SoC with a single ARM11 core and LPDDR2 clocked at 400 MHz. Real-world stream bandwidth was under 0.2 GB/s due to the weak memory controller. The Pi 4 jumped to LPDDR4-3200, and the Pi 5's BCM2712 documentation confirms "up to 17 GB/s of memory bandwidth" from its LPDDR4X-4267 interface. That's a 5x increase from Pi 1 to Pi 5, but the 32-bit bus width remains the fundamental bottleneck - the Pi 5's 17 GB/s barely matches a single DDR4-2133 DIMM on a desktop.[7][18][19][20][21]
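The Pi numbers follow the same width-times-rate arithmetic, with the 32-bit bus doing the damage:

```python
def pi_bandwidth_gbs(mt_s: float, bus_bits: int = 32) -> float:
    # A 32-bit bus moves only 4 bytes per transfer - half a desktop DIMM
    return mt_s * 1e6 * (bus_bits / 8) / 1e9

pi4 = pi_bandwidth_gbs(3200)   # 12.8 GB/s
pi5 = pi_bandwidth_gbs(4267)   # ~17.1 GB/s
```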


Server Memory Architectures

Server platforms achieve massive aggregate bandwidth by scaling memory channels far beyond consumer platforms.

| Platform | Channels/Socket | Memory Type | Speed | BW/Socket (GB/s) |
| --- | --- | --- | --- | --- |
| Intel Xeon Ice Lake (3rd Gen) | 8 | DDR4-3200 | 3200 MT/s | ~204.8 |
| Intel Xeon Sapphire Rapids (4th Gen) | 8 | DDR5-4800 | 4800 MT/s | ~307.2 |
| Intel Xeon Emerald Rapids (5th Gen) | 8 | DDR5-5600 | 5600 MT/s | ~358.4 |
| AMD EPYC Genoa (9004) | 12 | DDR5-4800 | 4800 MT/s | ~460.8 |
| AMD EPYC Turin (9005) | 12 | DDR5-6400 | 6400 MT/s | ~614.4 |
| Intel Xeon Max (w/ HBM) | 8 + HBM | DDR5 + HBM2e | - | ~1,638 (HBM) |

AMD's EPYC 9005 series (Turin) with 12 DDR5 channels at 6400 MT/s delivers 614.4 GB/s per socket - a 30%+ improvement over the prior 9004 generation at DDR5-4800. In dual-socket configurations, aggregate bandwidth exceeds 1.2 TB/s. Intel's Sapphire Rapids with 8 channels at DDR5-4800 delivers 307.2 GB/s per socket, while the Xeon Max series adds on-package HBM2e for up to 1,638 GB/s of HBM bandwidth per socket. AMD EPYC supports up to 12 memory channels per socket vs. Intel Xeon's 8, giving AMD a structural bandwidth advantage.[22][23][24][25][26][27]
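Per-socket bandwidth is just the dual-channel desktop formula scaled up by channel count (assuming 64-bit, i.e. 8-byte, channels throughout):

```python
def socket_bandwidth_gbs(channels: int, mt_s: float) -> float:
    # 8 bytes per 64-bit channel per transfer
    return channels * mt_s * 1e6 * 8 / 1e9

turin = socket_bandwidth_gbs(12, 6400)        # 614.4 GB/s per socket
dual_socket = 2 * turin                       # 1228.8 GB/s aggregate (>1.2 TB/s)
sapphire = socket_bandwidth_gbs(8, 4800)      # 307.2 GB/s per socket
```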


NVIDIA AI Accelerator GPUs

[Figure: NVIDIA AI GPU Lineup]

The GPUs powering AI training and inference at companies like OpenAI, Google, Meta, and xAI represent the bleeding edge of memory bandwidth, all enabled by HBM (High Bandwidth Memory).

| GPU | Year | Architecture | Memory | Capacity | Bandwidth |
| --- | --- | --- | --- | --- | --- |
| V100 | 2017 | Volta | HBM2 | 16/32 GB | ~900 GB/s |
| A100 SXM | 2020 | Ampere | HBM2e | 80 GB | 2,039 GB/s |
| H100 SXM | 2022 | Hopper | HBM3 | 80 GB | 3,350 GB/s |
| H200 SXM | 2023 | Hopper | HBM3e | 141 GB | 4,800 GB/s |
| B200 | 2024 | Blackwell | HBM3e | 192 GB | 8,000 GB/s |
| AMD MI300X | 2023 | CDNA 3 | HBM3 | 192 GB | 5,300 GB/s |

The progression is staggering: from the V100's ~900 GB/s to the B200's 8,000 GB/s in just seven years - a nearly 9x increase. The H200 provides 43% more bandwidth than the H100 (4.8 vs. 3.35 TB/s) by switching from HBM3 to HBM3e. The B200 doubles the H100's capacity and achieves 2.4x its bandwidth. AMD's MI300X competes with 192 GB of HBM3 at 5.3 TB/s.[28][29][30][31][32]
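The generational ratios quoted above check out arithmetically against the table:

```python
# Bandwidth in GB/s, from the table above
v100, h100, h200, b200 = 900, 3350, 4800, 8000

h200_uplift_pct = (h200 / h100 - 1) * 100   # ~43% over H100
b200_vs_h100 = b200 / h100                  # ~2.4x
b200_vs_v100 = b200 / v100                  # ~8.9x, "nearly 9x" in 7 years
```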

Consumer/Prosumer NVIDIA GPUs

| GPU | Memory | Capacity | Bandwidth |
| --- | --- | --- | --- |
| RTX 3090 | GDDR6X | 24 GB | 936 GB/s |
| RTX 4080 | GDDR6X | 16 GB | 717 GB/s |
| RTX 4090 | GDDR6X | 24 GB | 1,008 GB/s |
| RTX 5090 | GDDR7 | 32 GB | 1,792 GB/s |

The RTX 5090's jump to GDDR7 with a 512-bit bus delivers 1,792 GB/s - a 77% increase over the RTX 4090. For local AI workloads, the RTX 3090's 24 GB with 936 GB/s remains popular in the r/LocalLLaMA community due to its VRAM capacity advantage over the 4080.[33][34][35][36]


HBM: From AMD's Fiji to HBM4

[Figure: HBM Evolution]

High Bandwidth Memory was born from a collaboration between AMD, Samsung, and SK Hynix, with JEDEC standardizing HBM1 in October 2013. The first production HBM chip came from SK Hynix in 2013, and AMD's Fiji GPU (Radeon R9 Fury X, 2015) was the first device to ship with HBM.[37]

HBM Generation Specifications (Per Stack)

| Generation | Year | Data Rate/Pin | Interface | Max Capacity | Max Bandwidth |
| --- | --- | --- | --- | --- | --- |
| HBM1 | 2013 | 1.0 Gb/s | 8x128-bit | 4 GB | 128 GB/s |
| HBM2 | 2016 | 2.4 Gb/s | 8x128-bit | 8 GB | 307 GB/s |
| HBM2E | 2019 | 3.6 Gb/s | 8x128-bit | 24 GB | 461 GB/s |
| HBM3 | 2022 | 6.4 Gb/s | 16x64-bit | 24 GB | 819 GB/s |
| HBM3E | 2023 | 9.6 Gb/s | 16x64-bit | 48 GB | 1,229 GB/s |
| HBM4 | 2025 | 8.0 Gb/s | 32x64-bit | 64 GB | 2,048 GB/s |

HBM achieves its extraordinary bandwidth through a massively wide 1024-bit interface (per stack) combined with 3D die stacking via through-silicon vias (TSVs). AMD's Fiji used HBM1 with ~512 GB/s across 4 stacks. The Vega GPU moved to HBM2, with the Vega 56 achieving 409.6 GB/s across 2 stacks. Samsung's HBM2E "Flashbolt" pushed a single package to 410 GB/s with 3.2 Gbps per pin. HBM3 introduced 16 pseudo-channels and pushed data rates to 6.4 Gb/s, hitting 819 GB/s per stack. The upcoming HBM4 doubles the interface to 2048 bits and supports up to 2 TB/s per stack.[38][39][37]
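Per-stack HBM bandwidth is pin data rate times interface width, which reproduces the table:

```python
def hbm_stack_gbs(gbps_per_pin: float, interface_bits: int = 1024) -> float:
    # bandwidth (GB/s) = (Gb/s per pin) x number of pins / 8 bits per byte
    return gbps_per_pin * interface_bits / 8

hbm1 = hbm_stack_gbs(1.0)          # 128 GB/s
hbm3 = hbm_stack_gbs(6.4)          # 819.2 GB/s
hbm4 = hbm_stack_gbs(8.0, 2048)    # 2,048 GB/s - wider, not faster per pin
```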


The Humbling Finale: CPU Cache Speeds

After reviewing everything from 0.17 GB/s FPM DRAM to 8 TB/s B200 HBM bandwidth, CPU caches put it all in perspective. The fastest memory in any system isn't external at all - it's the on-die SRAM caches sitting millimeters from the execution units.

AMD Ryzen 9 9950X (Zen 5)

| Cache Level | Size | Per-Core BW | All-Core Aggregate |
| --- | --- | --- | --- |
| L1 Data | 48 KB/core (768 KB total) | ~650 GB/s | >10,000 GB/s |
| L2 | 1 MB/core (16 MB total) | ~300+ GB/s | - |
| L3 | 32 MB/CCD (64 MB total) | - | ~1,400 GB/s (per CCD) |
| DRAM (DDR5-6000) | - | - | ~96 GB/s (theoretical) |

With all 16 cores loaded, the Ryzen 9 9950X delivers over 10 TB/s of L1 data cache bandwidth. That's more than the NVIDIA B200's 8 TB/s HBM3e - except it's happening inside a $600 desktop CPU. Zen 5 achieves this with dual 512-bit vector load paths per core, a 50% increase in L1D capacity over Zen 4, and an improved 32-byte-per-cycle L3 interface. Per-CCD L3 bandwidth of ~1.4 TB/s is itself nearly as fast as the Xeon Max's total HBM bandwidth.[40]
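The 10 TB/s figure falls out of load-port math: loads per cycle, bytes per load, clock, and core count. The clock used below is an assumed all-core frequency, not a measured one:

```python
def l1_load_bw_gbs(loads_per_cycle: int, bytes_per_load: int,
                   freq_ghz: float, cores: int = 1) -> float:
    """Peak L1D read bandwidth from load-port width and clock."""
    return loads_per_cycle * bytes_per_load * freq_ghz * cores

# Zen 5: two 512-bit (64-byte) loads per cycle; ~5.0 GHz assumed all-core clock
per_core = l1_load_bw_gbs(2, 64, 5.0)        # 640 GB/s per core
all_core = l1_load_bw_gbs(2, 64, 5.0, 16)    # 10,240 GB/s (>10 TB/s)
```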

Intel Core i9-14900K (Raptor Lake)

| Cache Level | Size |
| --- | --- |
| L1 Data (P-core) | 48 KB x 8 cores |
| L1 Data (E-core) | 32 KB x 16 cores |
| L2 (P-core) | 2 MB x 8 cores |
| L2 (E-core cluster) | 4 MB x 4 clusters |
| L3 | 36 MB shared |
| DRAM (DDR5-5600) | 89.6 GB/s max [8] |

Raptor Lake's all-core L1 cache bandwidth falls well short of Zen 5's 10 TB/s figure. The hybrid architecture with 16 Gracemont E-cores delivers less cache bandwidth per core, and Intel disabled AVX-512 in consumer parts - removing the 2x512-bit load path that Golden Cove originally supported. The P-cores still pack 2 MB of L2 each with ~12-cycle latency, and the 36 MB shared L3 provides solid hit rates for gaming workloads.[40][41][42]

The Cache Bandwidth Perspective

[Figure: Cache Bandwidth Hierarchy]


To truly appreciate the hierarchy:

| Tier | Example | Bandwidth |
| --- | --- | --- |
| Legacy DRAM | FPM DRAM (1995) | 0.17 GB/s |
| Modern Desktop RAM | DDR5-5600 Dual-Ch | 89.6 GB/s |
| Apple M5 Max | Unified LPDDR5X | 614 GB/s |
| Server (EPYC Turin) | 12-ch DDR5-6400 | 614 GB/s |
| AI GPU (B200) | 192 GB HBM3e | 8,000 GB/s |
| CPU L3 Cache (Zen 5) | 32 MB SRAM/CCD | ~1,400 GB/s |
| CPU L1 Cache (Zen 5) | 48 KB SRAM/core | >10,000 GB/s |

The L1 cache is over 58,000x faster than FPM DRAM and roughly 111x faster than the DDR5-5600 dual-channel system memory feeding the same CPU. This is why cache hit rates matter so much - every miss that falls through to main memory encounters a 100x bandwidth penalty and an even worse latency penalty (from ~1 ns L1 to 70+ ns DRAM).[40]
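The two ratios quoted above, computed from the tier table:

```python
# Bandwidth in GB/s, from the tier table
l1_gbs, fpm_gbs, ddr5_dual_gbs = 10_000, 0.17, 89.6

vs_fpm = l1_gbs / fpm_gbs          # ~58,800x over 1995-era FPM DRAM
vs_ddr5 = l1_gbs / ddr5_dual_gbs   # ~111x over dual-channel DDR5-5600
```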


Why APU Unified Memory Wins

The data tells a clear story about why unified memory architectures are increasingly dominant for AI and creative workloads:

  • No copy overhead: Discrete GPUs must shuttle data across PCIe (~64 GB/s for Gen 5 x16). Apple's unified memory lets CPU, GPU, and Neural Engine all access the same pool at 614 GB/s on M5 Max.[10]
  • Bandwidth density: An M5 Max delivers 614 GB/s in a laptop form factor. Matching that with desktop DDR5 requires a quad-channel workstation platform.
  • Capacity advantage: The M5 Max supports 128 GB of memory accessible by the GPU - vs. consumer GPUs that top out at 24-32 GB of VRAM.[11][33]
  • Power efficiency: On-package LPDDR5X draws far less power than GDDR or HBM solutions, enabling this bandwidth in a laptop-class power envelope.

The tradeoff is clear: HBM-equipped data center GPUs still dominate in raw bandwidth (8 TB/s on B200), but unified memory designs offer the best bandwidth-per-watt and eliminate the PCIe bottleneck for heterogeneous workloads where CPU and GPU share data constantly.


References

Footnotes

  1. SDRAMs Ready to Enter PC Mainstream - CECS

  2. Types of RAM: How to Identify and their Specifications - Technibble

  3. Overview of RAM Types and Generations - Scribd

  4. Early Server RAM Types: DRAM, EDO DRAM, and SDRAM

  5. Multi-channel memory architecture - Wikipedia

  6. DDR5 dual-channel vs quad-channel benchmarks - Reddit

  7. Guide DDR DDR2 DDR3 DDR4 and DDR5 Bandwidth by Generation - ServeTheHome

  8. Intel Core i9 processor 14900K Specifications

  9. Apple Debuts M5 Pro and M5 Max Chips - MacRumors

  10. Apple introduces MacBook Pro with all-new M5 Pro and M5 Max

  11. New MacBook Pro M5 Pro and M5 Max announced - Tom's Guide

  12. Apple debuts M5 Pro and M5 Max - Apple Newsroom

  13. Apple M5 - Wikipedia

  14. Apple unleashes M5 - Apple Newsroom

  15. M5 Max chip is released - Reddit

  16. Apple introduces M4 Pro and M4 Max - Apple Newsroom

  17. The M4 Max goes up to 128GB RAM - Hacker News

  18. Raspberry Pi 5 vs Raspberry Pi 4: Detailed Comparison

  19. Raspberry Pi - Wikipedia

  20. Raspberry Pi 1 model B revision 2 @700MHz - Geekbench

  21. Processors - Raspberry Pi Documentation

  22. Dell PowerEdge DDR5 Memory Bandwidth for 5th Gen AMD EPYC

  23. Intel Xeon vs AMD EPYC - Hostrunway

  24. AMD EPYC Processors

  25. EPYC/Threadripper CCD Memory Bandwidth Scaling - Reddit

  26. Bandwidth Limits in the Intel Xeon Max (PDF)

  27. Intel Launches Sapphire Rapids - Tom's Hardware

  28. Comparing NVIDIA's B200 and H100 - Civo

  29. B200 Vs H200, B200 Vs H100, B200 Vs A100 - AceCloud

  30. NVIDIA H100 vs H200 vs B200 Comparison - Introl

  31. NVIDIA B200 GPU Guide - Clarifai

  32. NVIDIA Hopper Architecture In-Depth - NVIDIA Blog

  33. Nvidia RTX 5090 vs RTX 4090 - TechRadar

  34. NVIDIA RTX 4090 vs. RTX 5090 - vast.ai

  35. NVIDIA GeForce RTX 3090 vs 4080 Super for AI - BestGPUsForAI

  36. RTX 3090 vs RTX 4080 Super - Reddit

  37. High Bandwidth Memory - Wikipedia

  38. What is HBM? Deep Dive into Architecture - Wevolver

  39. HBM2E: The E Stands for Evolutionary - Semiconductor Engineering

  40. AMD's Ryzen 9950X: Zen 5 on Desktop - Chips and Cheese

  41. A Preview of Raptor Lake's Improved L2 Caches - Chips and Cheese

  42. Raptor Lake - Wikipedia
