If I were an enthusiast, I would rather consider a mini PC with an AMD Strix Halo APU. These things have been coming soon for a few months now.
The memory is slower, but not by much: 256 GB/s is still much faster than the system memory found in most consumer-targeted PCs. The devices also have far more memory, up to 128 GB. And a system with a Strix Halo APU is a general-purpose computer; these special accelerator cards can only be used for one thing.
256 GB/s is excruciatingly slow for LLM inference. The 5090 has roughly 7x as much (about 1.8 TB/s), and since token generation is mostly memory-bandwidth bound, performance scales almost linearly with it.
There's a sweet spot for running MoE models, though. If you need the entire model in VRAM but only need to retrieve a part of it per token, trading more memory for less bandwidth can be a win.
I have a 4090, and given the MoE trend, I'd be more tempted to purchase a Strix Halo next than a 5090.
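To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes decoding is purely bandwidth bound and that every active weight is read once per token, so it gives a ceiling, not a benchmark. The model sizes and the 4-bit quantization are made up for illustration, and the 5090 figures (32 GB, 1792 GB/s) come from the spec sheet as I remember it, not from the article.

```python
def max_tokens_per_second(bandwidth_gb_s, active_params_billion, bytes_per_param):
    """Ceiling on decode speed if each active weight is read once per token."""
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


def fits_in_memory(memory_gb, total_params_billion, bytes_per_param):
    """True if the weights alone fit (ignores KV cache and runtime overhead)."""
    return total_params_billion * bytes_per_param <= memory_gb


# Assumed device figures: (memory in GB, bandwidth in GB/s).
devices = {
    "Strix Halo": (128, 256),
    "RTX 5090": (32, 1792),
}

# Hypothetical models: (total params, active params per token), in billions.
models = {
    "dense 70B": (70, 70),
    "MoE 100B, 10B active": (100, 10),
}

BYTES_PER_PARAM = 0.5  # assumed 4-bit quantization

for device, (memory_gb, bandwidth) in devices.items():
    for model, (total, active) in models.items():
        if fits_in_memory(memory_gb, total, BYTES_PER_PARAM):
            ceiling = max_tokens_per_second(bandwidth, active, BYTES_PER_PARAM)
            print(f"{device:10} | {model:22} | at most {ceiling:.1f} tok/s")
        else:
            print(f"{device:10} | {model:22} | weights do not fit")
```

Under these assumptions neither model fits on the 5090 at all, while on the Strix Halo the MoE model has a several times higher ceiling than the dense one, because only the active experts are read per token.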
The specialized accelerators discussed in the article have much slower memory than a 5090 GPU. The memory in them delivers 448 or 512 GB/s, only around 2x compared to Strix Halo.
"Both Blackhole cards offer roughly half the memory bandwidth of a used RTX 3090"
And that means I have no idea what these cards could be useful for. They are more expensive, have roughly the same VRAM, but are much slower.
These have the advantage of not being 5 years old with no warranty.