Will GPU prices come back down in 2026?

The timeline is uncertain. The DRAM shortage is driven by AI datacenters committing to memory purchases years in advance. If those commitments clear as expected (estimated late 2026–early 2027), consumer prices should normalize. If they do not, prices could stay elevated through 2027. There is no firm date.

Is it different from the 2021–22 crypto boom?

Yes. Crypto drove GPU scarcity through retail demand frenzy and speculators hoarding cards. The DRAM shortage drives it through supply-chain constraints at the material level: fewer DRAM chips are available to consumer GPU makers, which tightens GPU supply across the board. The cause is different, the timeline is different, and the recovery path is different.

Should I buy now or wait?

Depends on your constraint. If you need a GPU now and your timeline cannot absorb a wait, the used market offers better value than new retail. If you can wait 6+ months and have modest VRAM needs, waiting for DRAM to clear may save 15–25%. If you rent GPU time instead, prices are less volatile and you avoid the buy decision entirely.

Are RTX 40-series cards cheaper than 50-series?

Sometimes. RTX 40-series cards such as the 4090 are sold new at a discount to RTX 50-series retail, but the used market (including older 30-series cards like the 3090) often undercuts both. Prices vary weekly; check current listings against your exact workload and timeline.

Why did Apple's M5 Ultra get delayed?

Apple cited DRAM supply constraints for the M5 Ultra announcement delay (June 2026). The same DRAM shortage affecting GPU manufacturers is hitting Apple's unified-memory SKUs. This is cross-category evidence that the bottleneck is genuine material scarcity, not just GPU-market manipulation.

Why GPU Prices Spiked Again in 2026: The DRAM Shortage, Explained

GPU prices are broken again in 2026, and if you are looking to buy, the anger is warranted. A new RTX 5090 lists at ~$5,000 USD, used 3090 24GB cards have crept back to ~$900–$1,200, and entry-level cards still do not deliver much value. The frustration is real. But the reason matters, because understanding it changes what you should actually do about it.

This is not 2021–22. The crypto boom was retail-driven scarcity: miners and speculators competed with gamers for a fixed card supply, hoarding GPUs and creating secondary-market frenzies. This spike is supply-chain driven: AI datacenters are locking up DRAM (the memory chips in all GPUs) years in advance, which shrinks consumer availability at the material level. The difference is not academic — it changes the timeline to recovery, the value of waiting, and whether you should rent instead of buy.

What actually happened: the DRAM supply squeeze

Start here: GPUs contain DRAM chips. More cards need more DRAM. If DRAM supply is constrained, GPU supply is constrained, and prices rise. That is the 2026 story.

According to industry reporting (TechSpot, Q2 2026, and corroborated by community threads on r/LocalLLaMA), DRAM manufacturers are running at near capacity globally. The primary buyer is not consumers — it is AI datacenters. Large cloud providers and enterprise AI infrastructure teams are pre-purchasing DRAM production years in advance to secure capacity for the next generation of accelerator cards and server memory. They are not buying individual RTX 5090s off retail; they are negotiating multi-year supply contracts at the DRAM-chip level with Samsung, SK Hynix, and Micron. This hoarding of future capacity is legal (it is standard supply-chain risk management) and predictable (it is what any major infrastructure operator would do), but it leaves consumer GPU manufacturers with less DRAM to work with.

NVIDIA, AMD, Intel, and others must then compete for the remaining supply or bid up the price per chip. That cost flows downstream: higher material cost → higher manufacturing cost → higher retail price. And because every GPU category (consumer, enterprise, datacenter) uses the same or similar DRAM chips, the squeeze hits everyone.

The same squeeze is visible in other categories. Apple’s RAM prices for MacBook Pro SKUs climbed ~$200–$400 in the past six months. Apple’s M5 Ultra announcement was delayed from May to June 2026 — and the company cited DRAM supply constraints as the reason. This is not LocalRig speculation; Apple said it publicly. That cross-category signal (RAM prices up, M5 launch delayed, GPU prices up) is strong evidence that the bottleneck is real material scarcity, not just GPU-market manipulation or crypto relapse.

Why this is not like 2021–22

The 2021–22 crypto boom was a retail demand shock. Miners and speculators bid up prices because they expected GPU values to stay high forever. They were wrong about forever, and when Ethereum went proof-of-stake in September 2022, the demand evaporated overnight. Price crashed. Used market flooded. Everyone who paid $800 for a 3090 in March 2022 could have bought one for $500 by early 2023.

That was a demand problem with a demand recovery: when hype dies, price falls. The DRAM squeeze is a supply problem with a supply recovery: when new DRAM fabs ramp up or datacenter commitments clear, supply increases and price falls.

Here is why that matters for your buy-now-or-wait decision:

Dimension	Crypto boom 2021–22	DRAM shortage 2026
Root cause	Retail demand shock + speculation	Supply constraint at material level
Who is buying	Miners + speculators	AI datacenters + enterprises
Timeline to clear	Months (driven by sentiment)	12–18+ months (driven by fab capacity)
Risk if you buy now	High (price can crash when hype ends)	Moderate (supply eases gradually, so prices recover slowly)
Opportunity if you wait	Very high (prices fall 30–50% within 6 months)	Moderate (prices may fall 15–25% in 6–12 months)
Alternative escape hatches	Rent (more expensive during boom)	Rent (pricing less volatile than retail)

The honest truth: no one knows exactly when DRAM clears. Industry commentary (via TechSpot and semiconductor tracking firms) points to late 2026 or early 2027 as the window when datacenter pre-purchasing slows and consumer supply normalizes. But that is not a guarantee. If AI demand stays as high as it is, if new fabs do not ramp fast enough, or if geopolitical supply-chain disruptions hit further, the timeline stretches. You are making a buy decision under uncertainty.

Your options, ranked by constraint

There is no single right answer. These are honest paths ranked by what you are optimizing for.

If you need a GPU this month: used market or AMD

New RTX 50-series retail pricing is inflated and sticky (it will not drop until DRAM clears). If you need something now, you have two real options:

Used RTX 3090 or 4090. The secondhand market sits above 2024–25 pricing but below new 50-series retail. A used 24GB 3090 is running ~$900–$1,200 on eBay (observed 2026-06-29). That is higher than the ~$500–$800 you could find in late 2025, but it is cheaper than a new RTX 5090 at $5,000, and the 3090 is community-reported to decode a 7B model at ~80–110 tok/s (not independently verified by LocalRig). If you buy used, the honest caveats apply: no warranty, check seller feedback, budget for repaste, and ask about mining history. More on that in the used GPU buying guide.

AMD Radeon (RX 7900 XTX or RX 7600 XT). AMD uses the same DRAM chips, so AMD cards face the same supply squeeze. But the secondhand market for AMD cards is thinner, and AMD holds less datacenter demand than NVIDIA. That means AMD cards sometimes stay in stock longer and hold closer to list price. The trade-off: CUDA is the local-LLM runtime standard (llama.cpp, Ollama, vLLM all optimize for NVIDIA). AMD support exists (via ROCm/HIP), but it is less mature and your software stack is narrower. If you need a GPU now and you are comfortable with that software risk, an AMD card can be cheaper than NVIDIA. If you are still learning local LLMs, NVIDIA remains the safer default.

For more on the constraint logic, see the local-AI hardware buying framework.

If you can wait 6 months: watch DRAM pricing and decide in Q4 2026

This is the honest wait-and-see path. DRAM pricing is tracked publicly (SK Hynix and Micron quarterly earnings, plus trade publications, report contract prices). If DRAM prices fall 20%+ between now and November 2026, GPU retail prices should follow within 1–2 quarters. If they do not, the timeline extends. Set a calendar reminder for mid-Q4 and check the market then. You will have better data.

The upside: waiting 6 months could save you $500–$1,000 on a new card. The downside: it is not certain, and you do not have the card now. If your use case is urgent (you are running production inference, fine-tuning models on a deadline, or building a business), waiting costs you opportunity. Weigh that against the price delta.
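The wait-and-see rule above can be written down as a simple check. This is a minimal sketch, not a forecasting tool: the function names and the sample prices are hypothetical, and the 20% threshold is the only figure taken from the text.

```python
def pct_change(baseline: float, current: float) -> float:
    """Percentage change from baseline to current (negative means a drop)."""
    if baseline <= 0:
        raise ValueError("baseline price must be positive")
    return (current - baseline) / baseline * 100.0


def dram_recovery_signal(baseline: float, current: float,
                         threshold_drop_pct: float = 20.0) -> bool:
    """True if the DRAM price has fallen by at least the threshold percentage."""
    return pct_change(baseline, current) <= -threshold_drop_pct


# Hypothetical contract prices (same unit for both readings), for illustration only.
june_price = 10.00
november_price = 7.90

if dram_recovery_signal(june_price, november_price):
    print("DRAM is down 20%+: expect GPU retail to follow within 1-2 quarters.")
else:
    print("No clear recovery yet: the timeline extends.")
```

Record the baseline price from the same source you plan to check in Q4, so the two readings are comparable.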

If you want to avoid the buy decision entirely: rent GPU time

Cloud GPU rental dodges the price volatility entirely. Services like RunPod, Vast.ai, Lambda Cloud, and others price GPU time on a per-hour basis. You do not own the card, you do not manage the hardware, and you do not have to guess the timeline to price recovery.

The economics: a used RTX 3090 costs $900–$1,200 today. If you rent instead, you pay ~$0.40–$0.50 USD per hour for 3090-class hardware. That puts break-even at ~1,800–3,000 rental hours, or roughly 2.5–4 months of continuous (24/7) use, ignoring electricity and resale value. If your usage is episodic (a few hours per week for chat, document work, or light inference), renting is cheaper than buying. If it is heavy daily use (8+ hours per day, 5+ days per week), a used card pays for itself in roughly 45–75 weeks, so buying used still edges renting on cost, but the gap narrows as DRAM constraints push used prices up.
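The break-even arithmetic can be reproduced with a few lines of Python. This is a rough sketch using the prices quoted in this article; the function names are illustrative, and it deliberately ignores electricity, resale value, and rental storage fees, all of which shift the answer.

```python
def break_even_hours(purchase_price_usd: float, rental_usd_per_hour: float) -> float:
    """Rental hours that cost as much as buying the card outright."""
    if rental_usd_per_hour <= 0:
        raise ValueError("rental rate must be positive")
    return purchase_price_usd / rental_usd_per_hour


def weeks_to_break_even(hours: float, hours_per_week: float) -> float:
    """Calendar weeks needed to reach the break-even hours at a given usage level."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return hours / hours_per_week


# Best case for buying: cheapest card against the priciest rental.
low_hours = break_even_hours(900, 0.50)
# Worst case for buying: priciest card against the cheapest rental.
high_hours = break_even_hours(1200, 0.40)

for label, hours_per_week in [("light (5 h/week)", 5), ("heavy (40 h/week)", 40)]:
    lo = weeks_to_break_even(low_hours, hours_per_week)
    hi = weeks_to_break_even(high_hours, hours_per_week)
    print(f"{label}: {lo:.0f}-{hi:.0f} weeks to break even")
```

At a few hours per week, break-even sits years away, which is why episodic users usually come out ahead renting.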

For the full rent-vs-buy framework, see the rent-vs-buy break-even calculator. The prices in this article are from 2026-06-29 and will shift as DRAM clears or remains tight.

If you can wait and you are running large models: consider Apple Silicon

The M5 Ultra announcement delay is frustrating, but once Apple works through its DRAM constraints (estimated Q3–Q4 2026), the Apple Silicon path sidesteps the discrete-GPU market entirely. A Mac with large unified memory (M5 Pro with 128GB+, M5 Max with 96GB+, or the M5 Ultra when it lands) gives you 48+ GB of unified memory and the bandwidth to run 32B–70B models locally. You do not buy a GPU in a box; you buy a Mac. The economics are different and often higher upfront, but for large-model work and for people already in the Apple ecosystem, it is worth the comparison.

See the M5 Ultra wait-or-buy analysis for the full trade-off.

If you are still unsure: use the RTX 5090 decision as a forcing function

The RTX 5090 lists at ~$5,000 USD retail. The RTX 4090 lists at ~$1,600–$2,000 (new). A used 3090 is ~$900–$1,200. For inference on a single 7B–13B model (the most common local-LLM workload), the 5090 does not deliver 3–5× faster throughput than a 3090. It is faster, but not linearly faster — decode is memory-bandwidth-bound, and bandwidth does not scale 3–5× with the price jump.

So here is the question: Is the RTX 5090 worth $3,800–$4,100 more than a used 3090? If no, wait and use a 3090 (or rent). If yes (e.g., you need RTX 5090 performance for multi-GPU tensor parallelism or you are training models rather than running inference), then new hardware makes sense. If you are unsure, you probably do not need the 5090. See the RTX 5090 worth-it analysis for the deeper breakdown.
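The bandwidth argument can be sketched numerically. The snippet below is a back-of-envelope estimate, not a benchmark. It assumes decode reads every model weight once per generated token, so the ceiling on tokens per second is memory bandwidth divided by model size. The bandwidth figures are an assumption taken from public spec sheets (not from this article), the 4 GB model size is a hypothetical 7B model at roughly 4-bit quantization, and real throughput lands well below the ceiling.

```python
# ASSUMPTION: memory bandwidth in GB/s from public spec sheets, not from this article.
BANDWIDTH_GB_PER_S = {"RTX 3090": 936.0, "RTX 5090": 1792.0}

# Prices quoted in this article (USD, 2026-06-29); the 3090 uses the top of the used range.
PRICE_USD = {"RTX 3090": 1200.0, "RTX 5090": 5000.0}

# Hypothetical: a 7B model at ~4-bit quantization occupies about 4 GB.
MODEL_SIZE_GB = 4.0


def decode_ceiling_tok_per_s(bandwidth_gb_per_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if each token requires one full pass over the weights."""
    if model_size_gb <= 0:
        raise ValueError("model size must be positive")
    return bandwidth_gb_per_s / model_size_gb


speedup = BANDWIDTH_GB_PER_S["RTX 5090"] / BANDWIDTH_GB_PER_S["RTX 3090"]
price_ratio = PRICE_USD["RTX 5090"] / PRICE_USD["RTX 3090"]

for card, bandwidth in BANDWIDTH_GB_PER_S.items():
    ceiling = decode_ceiling_tok_per_s(bandwidth, MODEL_SIZE_GB)
    print(f"{card}: decode ceiling of about {ceiling:.0f} tok/s")

print(f"Bandwidth ratio: {speedup:.2f}x, price ratio: {price_ratio:.2f}x")
```

Under these assumptions the 5090 offers a bit under 2× the decode ceiling for more than 4× the price, which is the gap the question above asks you to weigh.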

The honest bottom line

GPU prices are too high right now, and the reason is real: DRAM shortage driven by datacenter pre-purchasing. It is not a scam, not crypto relapsing, and not purely NVIDIA price-gouging — though retail pricing does reflect some margin expansion. The shortage is global and affects RAM prices, GPU prices, and Apple Silicon timelines all at once.

Do you need a GPU now? Buy used (RTX 3090) or consider renting. The used market offers better value than new retail, and renting avoids the ownership headache.

Can you wait 6 months? Watch DRAM spot prices in Q3 and decide in Q4 whether recovery is visible. If so, prices should stabilize or drop within 1–2 quarters.

Is your timeline longer than 12 months? Be patient. Supply eventually clears, and prices normalize. The recovery may come in late 2026 or mid-2027, but it will come.

Are you running large models or training? Either wait for Apple Silicon (M5 Ultra, once DRAM clears) or rent cloud hardware now. The rent-vs-buy math is tight, but renting removes uncertainty.

No manufactured urgency. No “prices will never be this low again.” The truth is simpler: DRAM is tight, GPU supply is tight, and retail prices reflect that. Buy when it makes sense for your workload and timeline. Wait when waiting serves you. Rent when it is cheaper than buying. The choice depends on what you are actually trying to do, not on what the market is doing.

Prices and DRAM commentary reflect market conditions as of 2026-06-29. DRAM spot pricing moves weekly and contract timelines are forward-looking; verify current market data and your local retail/secondhand listings before committing to a purchase.