Homelab & Platform

Dual RTX 3090 Build Guide 2026: 48GB of VRAM for the Price of One New Card

Two used RTX 3090s, bought at today’s street prices, land you 48GB of VRAM for roughly what one new flagship card costs — but only if you get the board, PSU, and airflow decisions right first. This guide is the parts list and the reasoning behind each line, including the part vendors don’t advertise: dual-GPU setup is genuinely more finicky than single-card, and you should know that cost before you buy.

This is the build-guide layer that sits underneath Two RTX 3090s vs. One RTX 4090, which makes the capacity-vs-speed case. If you haven’t read that comparison yet, read it first — it explains why you’d choose this path. This page assumes you’ve already decided VRAM capacity is your constraint and walks through building the machine.

Why 48GB instead of one bigger card?

Because 24GB is a hard ceiling on any single consumer GPU, and the models that need more than 24GB — large 32B models at higher quantization, or 70B-class models — simply will not load on one card no matter how fast that card is. Two 24GB cards give you 48GB of addressable VRAM, which is the difference between “this model fits” and “this model doesn’t.” For the specific gigabyte math on 70B models at various quantization levels, see hardware to run a 70B model locally before you finalize your quant target — it determines whether 48GB is comfortable or tight.

The honest caveat, repeated here because it matters for budgeting expectations, not because this guide is the place to relitigate it: two cards buy you capacity, not double the tokens per second. Inference is memory-bandwidth-bound, and without NVLink-accelerated tensor parallelism, the cards coordinate over PCIe, which is far slower than on-card bandwidth. You are building this rig to run a model that would not otherwise run at all — not to run a small model twice as fast.

The parts list

Every line item below states why it’s there and what breaks if you cheap out on it.

ComponentWhat to buyWhy it’s thereWhat breaks if you cheap out
GPUs (×2)Used RTX 3090 24GB, ~$600–$800 each (eBay, observed 2026-06-29)24GB × 2 = 48GB VRAM at the best used bandwidth-per-dollar on the marketA cheaper 12GB card doubled is still only 24GB combined and won’t hit the same bandwidth
MotherboardX670/TRX50-class board with two full-length PCIe x16 slots, confirmed x8/x8 (or better) electrical, ≥3-slot physical spacingFeeds both cards enough PCIe bandwidth and leaves room for triple-slot coolersA board that drops the second slot to x4 electrical, or spaces slots too close, throttles transfer and cooks the top card
CPUMid-tier consumer CPU with enough PCIe lanes to split x8/x8 (Ryzen 7000/9000 or equivalent)Determines whether both slots can run at x8 simultaneouslyBudget CPUs with too few lanes force one card down to x4, which matters more for load time and layer-split transfer than raw inference
PSU1000W+ 80+ Gold (guide-author estimate, flag: not independently lab-tested by LocalRig)Two 350W TDP cards plus CPU, board, and drives — 1000W is the community-cited floor, not a comfortable ceilingUndersized PSUs cause random shutdowns under sustained load, which is exactly what long inference/training runs are
CaseFull tower with ≥3 PCIe expansion slots of physical clearance and strong front-to-back airflowTwo 350W cards side by side need real airflow, not a mid-tower squeezeTight cases trap heat between the cards; the bottom card throttles first and silently
NVLink bridge (optional)Used RTX 3090 NVLink bridge, ~$40–$80 (observed 2026-06-29)Links the two cards’ memory directly for certain tensor-parallel/training workloadsSkipping it costs nothing for most inference setups — see the honest framing below
Riser cable (situational)PCIe 4.0-rated riser, only if case layout requires itLets you physically fit both cards when slot spacing is tighter than card thicknessA cheap or wrong-generation riser introduces instability under sustained load — buy a rated one or skip it entirely
StorageNVMe SSD, 1TB+Model weights are large; you’ll be swapping quantizations and checkpointsSlow storage doesn’t hurt inference speed but makes every model swap painful

Browse used RTX 3090 24GB on eBay → · Shop 1000W+ PSUs on Amazon → · Full-tower cases on Amazon →

Motherboard and PCIe lanes: the decision that gets skipped

The question here isn’t “does it have two PCIe slots” — nearly every ATX board does. The question is whether both slots still run at meaningful bandwidth when both are populated, and whether they’re spaced far enough apart that two triple-slot cards don’t suffocate each other.

Check the board manual’s slot diagram, not the marketing bullet points, for two things: the electrical lane split when both x16-length slots are populated (x8/x8 is fine for this workload; x16/x4 is not), and the physical distance between slots in millimeters versus your cards’ actual thickness. A card advertised as “2.5-slot” is sometimes closer to 3 in practice. If the second card’s intake fan is pressed against the first card’s exhaust, thermal throttling starts long before either card hits its rated limit. For the deeper motherboard-and-lane-count reasoning, see PSU for multi-GPU AI rigs, which covers this alongside the power math.

Compatible dual-slot motherboards on Amazon →

How big a power supply do you actually need?

Plan for 1000W minimum, 80+ Gold or better, as your floor. This figure is a guide-author estimate built from community-cited dual-3090 build threads (r/LocalLLaMA, 2024–2025), not an independently lab-measured number from LocalRig — flag it as such and size up if your CPU is also power-hungry or you’re adding an NVLink bridge and extra drives.

The arithmetic behind it: each RTX 3090 carries a 350W TDP (NVIDIA spec), so two cards under sustained load can pull 700W from the GPUs alone, before the CPU, board, fans, and drives. A 1000W unit gives you headroom above that combined draw rather than running at the ragged edge of the PSU’s rated capacity, which is where efficiency drops and instability under load rises. Don’t undersize this component — a marginal PSU is the single most common cause of mysterious dual-GPU crashes, and it’s the hardest fault to diagnose after the fact because it looks like a driver or software problem.

Shop 1000W+ 80+ Gold PSUs on Amazon →

Skip it for most inference setups. NVLink ($40–$80 used, observed 2026-06-29) creates a dedicated high-bandwidth link between the two cards’ memory, separate from PCIe. It matters most for training and certain tensor-parallel configurations where the cards need to exchange large amounts of intermediate data quickly. For the common local-LLM inference case — running llama.cpp or vLLM with a model split across two cards — most of that communication rides over PCIe regardless of whether NVLink is present, and the community-cited benefit for single-stream inference is small to negligible.

The honest framing: buy the bridge if you’re planning to do meaningful fine-tuning or training work on this rig alongside inference, where the tensor-parallel benefit is more likely to show up. If inference is the only job, the $40–$80 is better spent on a slightly better PSU or case fans.

Spacing and thermals: what actually cooks a card

Two 350W cards in one case is a genuinely different thermal problem than one. The failure mode is specific: the top card exhausts hot air directly into the intake of the card below it (or vice versa, depending on orientation), so the second card runs hotter than its own cooler would suggest — and it throttles quietly, without an obvious warning, showing up only as unexplained slower decode over a long session.

Three things prevent this: physical slot spacing wide enough that both coolers have clear intake air, front-to-back case airflow (intake fans low and front, exhaust high and rear) rather than relying on the cards to move all the air themselves, and — for a card bought used — fresh thermal paste, since a multi-year-old card with dried paste starts the thermal budget already behind. Check idle and sustained-load temps on both cards independently after first boot; if the second card runs meaningfully hotter than the first under the same load, that’s a spacing or airflow problem to fix before you blame the card.

Case fans and airflow kits on Amazon →

What dual-GPU complexity actually costs you

This is the section most build guides skip, and it’s the one that determines whether this build is right for you. Running a model across two GPUs is not plug-and-play the way a single card is.

  • Runtime setup is a real step, not a checkbox. Splitting a model across two cards requires the runtime to support it explicitly — tensor-parallel or layer-split modes in vLLM or llama.cpp — and getting the split configured correctly (how many layers per card, which card holds the KV cache) takes real trial and error the first time. Budget an evening, not five minutes. The vLLM multi-GPU setup guide walks through this configuration in detail.
  • Driver and CUDA version mismatches multiply. With one card, a driver issue is one variable. With two cards sharing a PCIe bus and both needing consistent CUDA visibility, debugging becomes “is it the model, the split configuration, the driver, or the PCIe lane allocation” — a longer diagnostic chain than single-GPU troubleshooting.
  • You lose some flexibility a single big card doesn’t cost you. A model that needs 30GB fits comfortably on a 48GB dual setup, but so would it on a single 32GB+ card if one existed at this price point (it largely doesn’t in the used consumer market) — the tradeoff is real complexity for real capacity, not a free lunch.

None of this is a reason to avoid the build. It’s a reason to go in knowing the second card buys you VRAM headroom at the cost of an afternoon of runtime configuration, not a plug-and-play doubling of a single-card setup.

Bottom line

If your model needs more than 24GB and you’ve confirmed that with the 70B hardware requirements page, a dual RTX 3090 build is the most defensible way to get there on a consumer budget: ~$1,200–$1,600 for the two cards (observed 2026-06-29) plus a board, a 1000W+ PSU, and a case with real airflow — comparable to or less than one new flagship card, with double the VRAM. Skip the NVLink bridge unless you’re training, don’t cheap out on the PSU or slot spacing, and set aside real time for runtime configuration before you expect it to just work. If your model fits in 24GB, this whole build is the wrong answer — buy the single card instead and read Two RTX 3090s vs. One RTX 4090 for that comparison.

Sources

  • r/LocalLLaMA community build threads — dual RTX 3090 PSU sizing, PCIe lane splits, NVLink utility (2024–2025)
  • LocalRig first-party benchmark: base Apple M4, 16 GB — llama.cpp b9820 (18.4 tok/s) and Ollama 0.30.11 (19.5 tok/s), Llama 3.1 8B Q4_K_M, 2026-06-27
  • NVIDIA RTX 3090 product specifications: nvidia.com (350W TDP, GDDR6X, NVLink support)
  • vLLM and llama.cpp multi-GPU documentation (tensor-parallel and layer-split modes)
  • eBay used RTX 3090 listings, observed 2026-06-29