LocalRig
Honest hardware guidance for running AI locally. Constraint-logic buying guides, first-party benchmarks, and interactive tools — no hype, no fake urgency, no ranking by commission rate.
-
Hardware to Run a 70B Model Locally: VRAM, the 48GB Wall, and Your Real Options
What it actually takes to run a 70B model at home: the VRAM math, why 48GB is the practical floor at Q4, and the four hardware paths (dual 3090, used A6000, Apple Silicon, or cloud).
-
Hardware to Run a 7B/8B Model Locally: RTX 3090, Apple M3 Max, and Budget Options
Benchmark-backed hardware guide for running 7B and 8B parameter models locally. Covers RTX 3090, Apple M3 Max, RTX 3060, and Apple M4 — with first-party Apple M4 benchmarks, community throughput data, VRAM requirements, and honest trade-offs.
-
The Local-AI Hardware Buying Framework
A constraint-first framework for choosing hardware to run AI models locally. Covers VRAM, memory bandwidth, quantization, Apple Silicon, and budget paths — so you buy once and regret nothing.
-
Quantization: What It Means for Local AI and Why It Matters
Quantization reduces the numerical precision of a model's weights to shrink its memory footprint. It is the main lever that determines whether a 7B or 70B model fits in your GPU's VRAM and how fast it will run.
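A rough way to see the arithmetic behind this: weight memory is approximately parameter count × bits per weight ÷ 8. The sketch below assumes illustrative bits-per-weight values (real quantized formats store extra scale data, so their effective size runs slightly above the nominal bit width) and deliberately ignores KV cache and runtime overhead. The function name `weight_memory_gb` is hypothetical, for illustration only.

```python
# Rough estimate of the memory needed to hold a model's weights at a given precision.
# Assumption: the bits-per-weight values below are illustrative approximations.
# Real quantized formats carry scale metadata, so effective size is a bit above
# the nominal bit width. KV cache and runtime overhead are NOT included.


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8  # bits -> bytes
    return total_bytes / 1e9


# Illustrative precisions: 16-bit (FP16), roughly 8-bit, roughly 4-bit.
PRECISIONS = [("FP16", 16.0), ("~8-bit", 8.5), ("~4-bit", 4.5)]

if __name__ == "__main__":
    for model, params in [("8B", 8.0), ("70B", 70.0)]:
        for label, bpw in PRECISIONS:
            print(f"{model} @ {label}: ~{weight_memory_gb(params, bpw):.1f} GB")
```

Under these assumptions, a 70B model at roughly 4 bits needs about 39 GB for weights alone: already more than a single 24 GB card, and once KV cache headroom is added it fits in 48 GB. An 8B model at the same precision needs only about 4.5 GB.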
GPUs
-
Used RTX 3090 Buying Guide 2026: Still the Best $/VRAM in Local AI — If You Vet It Right
The used RTX 3090 remains the consensus VRAM-per-dollar champion for local LLM inference in mid-2026, but the used market carries real risks: defective VRAM modules, undisclosed mining-farm history, and PSUs sized for gaming instead of sustained AI load. This is the vetting checklist and the honest "when not to buy one" case.
-
Best GPU for Local LLM Inference (2026): VRAM-per-Dollar Guide
The GPU decision for local LLM inference is set by VRAM (does the model fit) and memory bandwidth (how fast it decodes), not raw FLOPS. A constraint-first, VRAM-per-dollar guide: used RTX 3090 vs RTX 4090 vs RTX 3060, multi-GPU reality, and when to switch to Apple Silicon.
Apple Silicon
Homelab
Guides coming soon.
Local vs Cloud
Runtimes
-
How to Run LLMs Locally: Which Inference Engine for Your Rig (2026)
A decision guide that picks the right local inference engine from your hardware, not hype. llama.cpp for CPU and portability, MLX on Apple Silicon, vLLM for CUDA serving — and why we don't recommend Ollama.
-
How to Run llama.cpp on an RTX 3090 (CUDA, Step by Step)
A step-by-step guide to building llama.cpp with CUDA and running a GGUF model on an RTX 3090. Covers driver and toolkit prerequisites, the CUDA build, full GPU offload with -ngl, a throughput check, and an OpenAI-compatible server.