GPU Passthrough on Proxmox for Local AI: IOMMU, VFIO, and the Gotchas
This is where homelab projects meet: a Proxmox host, a GPU sitting idle, and Ollama or another inference engine running inside a VM that cannot use the card until it is handed across the hypervisor boundary. Getting the GPU into the VM where your LLM can see it is straightforward on paper and genuinely painful in practice. The kernel machinery is not broken; the pain comes from motherboard firmware design, BIOS defaults, and IOMMU grouping, which vary wildly across consumer boards, and from failure messages that do not tell you why.
This guide walks you through the actual setup, names the real failure modes before you hit them, and tells you honestly when a simpler architecture (bare metal, or a different VM strategy) is the better answer than persisting with passthrough.
When to passthrough, when to give up
Before you start, ask: does the GPU actually need to be virtualized?
Use passthrough (GPU through Proxmox into a VM) when:
- Your Proxmox host runs other workloads that need the flexibility of VMs (web services, storage, dev containers).
- You want to upgrade or replace the GPU without rebuilding the host OS.
- You are learning Proxmox, or you plan to run multiple inference VMs and want to move GPUs between them (a passed-through GPU belongs to one running VM at a time).
- You have multiple GPUs and want to assign them to different VMs.
Bare metal is simpler when:
- This is a dedicated inference box and you have no other workloads on the host.
- You want the lowest latency and no VM overhead (single-digit percentage, but worth measuring for your model size and batch size).
- IOMMU grouping on your motherboard is a mess (see “Checking your motherboard” below).
- You have experienced passthrough failures and are out of patience.
For a single RTX 3090 running Ollama in a homelab, a bare-metal Debian + Ollama install will honestly get you running sooner than debugging IOMMU and VFIO. If you have already committed to Proxmox for other reasons, push through the steps below. If Proxmox is still hypothetical, a single-purpose inference box might not need it.
The architecture: IOMMU, VFIO, and Proxmox
Three things have to happen for GPU passthrough to work:
- IOMMU (Intel VT-d or AMD IOMMU, also called AMD-Vi) is enabled in the motherboard BIOS. The IOMMU remaps and restricts each device's DMA, and the kernel sorts devices into IOMMU groups: the smallest sets of devices it can isolate from one another. Each group can be assigned to a VM without that VM reaching devices or memory that do not belong to it.
- The VFIO (Virtual Function I/O) kernel driver claims the GPU. Instead of the normal NVIDIA driver running on the host, the vfio-pci driver binds to the GPU and holds it inert, ready to hand off to a VM.
- Proxmox (with QEMU/KVM underneath it) creates a VM with PCI device passthrough configured. The VM boots, loads the normal NVIDIA driver, and sees the GPU as if it were installed in a bare-metal machine.
The catch: IOMMU groups are determined by your motherboard’s PCIe topology and BIOS settings. Some boards put every device in its own group (ideal). Many consumer boards group the GPU with other devices, such as a storage controller, a network card, or a USB hub, because of firmware design choices or missing or disabled ACS (PCIe Access Control Services). When that happens, you cannot pass through the GPU alone; you have to pass the entire group, which only works if the host can do without those other devices.
Checking your motherboard and BIOS settings
Before you touch the Proxmox host, verify that:
- Your motherboard supports IOMMU. Check the manual for “VT-d” (Intel) or “IOMMU” (AMD) in the feature list. Most X570, B550, Z790, and newer boards support it; some B450 and Z690 boards do too, but you have to check.
- BIOS has IOMMU/VT-d enabled. Power off the host, enter BIOS, and look for settings like:
  - Intel: “Intel VT-d” or “VT for Direct I/O” (usually under Advanced → System Agent Configuration or similar).
  - AMD: “IOMMU” or “AMD-Vi” (usually under Advanced → Chipset Configuration or CPU Features).
  - Enable it. Do not enable “ACS” unless the manual says it is safe; many boards do not implement it correctly.
- Your GPU is in a clean IOMMU group. After you boot Proxmox with IOMMU enabled (see next section), run:
  dmesg | grep -i iommu
  Look for lines like IOMMU: … group …. Then list every IOMMU group with the devices in it:
  for d in /sys/kernel/iommu_groups/*/devices/*; do
      g=$(basename "$(dirname "$(dirname "$d")")")
      echo "IOMMU Group $g: $(lspci -nns "$(basename "$d")")"
  done | sort -V
  Find the group that holds your NVIDIA GPU and read what else is listed under the same group number. Do not filter this output with grep -i nvidia: that would hide exactly the neighbors you are checking for.
If the GPU shares a group with a storage controller or USB device, your options narrow:
- Pass the entire group (if you do not use the other devices).
- Look for BIOS options to separate devices (some boards have “ACS override” or per-port IOMMU settings; check your manual).
- Consider bare metal, because the BIOS may not support clean separation.
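To make the “clean group” question mechanical, here is a minimal bash sketch that reads the /sys/kernel/iommu_groups layout directly. The function names and the IOMMU_ROOT override are hypothetical, added so the logic can be exercised against a fake directory tree. The cleanliness test is deliberately naive: anything that is not a function of the GPU’s own slot counts as a neighbor, including a PCIe bridge that may be harmless, so treat a failure as “go read the group listing,” not as a verdict.

```shell
#!/usr/bin/env bash
# Sketch: inspect the IOMMU group that contains a given PCI device.
# IOMMU_ROOT is a hypothetical override for testing; on a real host the
# kernel exposes groups as /sys/kernel/iommu_groups/<N>/devices/<address>.
IOMMU_ROOT="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"

# Print the group number for a full address such as 0000:81:00.0.
iommu_group_of() {
  local dev
  for dev in "$IOMMU_ROOT"/*/devices/"$1"; do
    [ -e "$dev" ] || continue
    # <root>/<N>/devices/<addr>: two dirname hops land on <N>.
    basename "$(dirname "$(dirname "$dev")")"
    return 0
  done
  return 1   # device not found in any group
}

# Print every device address in group N, one per line.
iommu_group_members() {
  local dev
  for dev in "$IOMMU_ROOT/$1"/devices/*; do
    if [ -e "$dev" ]; then
      basename "$dev"
    fi
  done
}

# Succeed only if the group holds nothing but functions of the device's
# own slot (e.g. 0000:81:00.0 GPU and 0000:81:00.1 audio).
iommu_group_is_clean() {
  local group slot member
  group=$(iommu_group_of "$1") || return 2
  slot="${1%.*}"   # 0000:81:00.0 -> 0000:81:00
  for member in $(iommu_group_members "$group"); do
    [ "${member%.*}" = "$slot" ] || return 1
  done
  return 0
}
```

On a real host, pass the full address with its 0000: domain prefix, as sysfs names it. To see the neighbors by name: g=$(iommu_group_of 0000:81:00.0) && iommu_group_members "$g" | xargs -n1 lspci -nns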
Proxmox host setup: enabling IOMMU and VFIO
These steps assume Proxmox VE 8.x. If you are on 7.x, package names and paths are slightly different; check the Proxmox wiki for your version.
Step 1: Enable IOMMU in the kernel
Edit /etc/default/grub:
nano /etc/default/grub
Find the line starting with GRUB_CMDLINE_LINUX_DEFAULT=. It currently looks something like:
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
Add IOMMU flags to the end:
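- Intel:
  GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
- AMD:
  GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
The iommu=pt flag puts the IOMMU in passthrough (identity-mapping) mode for devices the host keeps for itself, which avoids DMA translation overhead on host hardware. It does not change how devices are grouped; groups come from the PCIe topology and firmware. On AMD systems the kernel normally enables the IOMMU on its own once the BIOS setting is on, so amd_iommu=on is harmless but usually redundant.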
Save the file and update GRUB:
update-grub
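If you script host setup, editing the GRUB line by hand is easy to get wrong on a second run. A minimal sketch of an idempotent edit, assuming GNU sed (as shipped on Proxmox/Debian); add_iommu_flags is a hypothetical helper, and you still run update-grub afterwards. It only helps on GRUB-booted hosts: a Proxmox host installed on ZFS with UEFI boots through systemd-boot, takes its kernel command line from /etc/kernel/cmdline instead, and applies changes with proxmox-boot-tool refresh.

```shell
#!/usr/bin/env bash
# Sketch: append IOMMU flags to GRUB_CMDLINE_LINUX_DEFAULT without
# duplicating flags that are already there. Assumes GNU sed (-i -E).
# add_iommu_flags is a hypothetical helper, not a Proxmox tool.
add_iommu_flags() {
  local file="$1" vendor="$2" flags flag
  case "$vendor" in
    intel) flags="intel_iommu=on iommu=pt" ;;
    amd)   flags="amd_iommu=on iommu=pt" ;;
    *) echo "usage: add_iommu_flags FILE intel|amd" >&2; return 2 ;;
  esac
  cp "$file" "$file.bak" || return 1   # keep a copy before editing
  for flag in $flags; do
    # Already present as a whole space-separated word inside the quotes? Skip.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${flag}( [^\"]*)?\"" "$file"; then
      continue
    fi
    # Otherwise insert it just before the closing quote of that line.
    sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${flag}\"/" "$file"
  done
}
```

Usage on an Intel host: add_iommu_flags /etc/default/grub intel && update-grub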
Step 2: Load VFIO kernel modules
Edit /etc/modules:
nano /etc/modules
Add these lines at the end:
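vfio
vfio_iommu_type1
vfio_pci
On kernels older than 6.2 (such as the default Proxmox VE 7.x kernel), also add vfio_virqfd. On Proxmox VE 8.x it has been merged into the vfio module and no longer exists on its own.
Save, rebuild the initramfs so the change is picked up, and reboot:
update-initramfs -u -k all
reboot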
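Re-running a setup guide tends to append the same module lines twice. A minimal sketch of a hypothetical helper that adds each line only if an identical line is missing, shown here for /etc/modules; it does not rebuild the initramfs for you.

```shell
#!/usr/bin/env bash
# Sketch: append each given line to FILE unless an identical line already
# exists. ensure_lines is a hypothetical helper, not a Proxmox tool.
ensure_lines() {
  local file="$1" line
  shift
  touch "$file" || return 1
  # If the file does not end in a newline, add one so appends start cleanly.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  for line in "$@"; do
    # -x: whole-line match; -F: fixed string, not a regex
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
  done
}
```

Usage: ensure_lines /etc/modules vfio vfio_iommu_type1 vfio_pci && update-initramfs -u -k all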
Step 3: Verify IOMMU is working
After reboot, check:
dmesg | grep -i iommu | head -5
Look for a line like IOMMU: … detected. If you see nothing or “IOMMU: disabled”, go back to BIOS and verify that VT-d or IOMMU is enabled. If the BIOS shows it enabled, your motherboard may not support it (despite the manual claim) — consider bare metal.
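Grepping dmesg depends on message wording that differs between Intel, AMD, and kernel versions. A sketch of a check that does not, assuming that an active IOMMU populates /sys/kernel/iommu_groups with numbered group directories and leaves it empty otherwise. The function names and the IOMMU_ROOT override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether the kernel has created any IOMMU groups.
# IOMMU_ROOT is a hypothetical override so the check can be tested.
IOMMU_ROOT="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"

# Print the number of IOMMU groups (0 if the directory is missing or empty).
iommu_group_count() {
  local n=0 g
  for g in "$IOMMU_ROOT"/*; do
    [ -d "$g" ] && n=$((n + 1))
  done
  echo "$n"
}

# Succeed if at least one group exists.
iommu_active() {
  [ "$(iommu_group_count)" -gt 0 ]
}
```

Usage: if iommu_active; then echo "IOMMU on: $(iommu_group_count) groups"; else echo "no IOMMU groups: recheck BIOS and kernel flags"; fi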
Binding the GPU to VFIO
Once IOMMU is working, you need to stop the normal NVIDIA driver from claiming the GPU and let VFIO claim it instead.
Find the GPU’s PCI ID
Run:
lspci -nn | grep NVIDIA
Output looks like:
81:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204]
Note the ID in the last pair of square brackets: 10de:2204 (vendor:device). 10de is NVIDIA’s PCI vendor ID; the device code (2204 here, for the RTX 3090) varies by model. The [0300] earlier on the line is the device class, not the ID.
Bind the GPU to VFIO at boot
Create a file /etc/modprobe.d/vfio-pci.conf:
echo "options vfio-pci ids=10de:2204" > /etc/modprobe.d/vfio-pci.conf
Replace 10de:2204 with your GPU’s ID.
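If your card exposes more than one PCI function (GPU plus HDMI/DisplayPort audio, common on modern cards), list every ID. On an RTX 3090 the audio function is typically 10de:1aef; confirm against your own lspci -nn output:
echo "options vfio-pci ids=10de:2204,10de:1aef" > /etc/modprobe.d/vfio-pci.conf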
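Rather than copying IDs by hand, you can derive the options line from lspci itself. A minimal sketch with hypothetical function names: it reads lspci -nn output on stdin, keeps the last [xxxx:xxxx] on each line, drops duplicates, and prints the vfio-pci options line. One gotcha it makes visible: ids= matches by vendor:device, so vfio-pci claims every card with those IDs, and two identical GPUs cannot be split between the host and a VM this way.

```shell
#!/usr/bin/env bash
# Sketch: turn `lspci -nn` output (on stdin) into a vfio-pci options line.
# Function names are hypothetical helpers, not part of any tool.

# Print the vendor:device IDs found on stdin, comma-separated, in order,
# without duplicates. The greedy .* makes sed keep the LAST [xxxx:xxxx]
# on each line; the class code ([0300]) has no colon, so it never matches.
vfio_ids_from_lspci() {
  sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
    | awk '!seen[$0]++' \
    | paste -sd, -
}

# Wrap the IDs in the modprobe.d syntax used in this guide.
vfio_conf_line() {
  local ids
  ids=$(vfio_ids_from_lspci)
  if [ -z "$ids" ]; then
    echo "no vendor:device IDs found on stdin" >&2
    return 1
  fi
  printf 'options vfio-pci ids=%s\n' "$ids"
}
```

Usage, selecting every function of slot 81:00: lspci -nn -s 81:00 | vfio_conf_line > /etc/modprobe.d/vfio-pci.conf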
Rebuild the initramfs so the new option is applied early in boot:
update-initramfs -u -k all
Reboot:
reboot
Verify VFIO bound the GPU
After reboot, run:
lspci -k | grep -A 2 NVIDIA
Look for a line like Kernel driver in use: vfio-pci. If it says nvidia or nouveau, the VFIO binding failed — the normal driver got there first. Common fixes:
- Blacklist the NVIDIA driver in /etc/modprobe.d/blacklist-nvidia.conf, rebuild the initramfs, and reboot:
  echo "blacklist nvidia" > /etc/modprobe.d/blacklist-nvidia.conf
  update-initramfs -u -k all
  reboot
- Ensure the VFIO entry in /etc/modprobe.d/vfio-pci.conf is correct (right PCI ID, right syntax).
- Check the load order: VFIO must load before other GPU drivers. Add softdep lines to /etc/modprobe.d/vfio-pci.conf so the host GPU drivers wait for vfio-pci:
  softdep nouveau pre: vfio-pci
  softdep nvidia pre: vfio-pci
  If NVIDIA is still loading first, it may be built into the kernel or listed in /etc/modules-load.d/.
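Reading lspci -k by eye is easy to get wrong when the card has several functions, and a common half-bound state is the GPU on vfio-pci while its audio function stays on snd_hda_intel. A minimal sketch, with hypothetical function names, that parses lspci -k output on stdin and reports the driver for each NVIDIA function.

```shell
#!/usr/bin/env bash
# Sketch: read `lspci -k` output on stdin and report which kernel driver
# each NVIDIA function is bound to. Function names are hypothetical.

# Prints "<address> <driver>" per NVIDIA function; "none" if no driver bound.
nvidia_drivers_in_use() {
  awk '
    /^[0-9a-f]/ {                      # unindented line = new device header
      if (nv) print addr, drv          # flush the previous NVIDIA device
      addr = $1; drv = "none"; nv = ($0 ~ /NVIDIA/)
    }
    /Kernel driver in use:/ { drv = $NF }
    END { if (nv) print addr, drv }
  '
}

# Succeeds only if at least one NVIDIA function was found and every one
# of them is bound to vfio-pci.
nvidia_all_on_vfio() {
  local report
  report=$(nvidia_drivers_in_use)
  [ -n "$report" ] || return 1
  ! printf '%s\n' "$report" | awk '$2 != "vfio-pci"' | grep -q .
}
```

Usage: lspci -k | nvidia_all_on_vfio && echo "all NVIDIA functions on vfio-pci" || lspci -k | nvidia_drivers_in_use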
Creating a Proxmox VM with GPU passthrough
Once VFIO has the GPU, Proxmox can assign it to a VM.
Via the Proxmox web UI:
- Create a new VM (Datacenter → Create VM) with the OS of choice (Ubuntu 24.04 LTS is common for Ollama).
- Give it 4+ cores and 8+ GB RAM for a 7B model, 16+ GB for a 13B model.
- After creating the VM, click it in the sidebar and go to Hardware → Add → PCI Device.
- In the “Device” dropdown, select the NVIDIA GPU (it appears by name if VFIO is working).
- Check “All Functions” if it is a GPU with audio output (most modern cards).
- Check “Primary GPU” if you want the VM’s display to route through this GPU (optional; if unchecked, the GPU is compute-only and the VM uses emulated video).
- Click Add.
Via the terminal (advanced):
Edit /etc/pve/nodes/proxmox-node-name/qemu-server/VM-ID.conf and add:
hostpci0: 81:00,x-vga=on
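Where 81:00 is the GPU’s PCIe address from lspci; change it to your GPU’s address. Leaving off the function number (.0) passes every function of the card, the equivalent of “All Functions” in the web UI. The x-vga=on flag tells QEMU to use the GPU as the VM’s primary display device; for a compute-only inference VM you can leave it off.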
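The same settings can be applied with Proxmox’s qm CLI instead of editing the file. A hedged sketch that only prints the commands so you can review them first. It assumes the q35 machine type (Proxmox’s pcie=1 option requires it) and OVMF/UEFI firmware, a common choice for modern GPUs. Set both before installing the guest OS, since switching firmware afterwards can leave an installed system unbootable; an OVMF VM normally also gets an EFI disk, which is easiest to add in the creation wizard. qm_passthrough_cmds is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) qm commands for a compute-only GPU passthrough VM.
# qm_passthrough_cmds is hypothetical; --machine, --bios and --hostpci0
# are standard qm set options.
qm_passthrough_cmds() {
  local vmid="$1" addr="$2"
  case "$vmid" in
    ''|*[!0-9]*) echo "VMID must be numeric" >&2; return 2 ;;
  esac
  if [ -z "$addr" ]; then
    echo "usage: qm_passthrough_cmds VMID BUS:DEV (e.g. 81:00)" >&2
    return 2
  fi
  # q35 + OVMF first: pcie=1 below needs the q35 machine type.
  printf 'qm set %s --machine q35 --bios ovmf\n' "$vmid"
  # No function number on the address: every function (GPU + audio) goes in.
  printf 'qm set %s --hostpci0 %s,pcie=1\n' "$vmid" "$addr"
}
```

Review the printed lines, then run them on the Proxmox host, for example with qm_passthrough_cmds 100 81:00 | sh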
Boot the VM and install drivers
- Start the VM.
- Install the NVIDIA driver inside the VM, then reboot so it loads:
  sudo apt update
  sudo apt install -y nvidia-driver-550
  sudo reboot
  (Check NVIDIA’s website for the latest driver version; 550+ supports most modern cards.)
- Verify the GPU is visible:
  nvidia-smi
  It should show the GPU and its VRAM.
- Install Ollama:
  curl -fsSL https://ollama.ai/install.sh | sh
- Test:
  ollama pull llama2:7b
  ollama run llama2:7b
While the model is loaded, the Ollama process should appear in the process list at the bottom of the nvidia-smi output.
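To confirm the model really runs on the GPU rather than silently falling back to CPU, you can ask nvidia-smi for its compute-process list in CSV form and look for an Ollama process. A minimal sketch, assuming the process name contains “ollama” (the runner’s process name has varied between Ollama releases, so treat that as an assumption). Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide from nvidia-smi's compute-app CSV (on stdin) whether an
# Ollama process holds GPU memory. Function names are hypothetical.
# Expected input lines look like: 1234, /usr/local/bin/ollama, 5120 MiB
ollama_on_gpu() {
  # Field 2 is process_name; exit 0 if any line matches, 1 otherwise.
  awk -F', ' '$2 ~ /ollama/ { found = 1 } END { exit !found }'
}

# Query the driver inside the VM and feed the CSV to ollama_on_gpu.
check_ollama_gpu() {
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader \
    | ollama_on_gpu
}
```

Run it while ollama run is active: check_ollama_gpu && echo "Ollama is on the GPU" || echo "no Ollama process on the GPU"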
LXC containers and why they do not work for GPU passthrough
LXC containers are lighter than full VMs — no bootloader, no kernel, shared host kernel. But LXC does not support VFIO device passthrough. There is no mechanism in the LXC cgroup v2 setup to claim a PCI device group and pass it through the container boundary. You can bind /dev/nvidia* device nodes into an LXC container, but that requires the NVIDIA driver to be loaded on the host and shared via /dev — which breaks isolation and still requires the host to have GPU support.
For true passthrough, with the GPU isolated from the host, use a full KVM VM on Proxmox, not LXC.
If you want LXC’s lightweight properties, the honest path is bare metal (no hypervisor at all) or a systemd-nspawn container on a bare metal Linux box.
Real failure modes and troubleshooting
| Failure Mode | Symptom | Root Cause | Solution |
|---|---|---|---|
| VFIO module not loaded | Error “module not found” after reboot | Stale initramfs or wrong load order | Run update-initramfs -u -k all and reboot |
| GPU shows as unassigned | No GPU in Proxmox PCI device dropdown | NVIDIA driver still bound to GPU instead of VFIO | Blacklist nvidia driver in /etc/modprobe.d/blacklist-nvidia.conf, update initramfs, reboot |
| IOMMU not detected | dmesg shows no IOMMU lines, or reports the IOMMU as “disabled” | IOMMU disabled in BIOS or not supported by motherboard | Check BIOS for VT-d (Intel) or IOMMU (AMD) setting; if already enabled, motherboard may not support it |
| GPU grouped with other devices | Cannot isolate GPU in its own IOMMU group | Motherboard firmware design or ACS disabled | Check BIOS for ACS or per-port IOMMU settings; if unavailable, pass entire group or use bare metal |
| VM boots but no GPU detected | nvidia-smi shows “no GPU found” inside VM | NVIDIA driver not installed or wrong version inside VM | Run driver installation again; check lspci output inside the VM for the NVIDIA device to confirm passthrough worked |
| Kernel panic on VM boot | VM hangs or crashes immediately | VFIO reset broken on this GPU/driver combination | Add rombar=0 to hostpci config in /etc/pve/nodes/.../qemu-server/VM-ID.conf; consider bare metal |
“vfio-pci: module not found” after reboot
The VFIO modules exist in the kernel, but the module loading order was wrong or the initramfs is stale. Run:
update-initramfs -u -k all
And reboot. If it persists, check that /etc/modules lists the vfio modules, spelled correctly, and that you ran update-initramfs after editing it.
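A quick way to see which VFIO module is actually missing, rather than guessing, is to compare lsmod against the list you expect. A minimal sketch with a hypothetical function name; it assumes the VFIO pieces are loadable modules rather than built into the kernel, since built-in drivers do not appear in lsmod.

```shell
#!/usr/bin/env bash
# Sketch: read `lsmod` output on stdin and print each required VFIO module
# that is not loaded, one per line. missing_vfio_modules is hypothetical.
# vfio_virqfd is left out on purpose: since kernel 6.2 it is part of vfio.
missing_vfio_modules() {
  awk '
    NR > 1 { loaded[$1] = 1 }          # skip the "Module Size Used by" header
    END {
      n = split("vfio vfio_iommu_type1 vfio_pci", want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in loaded)) print want[i]
    }
  '
}
```

Usage: missing=$(lsmod | missing_vfio_modules); [ -z "$missing" ] && echo "VFIO modules loaded" || echo "missing: $missing"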
GPU shows as “unassigned” in Proxmox after VFIO bind
Proxmox web UI shows no GPU in the PCI Device dropdown. Likely cause: the normal NVIDIA driver still has the GPU. Run:
lspci -k | grep -A 2 NVIDIA
If the output shows nvidia in the kernel driver line, blacklist it:
echo "blacklist nvidia-drm
blacklist nvidia
blacklist nouveau" > /etc/modprobe.d/blacklist-nvidia.conf
update-initramfs -u -k all
reboot
IOMMU group contains storage or network devices
Your motherboard puts the GPU in a group with other devices. Three options:
- Check BIOS for ACS or per-port IOMMU settings. Some Z790/X570 boards let you change grouping per PCIe slot. Consult your manual.
- If you control the other devices: Pass the entire group (e.g., hostpci0: 81:00 for the GPU and all of its functions, plus another hostpciN entry for each remaining device in the group). Works if you do not use those other devices on the host.
- Switch to bare metal. IOMMU grouping problems that no BIOS setting fixes are often a sign that the motherboard was not designed for reliable passthrough. Bare metal avoids the problem.
VM boots but nvidia-smi shows “no GPU found”
The VM sees the GPU hardware (Proxmox successfully passed it through) but the NVIDIA driver is not loaded. Run inside the VM:
lspci | grep NVIDIA
If you see the GPU, the NVIDIA driver install failed or is the wrong version. Re-run the driver installation. If you see nothing, the passthrough failed — check Proxmox logs (dmesg | grep -i iommu on the host) for conflicts.
Kernel panic when VM starts
VFIO reset is broken on some GPUs or driver combinations (especially older consumer cards). This usually manifests as the VM hanging on boot or crashing immediately. Workaround: in the Proxmox VM config (/etc/pve/nodes/.../qemu-server/VM-ID.conf), add:
hostpci0: 81:00,x-vga=on,rombar=0
The rombar=0 flag disables GPU BIOS ROM access, which can help with reset stability. If it still fails, consider bare metal.
Honest bottom line
GPU passthrough on Proxmox works. The Proxmox documentation is good. The kernel machinery is solid. But the failure modes are scattered across BIOS settings, motherboard firmware design, and IOMMU grouping quirks that are invisible until you hit them.
When passthrough makes sense: You are already running Proxmox for other workloads, you have a motherboard with clean IOMMU isolation, and you want flexibility in allocating hardware.
When bare metal is simpler: This is a single-purpose inference box. Bare metal (Debian + NVIDIA driver + Ollama) installs in about 30 minutes, sidesteps every failure mode above, and gives you a few percentage points of lower latency. Spend the IOMMU and VFIO debugging time on something else.
Whichever path you choose, validate it early — boot the VM or bare metal system, run nvidia-smi, fire up Ollama, and confirm your model loads and generates tokens at expected speed before you commit to it as your long-term inference box. The GPU works. The question is whether the hypervisor layer makes that work harder or easier in your specific case.
For context on GPU selection for this workload, see best GPU for local LLM. For how to structure your homelab build around Proxmox, the dual-RTX 3090 build guide walks through the full hardware picture. And once you have the GPU working, how to run LLMs locally covers the inference engine options and tuning.