Can I run a 13B LLM on a Tesla P40?

Yes. A 13B model at Q4 quantization fits in 24GB with headroom, though thermals demand a blower shroud or a chassis with forced airflow. Without a cooling mod, the card will throttle to 70–80% of its clocks under load.

Does the P40 have video outputs?

No. It is a datacenter GPU with no HDMI or DisplayPort. You need either a separate display GPU or a headless setup reached over SSH, VNC, or an API.

How does P40 performance compare to RTX 3090?

The P40 is older (Pascal architecture, 2016), and its GDDR5 delivers far less memory bandwidth than the 3090’s GDDR6X; expect decode to run 40–60% slower. The 24GB of VRAM is the only thing they share.

Why is the P40 so cheap?

Datacenter EOL (end-of-life) surplus. Large numbers of cards were retired from cloud and compute duty. With no resale demand from businesses, the secondhand market floods with cheap inventory.

Do I need a special power cable or PSU?

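Usually an adapter, yes. The P40 does not take standard PCIe GPU plugs; it has a single 8-pin EPS (CPU-style) power input. Most builds use an adapter that combines two PCIe 8-pin leads into one EPS 8-pin plug. Do not force a PCIe 8-pin cable directly into the card, because the pinout differs. Board power is 250W TDP.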

Tesla P40 in 2026: 24GB of VRAM for $150 — Legit Bargain or Trap?

The Tesla P40 is a ghost from the datacenter wars. It sold for thousands of dollars new, logged millions of hours in cloud and enterprise datacenters, and now sits in surplus bins at $150–200 used. For that price you get 24GB of VRAM, the same capacity as an RTX 3090 that costs $500–800, and the honest question everyone asks is: what’s the catch?

The catch is not a lie. It is legible and real. The P40 throttles hard without cooling mods, its compute is old, and it has no video output. These are not hidden gotchas; the community selling it to you volunteers them freely. The question is whether you enjoy solving those problems, and whether your workload actually needs what the P40 can deliver. For the right person, it is a legitimate bargain. For everyone else, it is a money pit of frustration.

The core principle: datacenter VRAM at consumer price, but at a cost

The P40 remains useful because of the VRAM principle: token generation is bandwidth-bound and compute-light. A language model’s decode step is dominated by reading weights from memory, not by arithmetic. This is why a 2016 Pascal GPU with 24GB of memory can still run a 13B LLM at respectable speed, and why a newer card with half the VRAM will not.
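The bandwidth-bound argument can be put in rough numbers. If every generated token requires reading all the weights once, memory bandwidth divided by the size of the weights gives a ceiling on decode speed. The sketch below is a back-of-envelope estimate, not a benchmark: the 4.85 bits per weight for Q4_K_M is an approximation assumed here (not a figure from this article), and KV-cache reads and kernel overhead are ignored, so real throughput lands well below the ceiling.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_billion: float,
                         bits_per_weight: float) -> float:
    """Upper bound on tokens/s if each token reads every weight exactly once."""
    # 1e9 parameters * (bits / 8) bytes each = gigabytes of weights
    weights_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / weights_gb


BITS_Q4_K_M = 4.85  # assumption: approximate average bits per weight

# Bandwidth figures are the ones quoted in this article.
for name, bandwidth in [("Tesla P40", 346), ("RTX 3090", 936)]:
    ceiling = decode_ceiling_tok_s(bandwidth, 7, BITS_Q4_K_M)
    print(f"{name}: ceiling of about {ceiling:.0f} tok/s on a 7B Q4 model")
```

Under these assumptions the P40 ceiling is a little over 80 tok/s and the 3090 ceiling is a little over 220 tok/s. What carries over to practice is the ratio between the cards, not the absolute numbers.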

The P40 was built for image inference and machine learning in datacenters, where thermals and noise are someone else’s problem. It has:

24GB GDDR5 memory — the same capacity tier as an RTX 3090, but with lower bandwidth (346 GB/s vs 936 GB/s).
No cooling beyond passive heatsinks — designed for datacenter air handlers, not living rooms.
No display outputs — it never needed to talk to a monitor.
Pascal-era compute: older architecture, no tensor cores, slower for inference than modern cards.

For a tinkerer who buys a $50 3D-printed shroud and adds a blower fan, this is fine. For someone buying a GPU to run inference on day one without modifying it, the P40 is the wrong choice.

Master comparison: P40 vs other 24GB options

GPU	VRAM	Memory BW	7B Q4_K_M tok/s (approx.)	Active cooler included?	Video out?	Used price (2026-06-29)
Tesla P40	24 GB GDDR5	346 GB/s	~30–50 (throttled: 20–30)	No — needs mod	No	~$150–$200
RTX 3090	24 GB GDDR6X	936 GB/s	~80–110	Yes (loud)	Yes	~$500–$800
RTX 4090	24 GB GDDR6X	1,008 GB/s	~120–160	Yes (loud)	Yes	$1,600–$2,000+ (new)
AMD MI50	32 GB HBM2	1,024 GB/s	~60–90	No — needs mod	No	~$120–$180

Note on tok/s figures: Community-cited (r/LocalLLaMA, llama.cpp benchmark threads, 2024–2025), not independently verified by LocalRig. The throttled P40 figure reflects reports of passive operation without airflow. With a blower shroud and active cooling, the P40 moves back toward the unthrottled range.

The P40 is cheaper than everything with 24GB. The MI50 is cheaper and has more VRAM. Neither has cooling out of the box. The trade-off is explicit: you pay for convenience (RTX 3090 or higher) or you engineer convenience yourself (P40 + shroud, MI50 + shroud).
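To make the price comparison concrete, the table can be reduced to dollars per gigabyte of VRAM. This is a small sketch using the midpoint of each price range from the table above; the midpoint is a simplification chosen here, and the figure excludes the cooling mod and any power adapter.

```python
# Price ranges and VRAM sizes copied from the comparison table above.
cards = {
    "Tesla P40": {"vram_gb": 24, "price_range": (150, 200)},
    "RTX 3090": {"vram_gb": 24, "price_range": (500, 800)},
    "RTX 4090": {"vram_gb": 24, "price_range": (1600, 2000)},
    "AMD MI50": {"vram_gb": 32, "price_range": (120, 180)},
}


def dollars_per_gb(card: dict) -> float:
    """Midpoint of the price range divided by VRAM capacity."""
    low, high = card["price_range"]
    return (low + high) / 2 / card["vram_gb"]


# Cheapest VRAM first.
ranked = sorted(cards, key=lambda name: dollars_per_gb(cards[name]))
for name in ranked:
    print(f"{name}: ${dollars_per_gb(cards[name]):.2f} per GB")
```

On these midpoints the two datacenter cards come in under $10 per GB, while the consumer cards cost several times more per GB. That gap is the convenience premium the table describes.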

The cooling problem: not optional, not subtle

This is the section that prevents regret. A Tesla P40 in a quiet case with no case fans, or in a chassis with only motherboard cooling, will throttle. Thermal throttling is not a dramatic crash — it is a silent 30–40% performance reduction that makes you wonder why this “bargain” card feels slow.

The why is simple: the P40 dissipates up to 250W, its passive heatsink sheds that heat only when air is forced through it, and the Pascal-era die runs hot under sustained load. Add it to a PC with poor airflow and the die temperature rises to 85°C, 90°C, or higher. At that point the card backs off clocks to cool down. You do not see an error; you just see decode speed well below what you expected.

The fix is not expensive:

3D-printed shroud (~$20 in material) that channels air from a case fan or external blower across the heatsink.
120mm blower fan (Amazon, ~$20–40): attach it to the shroud or the chassis to push air through the P40’s fins.
Thermal paste replacement (~$10): the P40 is 8–9 years old; the factory paste is likely dry. Repasting is a 20-minute job and recovers another 5–10°C of thermal headroom.

Community experience (like2byte, 2026) shows that with a blower shroud and intake airflow, the P40 idles at 40–50°C and holds ~70–75°C under sustained load, with clocks stable and no throttling. Without it, the same card idles at 60°C and hits throttle limits by 2–3 minutes into inference.
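Because throttling is silent, it helps to log temperature and clocks while a model is generating. The sketch below polls `nvidia-smi` through its CSV query interface and flags samples that look throttled. The 85°C limit comes from the figures above; the 1200 MHz clock floor is a placeholder assumed for illustration, so set it from what your own card sustains when it is cool. The parser also assumes every field comes back numeric, which may not hold on every driver.

```python
import subprocess

# Fields requested from nvidia-smi: die temperature, SM clock, board power.
QUERY = "temperature.gpu,clocks.sm,power.draw"


def parse_sample(line: str) -> dict:
    """Parse one 'temp, clock, power' CSV line (no header, no units)."""
    temp_c, sm_mhz, power_w = (field.strip() for field in line.split(","))
    return {"temp_c": float(temp_c), "sm_mhz": float(sm_mhz), "power_w": float(power_w)}


def read_gpu(index: int = 0) -> dict:
    """Query one GPU; requires the NVIDIA driver and nvidia-smi on PATH."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits", "-i", str(index)],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.strip().splitlines()[0])


def looks_throttled(sample: dict, temp_limit_c: float = 85.0,
                    min_sm_mhz: float = 1200.0) -> bool:
    """Heuristic: hot die and a depressed SM clock (thresholds are assumptions)."""
    return sample["temp_c"] >= temp_limit_c and sample["sm_mhz"] < min_sm_mhz
```

Calling `read_gpu()` every few seconds during a long generation and printing `looks_throttled(sample)` is enough to confirm whether a shroud and fan are doing their job.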

This is the mandatory decision point: are you willing to spend $50 and two hours on a cooling mod before plugging in the card? If the answer is no, buy a used RTX 3090 or accept slower speed. If the answer is yes, the math shifts in the P40’s favor.

Performance reality: speed tier and memory bandwidth

The P40 occupies a strange tier. It has the VRAM of a flagship consumer card but the memory bandwidth of a mid-range one.

Community benchmarks on 7B models at Q4 quantization report:

P40 with passive cooling (no airflow): 20–30 tok/s due to throttling.
P40 with blower shroud and intake airflow: 30–50 tok/s, typically 40–45 tok/s in stable conditions.
RTX 3090 for comparison: 80–110 tok/s.

The P40’s 346 GB/s is about 37% of the RTX 3090’s 936 GB/s, and the community figures above put its real decode speed at roughly 40–50% of a 3090. That is not a flaw; it is the arithmetic of an older memory architecture. For a single user running inference, 40 tok/s is perfectly usable, since it is well above reading speed. For batching many concurrent requests (a different workload), it becomes a problem.

The P40 also lacks tensor cores. It can run CUDA inference via llama.cpp or Ollama, but newer cards with tensor acceleration are more efficient. The lack of tensor acceleration is not a dealbreaker, since plenty of people run older cards, but it is one more handicap in a stack of handicaps.
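To check where your own card lands in these ranges, decode speed can be read from Ollama's HTTP API, which also suits a headless box. The sketch below assumes an Ollama server on its default local port and uses the `eval_count` and `eval_duration` (nanoseconds) fields of a non-streaming `/api/generate` response. The model tag is a placeholder; use one you have already pulled.

```python
import json
import urllib.request


def tokens_per_second(result: dict) -> float:
    """Decode speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def generate(prompt: str, model: str = "llama2:13b",
             host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generation request and return the JSON reply.

    The model tag and host are placeholders for illustration.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example use on the machine hosting the P40 (needs a running Ollama server):
#   result = generate("Explain memory bandwidth in one paragraph.")
#   print(f"{tokens_per_second(result):.1f} tok/s")
```

Running the same prompt once cold and again after a few minutes of sustained load gives a quick read on whether cooling is holding.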

The P40 shines on one axis: VRAM per dollar. At $150–200 for 24GB, it is the cheapest 24GB card on the secondhand market. A 7B model at FP16 (about 14 GB of weights) or a 13B model at Q8 (about 14 GB) will not fit on a $300 RTX 3060 12GB, and a 13B model at Q4 fits there only with limited room for context. All of them fit on a P40. That is a real constraint win for budget-limited buyers who do not mind the engineering.
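A quick way to sanity-check whether a model fits a given card is to estimate the size of its weights and add a margin for context and runtime buffers. The sketch below is a rough estimate under stated assumptions: the bits-per-weight values for Q8_0 and Q4_K_M are approximations, and the 2 GB overhead is a placeholder that grows with context length.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes."""
    return params_billion * bits_per_weight / 8


def fits(vram_gb: float, params_billion: float, bits_per_weight: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# (label, parameters in billions, assumed bits per weight)
configs = [
    ("7B FP16", 7, 16.0),
    ("13B Q8_0", 13, 8.5),
    ("13B Q4_K_M", 13, 4.85),
]

for label, params, bits in configs:
    size = weights_gb(params, bits)
    print(f"{label}: ~{size:.1f} GB weights | "
          f"12 GB card: {fits(12, params, bits)} | 24 GB card: {fits(24, params, bits)}")
```

Raising `overhead_gb` to model a long context window shows how quickly a 12GB card runs out of room while a 24GB card still has slack.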

Who should and should not buy the Tesla P40

Buy the P40 if all of these are true:

You enjoy tinkering. A cooling shroud, thermal paste, and airflow tuning are not obstacles; they are part of the hobby.
You have hard VRAM constraints and a tight budget. A model and context size that will not fit comfortably on a 12GB card (13B at Q8, or 13B at Q4 with a long context), and no room in the budget for a $500+ used 3090.
Your workload is single-user or low-concurrency. 40–50 tok/s is acceptable for interactive chat or coding assistance.
You have access to a case or chassis with good airflow. A gaming tower with intake and exhaust fans is ideal; a fully blocked enclosure makes the problem worse.
You do not need video outputs on the same machine. You can run the P40 headless via SSH, VNC, or API.

Do not buy the P40 if:

You want inference to work on day one without modification. Buy an RTX 3090 or higher and skip the cooling engineering.
You are buying for a production or multi-user service. The speed ceiling and lack of tensor cores make it a poor choice for batching.
You need video output on the same machine. The P40 has none. You need a separate display GPU.
Your case is passively cooled or has poor airflow. The P40 will throttle and frustrate you.
You have not read the cooling section above. If you skipped it, you are going to buy this card, plug it in, and be confused why it is slow.

Used market reality: where to find them and what to check

Tesla P40 cards flood the used market because datacenters refresh regularly, and hundreds of thousands of these cards powered the cloud before the GPU shortage made them valuable elsewhere. Inventory is stable and cheap.

Where to find them:

eBay P40 listings, typically $150–200 with free shipping. Filter for “tested and working” and check seller feedback.
Some specialized server equipment resellers carry them with light warranty coverage.
Avoid listings with no photos of the actual card or vague descriptions.

What to check before buying:

Demand photos showing the heatsink, the PCB, and the 8-pin EPS power connector. No missing capacitors or visible burn marks.
Confirm the listing is a 24GB GDDR5 Tesla P40. The P100 (HBM2, 12GB or 16GB) and the P4 (8GB) are different cards that turn up under similar titles.
Ask if the card was used for mining or compute. Compute-used cards are better (lower peak temps) than mining cards. Mining-used cards often run fine but carry higher failure risk.
Budget $10–15 for fresh thermal paste as routine maintenance.
Confirm your PSU can feed the card’s 8-pin EPS input (usually two free PCIe 8-pin leads plus an adapter) and has 250W of capacity reserved for this card.
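The PSU check in the list above is simple arithmetic. The sketch below subtracts the P40's 250W TDP and an estimate for the rest of the system from a derated PSU capacity. The 80% sustained-load factor is a common rule of thumb assumed here, not a figure from this article, and the draw of the other components is yours to measure or estimate.

```python
P40_TDP_W = 250  # board power quoted in this article


def psu_headroom_w(psu_rated_w: float, other_components_w: float,
                   gpu_w: float = P40_TDP_W,
                   sustained_fraction: float = 0.8) -> float:
    """Watts left over after the GPU and the rest of the system.

    sustained_fraction derates the PSU label rating (assumed rule of thumb).
    """
    return psu_rated_w * sustained_fraction - other_components_w - gpu_w


# Hypothetical builds: the CPU, drives, and fans together draw about 200W.
for rated in (550, 650, 750):
    print(f"{rated}W PSU: {psu_headroom_w(rated, 200):+.0f}W headroom")
```

A negative result means the supply is undersized for sustained inference under these assumptions, even if the machine boots and idles without trouble.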

The P40 you are buying is likely 8–9 years old. It has not been in a gaming rig; it has been in a cloud datacenter. This usually means it is well-binned (lower failure rates on bulk-tested parts) but thermally stressed. A repaste and a cooling mod are not optional upgrades — they are insurance.

Why the P40 still makes sense despite the caveats

The VRAM-over-compute principle does not care about the card’s age. A 24GB card in 2026 still runs the same 13B and 7B models as a 3090, just slower. If you need capacity, want CUDA, and your budget is capped under $300, the P40 is the only card that delivers 24GB.

More broadly, the P40 represents a deeper principle: there is no shame in buying old hardware that solves a real constraint, as long as you know what you are buying. The P40 is not a trick or a trap. It is a datacenter card that the cloud industry threw away, and the community repurposed it for inference. That repurposing has real requirements — cooling, airflow, patience — but the VRAM tier is genuine.

The AMD MI50, which sits in the same niche, carries similar caveats with 32GB instead of 24GB. Both cards serve people for whom a $150 24GB card with engineering beats a $500 3090 with convenience.

For sizing your model to VRAM, see What Is Quantization and Hardware to Run a 7B Model Locally. For the buying logic itself, the Local-AI Hardware Buying Framework ranks cards by constraint, and Why VRAM Matters More Than Compute explains the bandwidth principle in depth.

If you are exploring 24GB cards without the P40’s friction, Best GPU for Local LLM Inference ranks the RTX 3090 and others. For the MI50, a similar datacenter card with 32GB, see AMD MI50 32GB for Local LLM. If your budget allows $300–500, Best GPU Under $500 for Local LLM covers mid-range options.

For cooling and chassis advice specific to high-VRAM builds, Quiet Cooling for GPU Server covers blower shrouds, thermal paste, and airflow strategy in detail.

Bottom line

The Tesla P40 is genuinely cheap. It genuinely holds 24GB. It genuinely runs local LLMs. The catch is equally genuine: it throttles without forced airflow, it is slow compared to modern 24GB cards, and it has no video output. These are not hidden gotchas. They are loud and clear.

For the right buyer — someone who enjoys the engineering, has a tight VRAM-capacity constraint, and is comfortable running inference headless — the P40 is a real bargain. For everyone else, the $300 RTX 3060 12GB or the $500–800 RTX 3090 are the better bets. The tinkering cost of the P40 is not just dollars. It is time and patience. Know which you have before you bid.

Ready to buy? Check used P40 listings with a blower shroud and thermal paste in your cart: Browse Tesla P40 24GB on eBay →

Accessory basket (non-negotiable for this card):