
The Hidden Costs of Cloud GPUs: Bandwidth Fees, Preemption Multipliers, and Silent Throttling

The single most common complaint in cloud GPU forums is not “it’s too expensive.” It’s “it cost more than I was quoted.” Those are different problems. The first is a pricing decision you can make with full information. The second is a trust violation — the mental model you built (“$0.40/hr, times hours used, equals my bill”) turned out to be missing line items nobody put on the pricing page in bold type.

That gap between advertised rate and realized invoice is the subject of this guide. It shows up in three places, consistently, across providers: bandwidth/egress fees, preemption economics on interruptible instances, and silent performance throttling that keeps the price the same while cutting what you get for it. None of these are secret exactly — they’re usually disclosed somewhere in a docs page or terms-of-service — but they are not priced into the number most people compare when shopping. This is the constraint framework LocalRig uses across the local-vs-cloud cluster: before you pick a provider, price your workload’s bandwidth needs, its interruption tolerance, and its idle time, because those three variables are where the sticker price and the invoice part ways.

Why do hidden costs feel worse than high prices?

Because a high advertised price is information you can act on before you commit, while a hidden cost is information you only get after you’ve already spent the compute-hours. An A100 at $2.50/hr and an A100 at $1.20/hr are both prices you can compare on a spreadsheet. A bandwidth overage you discover on the invoice, or a silent 20% downclock you only notice because your fine-tune is taking longer than the benchmark said it should, can’t be comparison-shopped at all. You find out after the fact, and by then you’ve already paid for the compute time the throttling wasted.

This is also why hidden costs generate the angriest forum threads rather than the most-liked comparison posts. A high price is a business decision. An undisclosed line item is a felt breach of the deal you thought you’d made. That distinction is worth keeping in mind as a buyer: it means the providers with the cleanest, most legible pricing pages are doing real work to earn trust, not just competing on rate.

What are the three hidden-cost categories that actually move your bill?

They are bandwidth/egress, preemption multipliers, and silent throttling — and each one interacts with a different kind of workload, so the exposure is workload-specific, not universal.

1. Bandwidth and egress fees. If your workflow moves large datasets or checkpoints in and out of the instance repeatedly, per-GB transfer charges can rival or exceed the compute cost itself. This is easy to miss because the advertised $/hr rate says nothing about it.

2. Preemption multipliers on spot/interruptible instances. The lowest advertised price on almost every cloud GPU platform is the spot/preemptible tier. That price is real, but it assumes your workload tolerates being killed mid-run without notice. If it doesn’t — long fine-tunes, stateful inference servers, anything without frequent checkpointing — the “cheap” tier either costs you wasted compute on restarts or forces you to pay a premium (non-preemptible guarantee, or a higher-availability region) that stacks on top of the base rate.

3. Silent throttling. The rarest to document but the most corrosive to trust: the advertised spec and price stay the same, but delivered performance quietly drops. Community reports describe this on marketplace-style platforms where hardware is host-supplied rather than provider-owned, which makes consistent enforcement harder.
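To make the preemption point concrete, here is a rough sketch of how interruptions change the price of useful work. It assumes interruptions arrive at a steady average rate and that each one discards, on average, half a checkpoint interval plus a fixed restart time. The function name, parameters, and example numbers are hypothetical and not taken from any provider.

```python
def spot_effective_rate(spot_rate, interruptions_per_hour,
                        checkpoint_interval_hr, restart_overhead_hr):
    """Approximate dollars per hour of *useful* work on an interruptible instance.

    Assumption (not from any provider): each interruption loses, on average,
    half a checkpoint interval of work plus a fixed restart overhead.
    """
    lost_per_interruption = checkpoint_interval_hr / 2 + restart_overhead_hr
    wasted_fraction = interruptions_per_hour * lost_per_interruption
    if wasted_fraction >= 1:
        raise ValueError("job never makes progress at this interruption rate")
    # Only (1 - wasted_fraction) of each billed hour produces kept work.
    return spot_rate / (1 - wasted_fraction)


# Hypothetical job: $0.40/hr spot rate, one interruption every 4 hours,
# 15 minutes to restart, checkpointing hourly vs. every 6 minutes.
hourly_ckpt = spot_effective_rate(0.40, 0.25, 1.0, 0.25)
frequent_ckpt = spot_effective_rate(0.40, 0.25, 0.1, 0.25)
print(f"hourly checkpoints:   ${hourly_ckpt:.2f} per useful hour")
print(f"6-minute checkpoints: ${frequent_ckpt:.2f} per useful hour")
```

Under this model, frequent checkpointing is what keeps the spot tier close to its advertised rate; the comparison against a non-preemptible tier only makes sense once the wasted fraction is included.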

Case 1: What happened with Vast.ai bandwidth overages and downclocking?

Vast.ai’s marketplace model — where individual hosts, not Vast.ai itself, supply the physical GPUs — is efficient on price but means bandwidth terms and delivered clock speeds can vary host to host, and users have reported both bandwidth overage charges and undisclosed downclocking on specific listings.

A user-reported case (r/MachineLearning and the Vast.ai community forum, 2025, not independently verified by LocalRig) cites bandwidth overage billing of around $2.50 per 100 GB on an instance whose advertised hourly rate did not make that transfer cost obvious upfront. Separately, community threads (2025, not independently verified by LocalRig) describe silent downclocking of roughly 22% on some instances: the listed price and specs stayed the same, but measured throughput was meaningfully lower, consistent with a host running the card below its rated clocks. Thermal management, undervolting, and shared-resource contention are the likely mechanisms, though Vast.ai has not confirmed a specific cause in these reports. With transfer overages stacked on top of underperforming compute, some user-reported totals put realized cost up to ~29% over the list price for the session (community forum threads, 2025, not independently verified by LocalRig).
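The arithmetic behind that kind of overrun can be sketched as follows. This is an illustrative model, not a reconstruction of any user's bill: it assumes the same amount of work has to be finished, so a throughput loss translates directly into extra billed hours, and it treats transfer as a flat per-100-GB charge. The function name and the example job are hypothetical.

```python
def realized_overrun(hourly_rate, quoted_hours, throughput_loss,
                     transfer_gb, transfer_per_100gb):
    """Fractional overrun of the realized bill versus the quoted compute cost."""
    if not 0 <= throughput_loss < 1:
        raise ValueError("throughput_loss must be in [0, 1)")
    quoted = hourly_rate * quoted_hours
    # A card delivering (1 - loss) of expected speed needs more wall-clock
    # time to finish the same amount of work.
    actual_hours = quoted_hours / (1 - throughput_loss)
    transfer_cost = transfer_gb / 100 * transfer_per_100gb
    realized = hourly_rate * actual_hours + transfer_cost
    return realized / quoted - 1


# Hypothetical 100-hour job quoted at $0.40/hr, with a 22% throughput loss.
compute_only = realized_overrun(0.40, 100, 0.22, 0, 2.50)
with_transfer = realized_overrun(0.40, 100, 0.22, 40, 2.50)
print(f"throttling alone: {compute_only:.1%} over quote")
print(f"plus 40 GB of transfer at $2.50/100 GB: {with_transfer:.1%} over quote")
```

Note that a 22% drop in throughput means about 28% more billed time for the same work, because the job runs for 1 / 0.78 of its planned duration.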

None of these figures are LocalRig first-party measurements — they are community-cited numbers from users describing their own bills and benchmarks, and Vast.ai’s marketplace structure means your specific host’s behavior may not match any of them. The takeaway isn’t “avoid Vast.ai” — its price floor is real and well-documented elsewhere — it’s “verify bandwidth terms and benchmark delivered throughput against advertised specs before committing to a long run,” which is exactly the protective step covered in the Vast.ai review.

Case 2: How do Modal’s pricing multipliers stack?

Modal’s published pricing (modal.com/pricing, vendor-documented, 2025-2026) applies a regional multiplier of roughly 1.25x to 2.5x over its base rate, depending on where the workload runs, plus a separate surcharge for guaranteeing a non-preemptible instance instead of accepting preemption risk. The two appear as independent line items on the pricing page, but they multiply together. A workload that needs both a premium region and a non-preemptible guarantee can land at roughly 3.75x the headline base rate. That figure uses a 1.5x region tier and a 2.5x non-preemption multiplier as illustrative values for the stacking mechanic Modal documents; check the current published multipliers for exact figures, since they are subject to change.
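The stacking mechanic is a plain multiplication, sketched here with the illustrative multipliers from the paragraph above. The function and the $1.00 base rate are hypothetical placeholders, not Modal's API or current prices.

```python
def stacked_rate(base_rate, region_multiplier=1.0, non_preemptible_multiplier=1.0):
    """Effective hourly rate once independent multipliers are applied together."""
    return base_rate * region_multiplier * non_preemptible_multiplier


base = 1.00  # placeholder base rate in $/hr, for illustration only
print(stacked_rate(base))            # headline base rate: 1.0
print(stacked_rate(base, 1.5))       # premium region only: 1.5
print(stacked_rate(base, 1.5, 2.5))  # region and non-preemptible: 3.75
```

Swap in the multipliers currently published for your region and preemption setting before relying on the result.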

This is the cleanest documented example of the “advertised rate is the floor, not the price” pattern: Modal is transparent that these multipliers exist, but the headline number a buyer first sees is the base rate before any of them apply. If your workload is latency-sensitive inference that cannot tolerate interruption, and you need it in a premium region, price the stacked rate — not the base rate — before you commit. The full breakdown of Modal’s serverless model, including where these multipliers show up in practice, is in the Modal serverless GPU review.

Case 3: Can RunPod terminate my instance if my balance runs low?

Yes. This one is not a community rumor; it is in RunPod’s own help center. RunPod’s documented policy (docs.runpod.io, vendor-documented, 2025-2026) is that pods can be stopped or terminated when the account balance drops too low to cover ongoing charges. This isn’t a “hidden” cost in the sense of an undisclosed fee. It is a documented risk that is easy to overlook because it only bites when your balance runs low during an active job, and the consequence (data loss on ephemeral, non-persistent storage) can be severe if you haven’t set up a persistent volume or an off-instance backup.

The fix is entirely within your control: set balance alerts, use persistent storage for anything you can’t afford to lose, and don’t run unattended long jobs against a low balance. The full walkthrough of RunPod’s storage model and how to avoid this specific failure mode is in how to avoid RunPod data loss.
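A pre-launch check along these lines can be sketched in a few lines. It does not call any RunPod API: the balance and burn rate are numbers you read off your own account page and type in, and the function names and the 1.5x safety margin are assumptions for illustration.

```python
def balance_runway_hours(balance, hourly_burn):
    """Hours the current balance covers at the current total hourly spend."""
    if hourly_burn <= 0:
        raise ValueError("hourly_burn must be positive")
    return balance / hourly_burn


def safe_to_launch(balance, hourly_burn, job_hours, margin=1.5):
    """True if the balance covers the planned job with a safety margin.

    The margin (hypothetical default: 1.5x) leaves room for jobs that run
    longer than planned and for storage or other pods billing in parallel.
    """
    return balance_runway_hours(balance, hourly_burn) >= job_hours * margin


# Hypothetical account: $20 balance, $0.50/hr total burn, 30-hour fine-tune.
print(balance_runway_hours(20.0, 0.50))  # hours of runway: 40.0
print(safe_to_launch(20.0, 0.50, 30))    # 40 h of runway vs. 45 h needed: False
```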

Comparison: where does each hidden cost show up?

| Cost type | Provider case documented here | Type of evidence | What it does to your bill |
| --- | --- | --- | --- |
| Bandwidth/egress overage | Vast.ai, ~$2.50/100 GB cited | User-reported, 2025, not verified by LocalRig | Adds a per-GB charge invisible in the advertised $/hr rate |
| Silent downclocking | Vast.ai, ~22% cited | User-reported, 2025, not verified by LocalRig | Same price, less delivered throughput: you pay full rate for fewer tokens/steps per hour |
| Realized cost over list | Vast.ai, up to ~29% cited | User-reported, 2025, not verified by LocalRig | Combined effect of overage + underperformance vs. quoted rate |
| Regional multiplier | Modal, 1.25x-2.5x | Vendor-documented pricing page, 2025-2026 | Multiplies the base rate depending on region selection |
| Non-preemption surcharge | Modal, stacks to ~3.75x combined | Vendor-documented pricing page, 2025-2026 | Multiplies again if you need guaranteed (non-spot) capacity |
| Low-balance termination | RunPod | Vendor-documented help center, 2025-2026 | Not a fee; a termination/data-loss risk if balance runs out mid-job |

How do I price bandwidth, interruption tolerance, and idle time before I rent?

Run this three-question check against your actual workload before comparing $/hr numbers across providers — it takes ten minutes and it’s the difference between a bill you expected and one you didn’t.

  1. Bandwidth: how much data crosses the wire, and what does this specific host/plan charge for it? Estimate total GB in and out for the full job (dataset load, checkpoint saves, model weight downloads), then check the provider’s current transfer terms — not a generic FAQ, the actual instance or region you’re renting, since marketplace platforms vary by host.
  2. Interruption tolerance: can this job survive being killed and restarted without notice? If it can (frequent checkpointing, stateless inference, embarrassingly parallel batch work), the cheapest spot/preemptible tier is genuinely cheap. If it can’t (long fine-tunes without checkpoints, stateful serving), price the non-preemptible tier honestly — including any regional multiplier — before comparing it to a competitor’s spot price.
  3. Idle time: how much are you paying for the instance sitting there between jobs? Storage-while-stopped fees, minimum billing increments, and forgotten-running-instance risk are a quieter version of the same problem — the advertised $/hr assumes 100% utilization, and almost no real workload hits that.
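The three questions above can be folded into one rough invoice estimate. The sketch below is a simplified model under stated assumptions: transfer is billed per GB, running time is rounded up to a minimum billing increment, and storage is billed hourly while the instance is stopped. Interruption tolerance enters through the hourly rate you choose to plug in (spot or non-preemptible). All class names, fields, and example terms are hypothetical; real providers bill in ways this does not capture.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    compute_hours: float   # hours of actual GPU work
    transfer_gb: float     # total GB billed for transfer, in and out
    idle_hours: float      # hours the instance runs with nothing to do
    stopped_hours: float   # hours stopped while storage is still billed


@dataclass
class Terms:
    hourly_rate: float               # spot or non-preemptible, whichever the job needs
    transfer_per_gb: float           # per-GB transfer charge for this host or plan
    stopped_storage_per_hour: float  # storage-while-stopped fee
    billing_increment_hours: float   # minimum billing unit, e.g. 1.0 for hourly


def estimate_invoice(w: Workload, t: Terms) -> dict:
    """Compare the sticker cost (rate x compute hours) with a fuller estimate."""
    running = w.compute_hours + w.idle_hours
    # Round running time up to the provider's minimum billing increment.
    billed = math.ceil(running / t.billing_increment_hours) * t.billing_increment_hours
    compute = billed * t.hourly_rate
    transfer = w.transfer_gb * t.transfer_per_gb
    storage = w.stopped_hours * t.stopped_storage_per_hour
    return {
        "sticker": w.compute_hours * t.hourly_rate,
        "compute": compute,
        "transfer": transfer,
        "storage": storage,
        "total": compute + transfer + storage,
    }


# Hypothetical job and terms, for illustration only.
job = Workload(compute_hours=10, transfer_gb=200, idle_hours=0.5, stopped_hours=24)
terms = Terms(hourly_rate=0.40, transfer_per_gb=0.025,
              stopped_storage_per_hour=0.01, billing_increment_hours=1.0)
estimate = estimate_invoice(job, terms)
print(f"sticker ${estimate['sticker']:.2f} vs. estimate ${estimate['total']:.2f}")
```

In this made-up example the transfer line alone exceeds the compute line, which is the pattern described in the bandwidth question.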

If you’re still deciding between renting at all versus buying hardware, this same bandwidth/interruption/idle-time math is also the honest starting point for the rent-vs-buy GPU break-even — hidden costs shift that break-even point earlier than the advertised rate alone suggests, sometimes by months.
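As a minimal sketch of that shift, the function below computes a break-even in months and lets you inflate the cloud rate by a hidden-cost overhead. It ignores resale value, depreciation, and your own time; the hardware price, usage, and power cost are hypothetical, and the 29% overhead is borrowed from the user-reported upper figure cited earlier purely as an example input.

```python
def break_even_months(hardware_cost, cloud_hourly_rate, hours_per_month,
                      overhead_fraction=0.0, local_power_cost_per_hour=0.0):
    """Months until buying hardware costs less than continuing to rent.

    overhead_fraction inflates the advertised cloud rate to a realized rate,
    e.g. 0.29 for a bill that lands 29% over list.
    """
    realized_rate = cloud_hourly_rate * (1 + overhead_fraction)
    monthly_saving = hours_per_month * (realized_rate - local_power_cost_per_hour)
    if monthly_saving <= 0:
        return float("inf")  # renting never costs more than running locally
    return hardware_cost / monthly_saving


# Hypothetical: $2,000 card, $0.40/hr cloud rate, 200 GPU-hours per month,
# $0.05/hr local electricity.
at_list = break_even_months(2000, 0.40, 200, 0.0, 0.05)
with_overhead = break_even_months(2000, 0.40, 200, 0.29, 0.05)
print(f"at list price:     {at_list:.1f} months")
print(f"with 29% overhead: {with_overhead:.1f} months")
```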

How do I protect myself, provider by provider?

  • Vast.ai — Confirm bandwidth terms on the specific listing before a large transfer job, and benchmark delivered tok/s or step-time against the advertised spec in the first few minutes of a rental rather than assuming it holds for the full run. See the full Vast.ai review and LocalRig’s Vast.ai safety review for the marketplace-model tradeoffs.
  • Modal — Before quoting a project cost, price the stacked rate (region multiplier × preemption guarantee, if you need both) rather than the base rate on the homepage. Check Modal’s current pricing directly and read the Modal serverless GPU review for where this bites in practice.
  • RunPod — Set a balance alert well above zero, and never run an unattended long job on ephemeral storage without a persistent volume or off-instance backup. Check RunPod’s pod pricing and read how to avoid RunPod data loss for the full setup.
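For the benchmarking step, a provider-agnostic timing check could look like the sketch below. It assumes you supply `step_fn`, a hypothetical callable that runs one unit of your real workload and blocks until that work has finished (on a GPU this usually means synchronizing before returning), and an expected steps-per-second figure from a benchmark you trust. The 10% tolerance is an arbitrary illustrative threshold.

```python
import time


def measure_steps_per_sec(step_fn, warmup=3, steps=20):
    """Time a callable that performs one blocking unit of work."""
    for _ in range(warmup):
        step_fn()  # discard warm-up iterations (caches, lazy initialization)
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return steps / elapsed


def throughput_shortfall(measured, expected):
    """Fraction by which measured throughput falls short of the expected figure."""
    return 1 - measured / expected


def looks_throttled(measured, expected, tolerance=0.10):
    """Flag a rental whose shortfall exceeds the chosen tolerance."""
    return throughput_shortfall(measured, expected) > tolerance


# Example with made-up numbers: 78 steps/s measured against 100 expected.
print(f"shortfall: {throughput_shortfall(78, 100):.0%}")
print(looks_throttled(78, 100))
```

Running the measurement again partway through a long job is a cheap way to catch throughput that degrades after the first few minutes.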

(RunPod and Vast.ai affiliate programs are pending approval — the links above are plain provider URLs, not referral links.)

Bottom line

Hidden costs aren’t usually fraud — they’re disclosed terms that don’t make it into the number you compare across providers. Bandwidth overage, preemption economics, and throttling are the three places where an advertised $/hr rate and a realized invoice diverge, and each one is workload-specific: a batch job with no transfer and full preemption tolerance may never see any of this, while a long fine-tune moving large datasets on a spot instance can hit all three at once. Price your own bandwidth, interruption tolerance, and idle time against a provider’s actual current terms before you compare rates — not after the invoice arrives. Every figure in this guide describing a specific dollar or percentage outcome is either user-reported (and explicitly not independently verified by LocalRig) or drawn from a vendor’s own published documentation; treat them as evidence to check against your own provider’s current terms, not as guarantees of what you’ll pay.

Sources

  • user-reported Vast.ai bandwidth overage case, ~$2.50/100 GB cited, r/MachineLearning and Vast.ai community forum, 2025 (not independently verified by LocalRig)
  • user-reported Vast.ai silent downclocking (~22%) at unchanged listed price, community forum threads, 2025 (not independently verified by LocalRig)
  • user-reported realized cost up to ~29% over list price on Vast.ai marketplace instances, community forum threads, 2025 (not independently verified by LocalRig)
  • Modal pricing documentation — regional pricing multiplier (1.25x-2.5x) and non-preemptible instance surcharge, modal.com/pricing, 2025-2026 (vendor-documented)
  • RunPod Help Center — low-balance pod termination policy, docs.runpod.io, 2025-2026 (vendor-documented)