Spot instances offer the same H100 and A100 GPUs at a fraction of the price — if you know how to use them. This guide covers the mechanics, real current prices, provider trade-offs, and the exact practices that separate teams saving 70% from teams that get burned.
Cloud providers — AWS, GCP, Azure, and the GPU-focused clouds — maintain large pools of GPU capacity. A significant portion of that capacity sits idle at any given moment. Rather than let it go to waste, they sell it at steep discounts as spot instances (also called preemptible or interruptible instances). You get the same hardware — an H100, an A100, an L40S — for 60–80% less than the on-demand price.
The catch: the provider can reclaim your spot instance on short notice when demand spikes. Hyperscalers give a brief warning (about 2 minutes on AWS, roughly 30 seconds on GCP and Azure), while some GPU-native clouds reclaim immediately. In practice, interruption rates are low during off-peak hours, but production workloads still need a fallback plan because you can never count on zero interruptions.
Why do spot prices vary so much? Because they're driven by real-time supply and demand. An H100 spot on AWS might be $2.10/hr right now and $3.80/hr two hours later if a wave of demand hits. GPU-native clouds (RunPod, Vast.ai, CoreWeave) set prices more statically, which means lower volatility but also less transparency into when supply will dry up. Understanding this dynamic is step one to using spot effectively.
Use spot for: model training runs, batch inference, fine-tuning jobs, research experiments, CI/CD pipelines — anything that can checkpoint and resume. Use on-demand for: production API inference serving, real-time user-facing workloads, jobs without checkpointing that would be expensive to restart.
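"Checkpoint and resume" can be sketched in a few lines. The example below is a minimal illustration, not a production training loop: `Checkpointer`, `run`, and `install_sigterm_flag` are hypothetical names, the "work" is a placeholder counter, and it assumes the platform sends SIGTERM to your process before reclaiming the machine, which varies by provider.

```python
import json
import os
import signal
import tempfile


class Checkpointer:
    """Save and restore loop state as JSON, writing atomically."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # Resume from the saved state, or start fresh if none exists.
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {"step": 0}

    def save(self, state):
        # Write to a temp file, then rename, so an interruption mid-write
        # never leaves a truncated checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)


def install_sigterm_flag():
    """Return a dict whose "stop" entry flips to True on SIGTERM (assumed signal)."""
    flag = {"stop": False}

    def handler(signum, frame):
        flag["stop"] = True

    signal.signal(signal.SIGTERM, handler)
    return flag


def run(total_steps, ckpt, should_stop=lambda: False, every=100):
    """Run a resumable loop; returns the last completed step."""
    state = ckpt.load()
    while state["step"] < total_steps:
        state["step"] += 1  # placeholder for one unit of real work
        stopping = should_stop()
        if stopping or state["step"] % every == 0 or state["step"] == total_steps:
            ckpt.save(state)
        if stopping:
            break
    return state["step"]
```

In a real job you would call `flag = install_sigterm_flag()` once at startup and pass `should_stop=lambda: flag["stop"]`. Relaunching the same script on a new instance then picks up from the last saved step, provided the checkpoint file lives on storage that survives the instance.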
The table below shows live prices pulled from our monitoring system. ★ marks the cheapest provider for each GPU.
| GPU Model | AWS | GCP | Azure | Lambda | CoreWeave | RunPod | Vast.ai | Max Savings |
|---|---|---|---|---|---|---|---|---|
| NVIDIA H200 141GB | — | — | — | — | — | ★ $0.50/hr | $1.97/hr | Save 75% |
| NVIDIA H100 PCIe 80GB | — | — | — | — | — | ★ $1.99/hr | $2.13/hr | Save 7% |
| NVIDIA H100 SXM 80GB | $57.76/hr | $26.85/hr | $4.16/hr | $2.49/hr | $2.07/hr | $2.59/hr | ★ $1.47/hr | Save 97% |
| NVIDIA A100 40GB | $8.10/hr | $8.79/hr | — | — | — | — | ★ $0.68/hr | Save 92% |
| NVIDIA A100 PCIe 80GB | — | — | — | — | — | $1.19/hr | ★ $0.76/hr | Save 36% |
| NVIDIA A100 SXM 80GB | $7.64/hr | $1.52/hr | $0.90/hr | $1.29/hr | $1.21/hr | $1.00/hr | ★ $0.43/hr | Save 94% |
| NVIDIA L40S 48GB | $1.82/hr | — | — | — | $1.82/hr | $0.79/hr | ★ $0.43/hr | Save 77% |
| NVIDIA L40 48GB | — | — | — | — | — | ★ $0.69/hr | — | Save 0% |
Prices shown are spot rates where available, or on-demand rates where spot is not offered. ★ = cheapest available. "Max Savings" = how much cheaper the lowest listed price is than the highest listed price for that GPU, so a GPU listed by only one provider shows 0%. Prices update every 30 minutes.
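The "Max Savings" figure can be reproduced from a single table row. The sketch below assumes the definition given above (lowest listed price versus highest listed price, rounded to a whole percent); `max_savings_pct` is a hypothetical helper name, and `None` stands in for a provider that does not list the GPU.

```python
def max_savings_pct(prices):
    """prices: dict of provider -> hourly price, or None when not listed."""
    listed = [p for p in prices.values() if p is not None]
    if not listed:
        return None
    cheapest, priciest = min(listed), max(listed)
    return round(100 * (priciest - cheapest) / priciest)


# H100 SXM row from the table above.
h100_sxm = {
    "AWS": 57.76, "GCP": 26.85, "Azure": 4.16, "Lambda": 2.49,
    "CoreWeave": 2.07, "RunPod": 2.59, "Vast.ai": 1.47,
}
print(max_savings_pct(h100_sxm))  # 97
```

A large percentage here mostly reflects how expensive the priciest provider is, so it is worth comparing absolute hourly prices as well.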
H100 availability remains tight on the major hyperscalers, which keeps their H100 spot prices elevated. GPU-native clouds (RunPod, Vast.ai) offer the steepest discounts but with fewer capacity guarantees. A100s are more widely available and often represent the best cost/performance trade-off for training workloads that don't strictly require H100s.
L40S has emerged as a strong mid-tier option: in the table above it is cheaper than H100 on every provider that lists it and priced at or below A100 on most of them, with 48GB of VRAM that handles most fine-tuning and inference jobs. If your workload fits in 48GB, it's worth benchmarking L40S pricing before defaulting to A100.
Seven providers offer GPU spot capacity in 2026. They fall into two categories: hyperscalers (massive capacity, volatile pricing, enterprise tooling) and GPU-native clouds (purpose-built for ML workloads, more predictable pricing, less capacity).
Spot instances are not plug-and-play replacements for on-demand. The teams that capture 70%+ savings reliably have the same core habits. Here's the playbook.
Real-time inference APIs, customer-facing model serving, and jobs without checkpointing support should run on on-demand or reserved capacity. The interruption risk makes spot the wrong tool for workloads where restarts cost more than the savings. The 60–80% discount only matters if the job completes successfully.
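Whether restarts cost more than the savings is simple arithmetic once you estimate how much work each interruption throws away. The sketch below is a rough model under stated assumptions: the function names are hypothetical, every interruption is assumed to repeat a fixed `rework_hours` of compute, and the prices and interruption counts are made-up inputs rather than measured values.

```python
def spot_total_cost(job_hours, spot_price, interruptions, rework_hours):
    """Spot spend when each interruption repeats rework_hours of work."""
    return spot_price * (job_hours + interruptions * rework_hours)


def spot_beats_on_demand(job_hours, spot_price, on_demand_price,
                         interruptions, rework_hours):
    """True when spot stays cheaper after paying for the repeated work."""
    spot = spot_total_cost(job_hours, spot_price, interruptions, rework_hours)
    return spot < on_demand_price * job_hours


# Hypothetical 10-hour job: $1.00/hr spot vs $3.00/hr on-demand, 2 interruptions.
# With checkpoints, each interruption loses about 30 minutes of work.
print(spot_beats_on_demand(10, 1.0, 3.0, 2, 0.5))   # True  (11.0 vs 30.0)
# Without checkpoints, worst case: each interruption repeats the full 10 hours.
print(spot_beats_on_demand(10, 1.0, 3.0, 2, 10.0))  # False (30.0 vs 30.0)
```

The model ignores the cost of finishing late, which is often the real reason user-facing workloads stay on on-demand capacity.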
Don't default to H100 because it's the fastest chip. For most training and fine-tuning workloads, an A100 or L40S at 50–60% of the H100 spot price can complete the job at comparable total cost, because the lower hourly rate offsets the longer runtime. Profile your workload's actual GPU utilization and memory footprint first. If the job needs less than 40GB of VRAM, an L40S or A100 40GB will work and typically has better spot availability.
For transformer training at scale, H100's HBM3 and NVLink make a real difference — that's the workload where paying the H100 premium (even spot) is justified. For inference serving or fine-tuning smaller models, the A100 is usually the sweet spot on price/performance.
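Choosing a GPU by total job cost rather than hourly price comes down to price multiplied by runtime. The sketch below is illustrative only: `job_cost` and `rank_by_cost` are hypothetical helpers, and the prices and relative speeds are placeholder numbers. Measure throughput for your own workload instead of trusting these.

```python
def job_cost(hourly_price, baseline_hours, relative_speed):
    """Cost of a job that takes baseline_hours on the reference GPU (speed 1.0)."""
    return hourly_price * baseline_hours / relative_speed


def rank_by_cost(options, baseline_hours):
    """options: dict of name -> (hourly_price, relative_speed). Cheapest first."""
    costs = [(name, job_cost(price, baseline_hours, speed))
             for name, (price, speed) in options.items()]
    return sorted(costs, key=lambda pair: pair[1])


# Placeholder inputs: (hourly price in $, speed relative to H100).
options = {
    "H100": (2.00, 1.0),
    "A100": (1.00, 0.625),
    "L40S": (0.75, 0.25),
}
print(rank_by_cost(options, baseline_hours=10))
# [('A100', 16.0), ('H100', 20.0), ('L40S', 30.0)]
```

With these made-up numbers the cheapest hourly rate (L40S) produces the most expensive job, which is why profiling comes before picking a GPU.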
Spot prices shift by the hour. RoofRun tracks H100, A100, L40S, and more across all 7 providers, updated every 30 minutes. Set alerts to notify you when prices drop below your threshold — so you launch jobs at the right time.
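The logic behind a price alert is a threshold check against the latest prices. The sketch below is generic and does not reflect RoofRun's actual interface: `offers_at_or_below` is a hypothetical helper, and the price feed is assumed to be a plain dictionary you fill from whatever source you use.

```python
def offers_at_or_below(prices, threshold):
    """Return (provider, price) pairs at or below threshold, cheapest first."""
    hits = [(name, price) for name, price in prices.items()
            if price is not None and price <= threshold]
    return sorted(hits, key=lambda pair: pair[1])


# H100 SXM prices from the table above, with a $2.10/hr threshold.
h100_sxm_prices = {
    "AWS": 57.76, "GCP": 26.85, "Azure": 4.16, "Lambda": 2.49,
    "CoreWeave": 2.07, "RunPod": 2.59, "Vast.ai": 1.47,
}
print(offers_at_or_below(h100_sxm_prices, 2.10))
# [('Vast.ai', 1.47), ('CoreWeave', 2.07)]
```

An empty list means no provider currently meets the threshold, so the job can wait for the next price update.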