GUIDE · 2026

GPU Spot Pricing Guide: How to Save 60–80% on Cloud GPUs in 2026

Spot instances offer the same H100 and A100 GPUs at a fraction of the price — if you know how to use them. This guide covers the mechanics, real current prices, provider trade-offs, and the exact practices that separate teams saving 70% from teams that get burned.

Updated: 2026-06-03 · 8 min read
Contents
  1. What are GPU spot instances?
  2. Current price landscape
  3. Provider comparison
  4. Best practices
  5. Monitor spot prices in real-time

What Are GPU Spot Instances?

Cloud providers, including AWS, GCP, Azure, and the GPU-focused clouds, maintain large pools of GPU capacity. A significant portion of that capacity sits idle at any given moment. Rather than let it go to waste, they sell it at steep discounts as spot instances (also called preemptible or interruptible instances). You get the same hardware (an H100, an A100, an L40S), often for 60–80% less than the on-demand price.

The catch: the provider can reclaim your spot instance on short notice when demand spikes (two minutes on AWS, about 30 seconds on GCP and Azure, and sometimes no warning at all on GPU-native clouds). In practice, interruption rates are low during off-peak hours, but you can't rely on zero interruptions for production workloads without a fallback plan.

Why do spot prices vary so much? Because they're driven by real-time supply and demand. An H100 spot on AWS might be $2.10/hr right now and $3.80/hr two hours later if a wave of demand hits. GPU-native clouds (RunPod, Vast.ai, CoreWeave) set prices more statically, which means lower volatility but also less transparency into when supply will dry up. Understanding this dynamic is step one to using spot effectively.

60–80%
typical savings vs. on-demand
30 s–2 min
interruption notice (hyperscalers)
7
major providers with spot GPU capacity
Spot vs On-Demand: When to Use Each

Use spot for: model training runs, batch inference, fine-tuning jobs, research experiments, CI/CD pipelines — anything that can checkpoint and resume. Use on-demand for: production API inference serving, real-time user-facing workloads, jobs without checkpointing that would be expensive to restart.

Current GPU Spot Price Landscape

The table below shows the latest prices from our monitoring system. ★ marks the cheapest provider for each GPU.

GPU model | Listed prices | Max Savings
NVIDIA H200 (141GB) | ★ $0.50/hr, $1.97/hr | Save 75%
NVIDIA H100 PCIe (80GB) | ★ $1.99/hr, $2.13/hr | Save 7%
NVIDIA H100 SXM (80GB) | AWS $57.76/hr, GCP $26.85/hr, Azure $4.16/hr, Lambda $2.49/hr, CoreWeave $2.07/hr, RunPod $2.59/hr, ★ Vast.ai $1.47/hr | Save 97%
NVIDIA A100 40GB | $8.10/hr, $8.79/hr, ★ $0.68/hr | Save 92%
NVIDIA A100 PCIe (80GB) | $1.19/hr, ★ $0.76/hr | Save 36%
NVIDIA A100 SXM (80GB) | AWS $7.64/hr, GCP $1.52/hr, Azure $0.90/hr, Lambda $1.29/hr, CoreWeave $1.21/hr, RunPod $1.00/hr, ★ Vast.ai $0.43/hr | Save 94%
NVIDIA L40S (48GB) | $1.82/hr, $1.82/hr, $0.79/hr, ★ $0.43/hr | Save 77%
NVIDIA L40 (48GB) | ★ $0.69/hr | Save 0%
Last updated: Jun 3, 2026, 06:33 AM UTC
Reading This Table

Prices shown are spot rates where available, or on-demand where spot is not offered. ★ = cheapest available. "Max Savings" = how much cheaper the best spot option is versus the priciest provider for that GPU. Prices update every 30 minutes.
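Both the ★ marker and the Max Savings figure can be reproduced from any row. A minimal Python sketch, assuming a row is a dict of provider to $/hr (the function name is hypothetical), using the H100 SXM row from the snapshot above:

```python
from typing import Dict, Tuple


def cheapest_and_max_savings(prices: Dict[str, float]) -> Tuple[str, float, int]:
    """Return (cheapest provider, its $/hr, Max Savings % versus the priciest provider)."""
    provider = min(prices, key=prices.get)  # the row's ★
    low, high = prices[provider], max(prices.values())
    savings_pct = round(100 * (1 - low / high))
    return provider, low, savings_pct


# H100 SXM row from the snapshot table ($/hr)
h100_sxm = {
    "AWS": 57.76, "GCP": 26.85, "Azure": 4.16, "Lambda": 2.49,
    "CoreWeave": 2.07, "RunPod": 2.59, "Vast.ai": 1.47,
}
print(cheapest_and_max_savings(h100_sxm))  # ('Vast.ai', 1.47, 97)
```

Note that the comparison is against the priciest listed provider, not against the same provider's on-demand rate, which is why a row with one large hyperscaler price shows such a large figure.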

What's Driving Prices Right Now

H100 availability remains tight on the major hyperscalers, which keeps their spot prices high. GPU-native clouds (RunPod, Vast.ai) offer the steepest discounts but with weaker capacity guarantees. A100s are more widely available and often represent the best cost/performance trade-off for training workloads that don't strictly require H100s.

L40S has emerged as a strong mid-tier option: cheaper than H100 or A100 on most providers, with 48GB VRAM that handles most fine-tuning and inference jobs. If your workload fits in 48GB, it's worth benchmarking L40S pricing before defaulting to A100.
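To check whether a workload fits in 48GB before benchmarking, a common back-of-envelope estimate multiplies parameter count by bytes per parameter. This sketch is a rough lower bound under stated assumptions: 2 bytes/param for fp16/bf16 inference weights, 1 for 8-bit weights, and about 16 bytes/param for full fine-tuning with Adam in mixed precision. It ignores activations, KV cache, and framework overhead, and the 7B model size is only an example.

```python
BYTES_PER_PARAM = {
    "inference_fp16": 2,       # fp16/bf16 weights only
    "inference_int8": 1,       # 8-bit quantized weights only
    "full_finetune_adam": 16,  # fp16 weights + grads (2+2), fp32 master weights + Adam m, v (4+4+4)
}


def estimate_vram_gb(params_billion: float, mode: str) -> float:
    """Rough lower bound in GB (1 GB = 1e9 bytes): weights, gradients, optimizer state only.

    Activations, KV cache, and framework overhead come on top, so leave headroom.
    """
    # (params_billion * 1e9 params) * bytes/param / (1e9 bytes per GB)
    return params_billion * BYTES_PER_PARAM[mode]


for mode in ("inference_fp16", "full_finetune_adam"):
    gb = estimate_vram_gb(7, mode)
    print(f"7B, {mode}: ~{gb:.0f} GB, fits a 48GB L40S: {gb < 48}")
```

By this estimate, a 7B model's fp16 weights (about 14 GB) fit comfortably on an L40S, while full Adam fine-tuning of the same model (about 112 GB) does not fit on a single 48GB card.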

Provider-by-Provider Breakdown

This guide tracks seven providers with GPU spot capacity in 2026. They fall into two categories: hyperscalers (massive capacity, volatile pricing, enterprise tooling) and GPU-native clouds (purpose-built for ML workloads, more predictable pricing, less capacity).

AWS (Amazon)
p3, p4d, p4de, p5 instances
  • Deepest capacity pool globally
  • 2-minute spot interruption notice
  • Best ecosystem integrations (S3, EFS, EKS)
  • Highest list price among all 7
  • Spot savings are smaller (40–55%)
GCP (Google Cloud)
a2, a3, g2 instances
  • Spot VMs are very reliable
  • TPU access as alternative
  • Strong networking for distributed training
  • Limited H100 spot availability
  • Complex networking config
Azure (Microsoft)
NCv3, NDv2, NC H100 v5
  • Good for teams already on Azure
  • Spot eviction policy controls
  • Pricing not consistently competitive
  • GPU availability varies by region
Lambda Cloud
GPU clusters, on-demand + spot
  • Purpose-built for ML workloads
  • Simple pricing, no egress fees
  • Good H100 and A100 availability
  • Smaller scale than hyperscalers
  • Limited regions
CoreWeave
H100, A100, L40S clusters
  • Competitive H100 pricing
  • Kubernetes-native
  • High-bandwidth GPU interconnects
  • Contract-focused (enterprise bias)
  • Setup complexity for small teams
RunPod
On-demand + spot pods
  • Lowest prices on consumer GPUs
  • Fast spin-up, simple API
  • Community cloud for cheap experiments
  • Variable reliability on spot pods
  • Less consistent SLA
Vast.ai
Marketplace model
  • Lowest absolute prices (marketplace)
  • Wide GPU variety
  • Quality varies by host
  • Not suitable for production workloads

Best Practices for GPU Spot Instances

Spot instances are not plug-and-play replacements for on-demand. The teams that capture 70%+ savings reliably have the same core habits. Here's the playbook.

The Spot Instance Playbook
Checkpoint aggressively. Save model state to object storage (S3, GCS) every N steps. If your job runs 8 hours without a checkpoint and is interrupted at hour 7, you lose all seven hours of work. Checkpoint intervals of 15–30 minutes are a common default.
Always have a fallback. Set up automatic failover to on-demand if spot availability drops below a threshold. Running a mixed fleet (80% spot, 20% on-demand) gives you resilience while keeping most of the savings.
Monitor spot prices before launching. Spot prices can swing 2–3× within hours on hyperscalers. Check current pricing across all 7 providers before starting a large job, because the cheapest option shifts regularly. Use our live comparison to check before you launch.
Set interruption handlers. On AWS, the spot interruption notice arrives two minutes before the instance is reclaimed; it appears in the instance metadata (the spot/instance-action path) and as an EventBridge event. Write a handler that triggers your checkpointing logic and stops the job gracefully when the notice appears.
Use spot for the right workloads. Training runs, fine-tuning, batch evaluation, and research experiments are ideal. Real-time inference serving with SLA requirements is not; use reserved or on-demand instances there.
Watch prices during long runs. Set a budget threshold alert so you're notified if spot prices creep up during a long training run. What started at $1.20/hr can climb to $2.80/hr if demand spikes.
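The checkpointing and interruption-handler items above fit together in one training loop. The following is a minimal, stdlib-only Python sketch rather than a production implementation. It assumes training state is picklable and saved to a local path (syncing that file to S3 or GCS is left out), and the function names and the `check_every` knob are hypothetical. `aws_spot_notice` reads the `spot/instance-action` metadata path via IMDSv2 and only works on an EC2 instance; the `notice` parameter lets you swap in another provider's signal, or a fake for testing.

```python
import json
import os
import pickle
import tempfile
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Optional, Tuple

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def aws_spot_notice(timeout: float = 1.0) -> Optional[dict]:
    """Return the pending AWS spot interruption notice, or None if none is scheduled."""
    # IMDSv2: get a short-lived session token, then read spot/instance-action with it.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode())  # {"action": ..., "time": ...}
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is scheduled
            return None
        raise


def save_checkpoint(path: str, state: Any) -> None:
    """Write to a temp file, fsync, then atomically rename over `path`."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # an interruption mid-write never leaves a corrupt checkpoint


def load_checkpoint(path: str) -> Optional[Any]:
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(
    path: str,
    total_steps: int,
    interval_s: float = 20 * 60,  # periodic saves, inside the 15-30 minute range
    check_every: int = 50,        # poll for a notice every N steps (hypothetical knob)
    notice: Callable[[], Optional[dict]] = aws_spot_notice,
) -> Tuple[dict, bool]:
    """Run or resume a toy training loop; return (state, finished)."""
    state = load_checkpoint(path) or {"step": 0}  # resume from the last checkpoint
    last_save = time.monotonic()
    while state["step"] < total_steps:
        state["step"] += 1  # stand-in for one real optimizer step
        if state["step"] % check_every == 0 and notice() is not None:
            save_checkpoint(path, state)  # final save inside the notice window
            return state, False
        if time.monotonic() - last_save >= interval_s:
            save_checkpoint(path, state)
            last_save = time.monotonic()
    save_checkpoint(path, state)
    return state, True
```

Polling every N steps keeps metadata calls cheap, but if a single step can take longer than the notice window, poll from a background thread instead so the final save still fits inside it.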
When On-Demand is the Right Call

Real-time inference APIs, customer-facing model serving, and jobs without checkpointing support should run on on-demand or reserved capacity. The interruption risk makes spot the wrong tool for workloads where restarts cost more than the savings. The 60-80% discount only matters if the job completes successfully.
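Whether restarts cost more than the savings can be estimated before launch. This is a rough expected-value model, not a provider formula. It assumes interruptions arrive at a steady average rate and that each one wastes half a checkpoint interval plus a fixed restart overhead. All inputs, including the example rates and prices, are hypothetical placeholders to replace with your own measurements.

```python
import math


def expected_spot_cost(
    job_hours: float,              # uninterrupted compute time the job needs
    spot_price: float,             # $/hr
    interrupts_per_hour: float,    # average interruption rate
    checkpoint_interval_h: float,  # time between checkpoints
    restart_overhead_h: float,     # time to get a new instance running and reload state
) -> float:
    """Expected spot cost, counting work redone after interruptions.

    Each interruption wastes about half a checkpoint interval plus the restart
    overhead. If total wall time W satisfies W = H + (rate * W) * waste,
    then W = H / (1 - rate * waste).
    """
    waste_per_interrupt = checkpoint_interval_h / 2 + restart_overhead_h
    load = interrupts_per_hour * waste_per_interrupt
    if load >= 1:
        return math.inf  # interruptions destroy work faster than the job makes progress
    return spot_price * job_hours / (1 - load)


spot = expected_spot_cost(10, 1.00, 0.1, 0.5, 0.25)
on_demand = 10 * 3.00
print(f"spot ~${spot:.2f} vs on-demand ${on_demand:.2f}")
```

With these placeholder numbers, spot comes to about $10.53 against $30.00 on-demand. The estimate blows up as `load` approaches 1, which is the regime of frequent interruptions and long (or missing) checkpoint intervals where on-demand is the better call.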

GPU Selection Strategy

Don't default to H100 just because it's the fastest chip. For many training and fine-tuning workloads, an A100 or L40S priced at 50–60% of the H100 spot rate can finish the job at comparable total cost, because the lower hourly rate offsets the longer runtime. Profile your workload's actual GPU utilization and memory footprint first. If it needs under 40GB of VRAM, an L40S or A100 40GB will work and typically has better spot availability.
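The "comparable total cost" point comes down to hourly price times runtime. A minimal sketch, assuming runtime scales inversely with the throughput you measure when profiling; the 10-hour job and the 0.55 throughput ratio are hypothetical, and the two hourly rates are the H100 SXM and A100 SXM prices listed for CoreWeave in the snapshot table.

```python
def job_cost(price_per_hour: float, baseline_hours: float, relative_throughput: float) -> float:
    """Total cost on a GPU running at `relative_throughput` times the baseline GPU's speed.

    baseline_hours is the job's runtime on the baseline GPU (here, an H100);
    1.0 means same speed, 0.5 means half as fast (so twice the hours).
    """
    return price_per_hour * baseline_hours / relative_throughput


h100 = job_cost(2.07, 10, 1.0)   # baseline
a100 = job_cost(1.21, 10, 0.55)  # hypothetical: A100 at 55% of H100 throughput
print(f"H100 ${h100:.2f} vs A100 ${a100:.2f}")
# The cheaper GPU breaks even when its throughput ratio equals its price ratio.
print(f"A100 breaks even at {1.21 / 2.07:.0%} of H100 throughput")
```

Here the A100 ends up slightly more expensive ($22.00 vs $20.70), and it would break even at roughly 58% of H100 throughput, so the measured speed ratio, not the hourly rate alone, decides the cheaper option.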

For transformer training at scale, H100's HBM3 memory and higher-bandwidth NVLink make a real difference; that's the workload where paying the H100 premium (even on spot) is justified. For inference serving or fine-tuning smaller models, the A100 is usually the sweet spot on price/performance.

Monitor GPU Spot Prices in Real-Time

Spot prices shift by the hour. RoofRun tracks H100, A100, L40S, and more across all 7 providers, updated every 30 minutes. Set alerts to notify you when prices drop below your threshold — so you launch jobs at the right time.
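The two alert rules in this guide, launching when a price drops to your threshold and flagging a long run whose budget is at risk (the budget threshold alert in the playbook), each reduce to a single comparison. A minimal sketch; the function names, the `RunBudget` fields, and the $50 budget are hypothetical, and the $1.20/hr and $2.80/hr prices are the playbook's example.

```python
from dataclasses import dataclass


@dataclass
class RunBudget:
    budget_usd: float       # total spend you will accept for the run
    spent_usd: float        # spend so far
    remaining_hours: float  # estimated compute hours left


def should_launch(current_price: float, threshold: float) -> bool:
    """Launch signal: the spot price is at or below your threshold."""
    return current_price <= threshold


def projected_cost(current_price: float, run: RunBudget) -> float:
    """Spend so far plus the remaining hours at the current price."""
    return run.spent_usd + current_price * run.remaining_hours


def over_budget(current_price: float, run: RunBudget) -> bool:
    """Budget alert: finishing at the current price would exceed the budget."""
    return projected_cost(current_price, run) > run.budget_usd


run = RunBudget(budget_usd=50.0, spent_usd=20.0, remaining_hours=12.0)
for price in (1.20, 2.80):
    print(f"${price:.2f}/hr -> projected ${projected_cost(price, run):.2f}, "
          f"alert={over_budget(price, run)}")
```

At $1.20/hr the run projects to $34.40 and stays under budget; at $2.80/hr it projects to $53.60 and trips the alert.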