GUIDE · 2026

GPU Spot Pricing Guide: How to Save 60–80% on Cloud GPUs in 2026

Spot instances offer the same H100 and A100 GPUs at a fraction of the price — if you know how to use them. This guide covers the mechanics, real current prices, provider trade-offs, and the exact practices that separate teams saving 70% from teams that get burned.

Updated: 2026-06-03 · 8 min read · Live pricing data

Contents

What are GPU spot instances?
Current price landscape
Provider comparison
Best practices
Monitor spot prices in real-time

Section 01

What Are GPU Spot Instances?

Cloud providers — AWS, GCP, Azure, and the GPU-focused clouds — maintain large pools of GPU capacity. A significant portion of that capacity sits idle at any given moment. Rather than let it go to waste, they sell it at steep discounts as spot instances (also called preemptible or interruptible instances). You get the same hardware — an H100, an A100, an L40S — for 60–80% less than the on-demand price.

The catch: when demand spikes, the provider can reclaim your spot instance on short notice (two minutes on AWS, roughly 30 seconds on GCP and Azure, and sometimes no notice at all on GPU-native clouds). In practice, interruption rates are low during off-peak hours, but you can't count on zero interruptions for production workloads without a fallback plan.

Why do spot prices vary so much? Because they track supply and demand. An H100 spot on AWS might be $2.10/hr today and $3.80/hr after a wave of demand hits. GPU-native clouds (RunPod, Vast.ai, CoreWeave) set prices more statically, which means lower volatility but also less visibility into when supply will dry up. Understanding this dynamic is the first step to using spot effectively.

60–80%

typical savings vs. on-demand

30 s–2 min

interruption notice (hyperscalers)

7

major providers with spot GPU capacity

Spot vs On-Demand: When to Use Each

Use spot for: model training runs, batch inference, fine-tuning jobs, research experiments, CI/CD pipelines — anything that can checkpoint and resume. Use on-demand for: production API inference serving, real-time user-facing workloads, jobs without checkpointing that would be expensive to restart.

Section 02

Current GPU Spot Price Landscape

The table below shows prices pulled from our monitoring system. ★ marks the cheapest provider for each GPU.

GPU Model	AWS	GCP	Azure	Lambda	CoreWeave	RunPod	Vast.ai	Max Savings
NVIDIA H200 (141GB)	—	—	—	—	—	★ $0.50/hr	$1.97/hr	Save 75%
NVIDIA H100 PCIe (80GB)	—	—	—	—	—	★ $1.99/hr	$2.13/hr	Save 7%
NVIDIA H100 SXM (80GB)	$57.76/hr	$26.85/hr	$4.16/hr	$2.49/hr	$2.07/hr	$2.59/hr	★ $1.47/hr	Save 97%
NVIDIA A100 (40GB)	$8.10/hr	$8.79/hr	—	—	—	—	★ $0.68/hr	Save 92%
NVIDIA A100 PCIe (80GB)	—	—	—	—	—	$1.19/hr	★ $0.76/hr	Save 36%
NVIDIA A100 SXM (80GB)	$7.64/hr	$1.52/hr	$0.90/hr	$1.29/hr	$1.21/hr	$1.00/hr	★ $0.43/hr	Save 94%
NVIDIA L40S (48GB)	$1.82/hr	—	—	—	$1.82/hr	$0.79/hr	★ $0.43/hr	Save 77%
NVIDIA L40 (48GB)	—	—	—	—	—	★ $0.69/hr	—	Save 0%

Last updated: Jun 3, 2026, 06:33 AM UTC

Reading This Table

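Prices shown are spot rates where available, or on-demand rates where spot is not offered. ★ = cheapest available. "Max Savings" = how much cheaper the best option is than the priciest provider for that GPU. Prices update every 30 minutes. Before comparing providers, check whether a quote is per GPU or per multi-GPU instance: hyperscalers often sell H100 and A100 capacity as 8-GPU instances (for example AWS p5.48xlarge and p4d.24xlarge).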
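The "Max Savings" figure is plain arithmetic: (priciest − cheapest) ÷ priciest. A minimal Python sketch, assuming every price in a row is already a comparable per-GPU hourly rate and that missing listings (the dashes in the table) are passed as None; the function name is illustrative, not part of any tool.

```python
from typing import Mapping, Optional, Tuple


def max_savings(prices: Mapping[str, Optional[float]]) -> Tuple[str, int]:
    """Return (cheapest provider, % saved versus the priciest provider)."""
    # Skip providers with no listing for this GPU (shown as a dash in the table).
    listed = {provider: price for provider, price in prices.items() if price is not None}
    if not listed:
        raise ValueError("no prices listed for this GPU")
    cheapest = min(listed, key=listed.__getitem__)
    priciest = max(listed.values())
    savings = (priciest - listed[cheapest]) / priciest
    return cheapest, round(savings * 100)


h100_sxm = {
    "AWS": 57.76, "GCP": 26.85, "Azure": 4.16, "Lambda": 2.49,
    "CoreWeave": 2.07, "RunPod": 2.59, "Vast.ai": 1.47,
}
print(max_savings(h100_sxm))  # ('Vast.ai', 97)
```

A row with a single listed provider, like the L40 row, yields 0% because the cheapest and priciest prices are the same.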

What's Driving Prices Right Now

H100 availability remains tight on the major hyperscalers, which keeps their H100 spot prices high. GPU-native clouds (RunPod, Vast.ai) offer the steepest discounts but weaker capacity guarantees. A100s are more widely available and often represent the best cost/performance trade-off for training workloads that don't strictly require H100s.

L40S has emerged as a strong mid-tier option: cheaper than H100 or A100 on most providers, with 48GB of VRAM that handles many fine-tuning and inference jobs. If your workload fits in 48GB, it's worth benchmarking the L40S before defaulting to A100.
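To check whether a workload might fit in 48GB before benchmarking, a common back-of-the-envelope estimate multiplies parameter count by bytes per parameter: about 2 bytes for fp16/bf16 inference weights, and about 16 bytes for full fine-tuning with mixed-precision Adam (fp16 weights and gradients plus fp32 master weights and two optimizer moments). This sketch is only a rough rule of thumb: it ignores activations and KV cache, and the 20% headroom factor is an assumption, not a measured value.

```python
# Rough bytes-per-parameter rules of thumb. Activations, KV cache, and
# framework overhead are NOT included; measure on real hardware before committing.
BYTES_PER_PARAM = {
    "inference_fp16": 2,       # bf16/fp16 weights only
    "inference_int8": 1,       # 8-bit quantized weights
    "full_finetune_adam": 16,  # fp16 weights + grads (4) + fp32 master weights and 2 Adam moments (12)
}


def estimated_vram_gb(params_billion: float, mode: str, headroom: float = 0.2) -> float:
    """Estimated GPU memory in GB (1e9 bytes), padded by an assumed headroom fraction."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[mode]
    return total_bytes * (1 + headroom) / 1e9


def fits_on(gpu_vram_gb: float, params_billion: float, mode: str) -> bool:
    """True if the padded estimate fits within the GPU's memory."""
    return estimated_vram_gb(params_billion, mode) <= gpu_vram_gb


for params, mode in [(13, "inference_fp16"), (7, "full_finetune_adam")]:
    need = estimated_vram_gb(params, mode)
    print(f"{params}B {mode}: ~{need:.0f} GB, fits 48GB: {fits_on(48, params, mode)}")
```

Under these assumptions a 13B model served in fp16 (about 31 GB) fits on a 48GB card, while full fine-tuning of a 7B model (about 134 GB) does not.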

Section 03

Provider-by-Provider Breakdown

The seven providers tracked here fall into two categories: hyperscalers (massive capacity, volatile pricing, enterprise tooling) and GPU-native clouds (purpose-built for ML workloads, more predictable pricing, less capacity).

AWS (Amazon)

p3, p4d, p4de, p5 instances

Pros:
Deepest capacity pool globally
2-minute spot interruption notice
Best ecosystem integrations (S3, EFS, EKS)

Cons:
Highest list price among all 7
Spot savings are smaller (40–55%)

GCP (Google Cloud)

a2, a3, g2 instances

Pros:
Spot VMs are very reliable
TPU access as alternative
Strong networking for distributed training

Cons:
Limited H100 spot availability
Complex networking config

Azure (Microsoft)

NCv3, NDv2, NC H100 v5

Pros:
Good for teams already on Azure
Spot eviction policy controls

Cons:
Pricing not consistently competitive
GPU availability varies by region

Lambda Cloud

GPU clusters, on-demand + spot

Pros:
Purpose-built for ML workloads
Simple pricing, no egress fees
Good H100 and A100 availability

Cons:
Smaller scale than hyperscalers
Limited regions

CoreWeave

H100, A100, L40S clusters

Pros:
Competitive H100 pricing
Kubernetes-native
High-bandwidth GPU interconnects

Cons:
Contract-focused (enterprise bias)
Setup complexity for small teams

RunPod

On-demand + spot pods

Pros:
Lowest prices on consumer GPUs
Fast spin-up, simple API
Community cloud for cheap experiments

Cons:
Variable reliability on spot pods
Less consistent SLA

Vast.ai

Marketplace model

Pros:
Lowest absolute prices (marketplace)
Wide GPU variety

Cons:
Quality varies by host
Not suitable for production workloads

Section 04

Best Practices for GPU Spot Instances

Spot instances are not plug-and-play replacements for on-demand. The teams that capture 70%+ savings reliably have the same core habits. Here's the playbook.

The Spot Instance Playbook

✓

Checkpoint aggressively
Save model state to object storage (S3, GCS) every N steps. If your job runs 8 hours without a checkpoint and gets interrupted at hour 7, you lose all 7 hours of work. Checkpoint intervals of 15–30 minutes are a common choice.

✓

Always have a fallback
Set up automatic failover to on-demand capacity when spot availability drops below a threshold. Running a mixed fleet (80% spot, 20% on-demand) gives you resilience while keeping most of the savings.

✓

Monitor spot prices before launching
Spot prices vary widely between providers and shift over time, so the cheapest option for a given GPU changes regularly. Check current pricing across all 7 providers before starting a large job; our live comparison shows them side by side.

✓

Set interruption handlers
On AWS, a spot interruption notice arrives two minutes before the instance is reclaimed, both through the instance metadata service (spot/instance-action) and as an EventBridge event. Write a handler that triggers your checkpointing logic and stops the job gracefully when the notice appears.

✓

Use spot for the right workloads
Training runs, fine-tuning, batch evaluation, and research experiments are ideal. Real-time inference serving with SLA requirements is not; use reserved or on-demand instances there.

✓

Price alert before launch
Before launching, set a budget threshold alert so you're notified if spot prices creep up during a long training run. What started at $1.20/hr can climb to $2.80/hr if demand spikes.
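The "Checkpoint aggressively" item above comes down to two habits: resume from the latest checkpoint on startup, and write checkpoints atomically so an interruption mid-write cannot corrupt the last good copy. A minimal stdlib sketch, assuming a local JSON file stands in for real model state (in practice you would serialize weights and optimizer state with your framework and upload them to S3 or GCS); the training step and the `stop_at` interruption simulator are hypothetical placeholders.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically: readers see the old file or the new one, never a torn mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path: str, total_steps: int, every_n: int, stop_at: Optional[int] = None) -> dict:
    state = load_checkpoint(path)  # resume where the previous run stopped
    for step in range(state["step"], total_steps):
        if stop_at is not None and step == stop_at:
            raise RuntimeError(f"simulated spot interruption at step {step}")
        # Placeholder for a real optimizer step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        if state["step"] % every_n == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final checkpoint
    return state
```

Interrupting a 10-step run at step 7 with `every_n=3` leaves a checkpoint at step 6, so the rerun repeats only one step of work.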
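For the "Always have a fallback" item, the arithmetic behind a mixed fleet is worth spelling out: with spot at a 70% discount, an 80/20 fleet costs 0.8 × 0.3 + 0.2 × 1.0 = 0.44 of an all-on-demand fleet, so it still saves 56%. The sketch below computes that and splits a fleet between spot and on-demand, topping up with on-demand when spot capacity runs short. The 80% target comes from the playbook; the function names and the `spot_available` input are illustrative assumptions.

```python
from typing import Tuple


def fleet_savings(spot_discount: float, spot_share: float = 0.8) -> float:
    """Fraction saved versus running every GPU on-demand."""
    relative_cost = spot_share * (1 - spot_discount) + (1 - spot_share) * 1.0
    return 1 - relative_cost


def plan_fleet(gpus_needed: int, spot_available: int, spot_share: float = 0.8) -> Tuple[int, int]:
    """Split a fleet into (spot, on_demand) GPUs.

    Aim for `spot_share` on spot; if the provider can't supply that many
    spot GPUs, fall back to on-demand for the remainder.
    """
    spot = min(round(gpus_needed * spot_share), spot_available)
    return spot, gpus_needed - spot


print(f"{fleet_savings(0.70):.0%}")        # 56%
print(plan_fleet(10, spot_available=100))  # (8, 2)
print(plan_fleet(10, spot_available=3))    # (3, 7)
```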
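For "Monitor spot prices before launching" on AWS specifically, the EC2 DescribeSpotPriceHistory API returns price points per Availability Zone; passing a StartTime of now asks for the price in effect at that moment. A sketch, assuming boto3 is installed and credentials are configured for the live call; the zone-picking logic is kept separate so it runs on any list of history records. AWS quotes these prices per instance-hour, so an 8-GPU instance such as p5.48xlarge needs dividing by 8 for a per-GPU comparison.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def cheapest_zone(history: List[dict]) -> Tuple[str, float]:
    """Return (zone, price) for the zone with the lowest *latest* spot price."""
    latest: Dict[str, Tuple[datetime, float]] = {}
    for item in history:
        zone, ts = item["AvailabilityZone"], item["Timestamp"]
        price = float(item["SpotPrice"])  # the API returns prices as strings
        # History can hold several price changes per zone; keep the newest.
        if zone not in latest or ts > latest[zone][0]:
            latest[zone] = (ts, price)
    if not latest:
        raise ValueError("empty spot price history")
    zone = min(latest, key=lambda z: latest[z][1])
    return zone, latest[zone][1]


def fetch_current_history(instance_type: str, region: str) -> List[dict]:
    """Live AWS call (needs boto3 and credentials). Ignores pagination via NextToken."""
    import boto3  # imported lazily so cheapest_zone works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc),
    )
    return response["SpotPriceHistory"]
```

Usage, on a machine with AWS access: `zone, price = cheapest_zone(fetch_current_history("p5.48xlarge", "us-east-1"))`, then `price / 8` for a per-GPU rate.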
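For "Set interruption handlers" on AWS, the instance metadata service exposes the pending action at /latest/meta-data/spot/instance-action: it returns HTTP 404 until an interruption is scheduled, then a small JSON document such as {"action": "terminate", "time": "..."}. A minimal polling sketch, assuming IMDSv2 (session token) and a 5-second poll interval; `on_interrupt` is a hypothetical hook where you would call your own checkpoint-and-exit logic. The poll function is injectable so the loop can be exercised off EC2.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS = "http://169.254.169.254/latest"


def imds_spot_action(timeout: float = 1.0) -> Optional[dict]:
    """Return the scheduled spot action, or None when nothing is scheduled (HTTP 404)."""
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        token = resp.read().decode()
    action_request = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_request, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise


def watch_for_interruption(
    on_interrupt: Callable[[dict], None],
    poll: Callable[[], Optional[dict]] = imds_spot_action,
    interval_s: float = 5.0,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll until a notice appears; call on_interrupt once and return True."""
    polls = 0
    while max_polls is None or polls < max_polls:
        action = poll()
        if action is not None:
            on_interrupt(action)  # e.g. save a checkpoint and exit cleanly
            return True
        polls += 1
        time.sleep(interval_s)
    return False
```

In a training job you would typically run `watch_for_interruption` in a daemon thread alongside the training loop.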
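The "Price alert" item can be as simple as projecting the run's total cost from the current price and warning before it crosses the budget. A sketch using the playbook's $1.20/hr to $2.80/hr example; the 8-GPU job size, 20 remaining hours, $400 budget, and 90% warning threshold are illustrative assumptions.

```python
from typing import Optional


def projected_cost(spent: float, price_per_gpu_hr: float, gpus: int, hours_left: float) -> float:
    """Money already spent plus the remaining hours at the current spot price."""
    return spent + price_per_gpu_hr * gpus * hours_left


def budget_alert(spent: float, price_per_gpu_hr: float, gpus: int,
                 hours_left: float, budget: float, warn_at: float = 0.9) -> Optional[str]:
    """Return a warning once the projection passes warn_at * budget, else None."""
    projection = projected_cost(spent, price_per_gpu_hr, gpus, hours_left)
    if projection > warn_at * budget:
        return f"projected ${projection:,.2f} exceeds {warn_at:.0%} of ${budget:,.2f} budget"
    return None


print(budget_alert(100, 1.20, gpus=8, hours_left=20, budget=400))  # None (about $292 projected)
print(budget_alert(100, 2.80, gpus=8, hours_left=20, budget=400))
```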

When On-Demand is the Right Call

Real-time inference APIs, customer-facing model serving, and jobs without checkpointing support should run on on-demand or reserved capacity. The interruption risk makes spot the wrong tool for workloads where restarts cost more than the savings. The 60–80% discount only matters if the job completes successfully.
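To see how restarts can erase the discount, model interruptions as random events at a constant rate λ per hour (a simplifying assumption; real rates vary by region, GPU, and time of day). If an interruption sends you back to the last checkpoint, each checkpoint segment of c hours takes (e^(λc) − 1)/λ hours on average, so frequent checkpoints keep the overhead small while a job with no checkpoints pays an exponential penalty. The rates and prices below are hypothetical, and the sketch ignores checkpoint write time and restart overhead.

```python
import math


def expected_runtime_hours(work_hours: float, interrupts_per_hour: float,
                           checkpoint_every_hours: float) -> float:
    """Expected wall-clock hours under Poisson interruptions with instant restarts.

    Each segment of c hours restarts from its own beginning when interrupted,
    which takes (exp(rate * c) - 1) / rate hours on average.
    """
    if interrupts_per_hour == 0:
        return work_hours
    c = min(checkpoint_every_hours, work_hours)
    segments = work_hours / c  # treated as continuous for simplicity
    return segments * math.expm1(interrupts_per_hour * c) / interrupts_per_hour


# Hypothetical job: 100 GPU-hours of work, on-demand $4.00/hr, spot $1.20/hr (70% off),
# one interruption every 20 hours on average.
on_demand = 4.00 * 100
spot_ckpt = 1.20 * expected_runtime_hours(100, 0.05, checkpoint_every_hours=0.5)
spot_none = 1.20 * expected_runtime_hours(100, 0.05, checkpoint_every_hours=100)
print(f"on-demand ${on_demand:,.0f} | spot + 30-min checkpoints ${spot_ckpt:,.0f} "
      f"| spot, no checkpoints ${spot_none:,.0f}")
```

Under these assumptions the checkpointed spot run costs about $122 versus $400 on-demand, while the same spot run without checkpoints averages roughly $3,500.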

GPU Selection Strategy

Don't default to H100 just because it's the fastest chip. Total job cost is hourly price × runtime, so a GPU at 50–60% of the H100 spot price still breaks even when the job runs up to roughly 1.7–2× longer. For most training and fine-tuning workloads, an A100 or L40S at that price will complete the job at comparable total cost. Profile your workload's actual GPU utilization and memory footprint first. If you need less than 40GB of VRAM, an L40S or A100 40GB will work and typically has better spot availability.
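The same price × runtime logic can pick a GPU once you have profiled a workload. A sketch, assuming you have measured memory needs and each GPU's throughput relative to an H100 on your own job; the prices are the RunPod column of the snapshot table above, and the `relative_speed` values are placeholders, not benchmarks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GpuOption:
    name: str
    vram_gb: int
    price_per_hr: float    # spot $ per GPU-hour
    relative_speed: float  # throughput vs. an H100 on YOUR workload (measure it)


def job_cost(option: GpuOption, h100_hours: float) -> float:
    """Total cost = hourly price x runtime; slower GPUs run proportionally longer."""
    return option.price_per_hr * h100_hours / option.relative_speed


def cheapest_for_job(options: List[GpuOption], vram_needed_gb: float,
                     h100_hours: float) -> Tuple[str, float]:
    """Cheapest total job cost among the options with enough memory."""
    fitting = [o for o in options if o.vram_gb >= vram_needed_gb]
    if not fitting:
        raise ValueError("no option has enough VRAM")
    best = min(fitting, key=lambda o: job_cost(o, h100_hours))
    return best.name, job_cost(best, h100_hours)


options = [
    GpuOption("H100 SXM", 80, 2.59, 1.0),
    GpuOption("A100 SXM", 80, 1.00, 0.5),  # placeholder speed
    GpuOption("L40S", 48, 0.79, 0.4),      # placeholder speed
]
print(cheapest_for_job(options, vram_needed_gb=60, h100_hours=10))  # chooses A100 SXM
print(cheapest_for_job(options, vram_needed_gb=40, h100_hours=10))  # chooses L40S
```

Note how close the A100 and L40S come here: a small change in measured speed flips the answer, which is why profiling matters more than spec sheets.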

For transformer training at scale, the H100's HBM3 memory bandwidth, faster NVLink, and FP8 Transformer Engine make a real difference; that's the workload where paying the H100 premium (even on spot) is justified. For inference serving or fine-tuning smaller models, the A100 is usually the sweet spot on price/performance.

Monitor GPU Spot Prices in Real-Time

Spot prices shift by the hour. RoofRun tracks H100, A100, L40S, and more across all 7 providers, updated every 30 minutes. Set alerts to notify you when prices drop below your threshold — so you launch jobs at the right time.
