Picking the wrong GPU for your training job is an expensive mistake. The gap between the best and worst choice for a given workload can easily be 10× the cost per hour, and if you're running hundreds of GPU-hours per week, that's real money. This guide uses 30-day spot pricing data from 7 cloud providers to help you match the right GPU to your training task based on what actually matters: VRAM, compute throughput, and $/TFLOP.
The GPU market has fractured. You no longer simply pick "the best GPU"; you pick the right GPU for your workload, your budget, and your interruption tolerance. An H100 costs $46/hr on AWS but $1.47/hr on Vast.ai. An A100 that's $1.21/hr on CoreWeave can be $0.43/hr on Vast.ai on any given day. That variance changes which GPU is actually the right choice.
Three specs determine GPU suitability for ML training: VRAM (determines max batch size and model size), FP16 TFLOPS (raw throughput for matrix operations), and interconnect bandwidth (NVLink/NVSwitch for multi-GPU scaling). Price matters, but evaluate it only after these three.
The three questions to answer first: (1) What's your model's parameter count? (2) Can your workload tolerate spot interruptions? (3) Do you need multi-GPU scaling? The answers narrow your GPU tier immediately. Everything below flows from those three inputs.
The tables below show 30-day average spot prices per GPU per provider. Use these to see where the actual price floor is: not the list price, but what you're likely to pay on a good day.
RTX 4090: Best $/TFLOP for fine-tuning, small models, and experiments
| Provider | Current spot | 30d avg | 30d range | Stability | Value |
|---|---|---|---|---|---|
| Vast.ai | $0.13/hr | $0.13/hr | $0.01 – $0.27/hr | ★★★ Very stable | ★★★★ Best |
| RunPod | $0.34/hr | $0.34/hr | $0.34 – $0.34/hr | ★★★ Very stable | ★★★★ Best |
Source: RoofRun price_snapshots, 30-day window ending 2026-06-03. Raw data via API.
A100: The sweet spot for most production ML training workloads
| Provider | Current spot | 30d avg | 30d range | Stability | Value |
|---|---|---|---|---|---|
| Vast.ai | $0.43/hr | $0.43/hr | $0.43 – $0.85/hr | ★★ Stable | ★★★★ Best |
| RunPod | $1.00/hr | $1.02/hr | $1.00 – $1.39/hr | ★★ Stable | ★★★★ Best |
| CoreWeave | $1.21/hr | $1.21/hr | $1.19 – $1.23/hr | ★★★ Very stable | ★★★★ Best |
| Lambda Cloud | $1.29/hr | $1.29/hr | $1.28 – $1.30/hr | ★★★ Very stable | ★★★★ Best |
| Google Cloud Platform | $1.52/hr | $1.50/hr | $1.46 – $1.55/hr | ★★★ Very stable | ★★★★ Best |
| Microsoft Azure | $1.83/hr | $3.24/hr | $0.86 – $7.67/hr | ★ Volatile | ★★ Fair |
| Amazon Web Services | $5.73/hr | $11.54/hr | $5.73 – $24.48/hr | ★ Volatile | ★ Poor |
Source: RoofRun price_snapshots, 30-day window ending 2026-06-03. Raw data via API.
| Provider | Current spot | 30d avg | 30d range | Stability | Value |
|---|---|---|---|---|---|
| Vast.ai | $0.43/hr | $0.62/hr | $0.43 – $0.85/hr | ★ Variable | ★★★★ Best |
| RunPod | $0.79/hr | $0.79/hr | $0.79 – $0.79/hr | ★★★ Very stable | ★★★★ Best |
| CoreWeave | $1.82/hr | $1.84/hr | $1.80 – $1.88/hr | ★★★ Very stable | ★★ Fair |
Source: RoofRun price_snapshots, 30-day window ending 2026-06-03. Raw data via API.
H100: Large model pretraining, multi-GPU training, frontier research
| Provider | Current spot | 30d avg | 30d range | Stability | Value |
|---|---|---|---|---|---|
| Vast.ai | $1.47/hr | $1.53/hr | $1.40 – $6.13/hr | ★ Variable | ★★★★ Best |
| CoreWeave | $2.07/hr | $2.06/hr | $2.02 – $2.10/hr | ★★★ Very stable | ★★★★ Best |
| Lambda Cloud | $2.49/hr | $2.49/hr | $2.47 – $2.51/hr | ★★★ Very stable | ★★★ Good |
| RunPod | $2.59/hr | $2.59/hr | $2.59 – $2.59/hr | ★★★ Very stable | ★★★ Good |
| Microsoft Azure | $2.16/hr | $3.19/hr | $2.04 – $4.43/hr | ★ Variable | ★★★ Good |
| Google Cloud Platform | $26.85/hr | $27.43/hr | $25.75 – $30.08/hr | ★★★ Very stable | ★ Poor |
| Amazon Web Services | $57.76/hr | $46.22/hr | $31.30 – $60.12/hr | ★ Variable | ★ Poor |
Source: RoofRun price_snapshots, 30-day window ending 2026-06-03. Raw data via API.
Price is one variable; availability, stability, and setup complexity are others. Here's the practical recommendation per use case based on 30-day data patterns:
| Use Case | Recommended GPU | Best Providers | Typical Spot Range | Notes |
|---|---|---|---|---|
| Fine-tuning (7B–30B models) | A100 80GB | RunPod, Vast.ai | $1.00–$1.22/hr | Checkpoint every 100–500 steps; spot-friendly. |
| Pretraining (70B+ models) | H100 SXM | CoreWeave, Vast.ai | $1.47–$2.49/hr | Multi-node; NVLink critical. Reserve via CoreWeave for consistency. |
| RLHF / Reward Modeling | H100 or A100 | Lambda Cloud, RunPod | $1.28–$2.49/hr | High GPU memory requirements. Spot works with checkpointing. |
| Inference (batch) | L40S, A100 | Vast.ai, RunPod | $0.13–$0.79/hr | Batch inference tolerates spot interruptions well. |
| Experiments / Dev / Small fine-tunes | RTX 4090 | Vast.ai, RunPod | $0.01–$0.34/hr | Cheapest path for non-critical dev workloads. |
| Enterprise / SLA-required | A100 or H100 | CoreWeave, GCP | $1.21–$2.06/hr | Higher price, but stability and support justify the premium. |
Price per hour is not the whole story. A GPU that costs twice as much but delivers 3× the throughput has better $/TFLOP, the metric that actually measures your cost efficiency per unit of compute. Here's how the tiers compare:
$/TFLOP reality check: At current Vast.ai spot prices, the RTX 4090 achieves $0.0015 per FP16 TFLOPS-hour. The H100 on CoreWeave comes in at $0.70 per FP16 TFLOPS-hour. The absolute $/TFLOP winner depends heavily on which provider you access and on your interruption tolerance. At full on-demand pricing the picture flips, which is why we always recommend spot-first with checkpointing.
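As a minimal sketch of the $/TFLOP arithmetic, the snippet below divides an hourly price by an FP16 TFLOPS figure and applies it to the "twice the price, 3× the throughput" case. The function name and the two GPUs (with their prices and TFLOPS) are hypothetical values chosen for illustration, not data from the pricing tables.

```python
def dollars_per_tflop_hour(price_per_hour: float, fp16_tflops: float) -> float:
    """Hourly price divided by FP16 throughput (dollars per TFLOPS-hour)."""
    if fp16_tflops <= 0:
        raise ValueError("fp16_tflops must be positive")
    return price_per_hour / fp16_tflops


# Hypothetical GPUs: B costs twice as much as A but delivers 3x the throughput.
gpu_a = dollars_per_tflop_hour(price_per_hour=1.00, fp16_tflops=100.0)
gpu_b = dollars_per_tflop_hour(price_per_hour=2.00, fp16_tflops=300.0)

print(f"GPU A: ${gpu_a:.4f} per TFLOPS-hour")
print(f"GPU B: ${gpu_b:.4f} per TFLOPS-hour")
print("B is cheaper per unit of compute:", gpu_b < gpu_a)
```

The more expensive card ends up at two thirds of the cost per unit of compute, which is why hourly price alone is a misleading basis for comparison.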
Hybrid strategy: Run your training on spot with aggressive checkpointing (save every 100–500 steps depending on step duration). Keep a single on-demand or reserved instance for your primary model serving endpoint. This gives you spot economics for training while protecting your production inference SLA.
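The checkpoint-and-resume pattern behind spot training can be sketched in plain Python. This is a toy loop under stated assumptions, not a framework integration: `save_checkpoint`, `load_checkpoint`, and `train` are hypothetical helpers, the "training step" is a stand-in, and the `stop_after` parameter only simulates a spot interruption. A real job would persist model and optimizer state with its framework's own checkpoint API.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write atomically so an interruption mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename within the same directory


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path: str, total_steps: int, checkpoint_every: int,
          stop_after: Optional[int] = None) -> dict:
    """Toy loop that resumes from the last checkpoint on every start."""
    state = load_checkpoint(path)
    steps_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_run >= stop_after:
            break  # simulated preemption: work since the last save is lost
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # stand-in for a real training step
        steps_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "ckpt.json")
    train(ckpt, total_steps=1000, checkpoint_every=100, stop_after=250)
    resumed_from = load_checkpoint(ckpt)["step"]  # last step that was saved
    final = train(ckpt, total_steps=1000, checkpoint_every=100)
    print("resumed from step", resumed_from, "and finished at step", final["step"])
```

The interruption at step 250 costs only the 50 steps since the last save, which is the trade-off the checkpoint interval controls: shorter intervals lose less work per interruption but spend more time writing state.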
For large-scale pretraining runs where every hour of wall time translates to days of schedule delay, reserved capacity on CoreWeave or Lambda Cloud (1.2–2.6× cheaper than AWS/GCP on H100) is worth the premium. The consistency of pricing matters more than the absolute floor when you're burning 100+ GPUs per run.
If you're doing multi-GPU training, interconnect matters enormously. PCIe bandwidth between GPUs creates a bottleneck that NVLink eliminates. Only specialist providers (CoreWeave, Lambda Cloud, and the hyperscalers) offer NVLink-connected H100 and A100 nodes. Vast.ai and RunPod primarily offer PCIe-connected GPUs.
For pretraining 70B+ models, you need NVLink or NVSwitch to maintain efficient tensor parallelism across 8+ GPUs. The NVLink-connected tier on CoreWeave (8×H100 SXM, ~$16–$20/hr) is the right choice. Trying to replicate this on PCIe-connected GPUs will produce 30–50% lower effective throughput due to inter-GPU communication overhead.
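To see what a 30–50% throughput loss does to the economics, a rough sketch can convert a PCIe node's hourly price into the price of NVLink-equivalent work. The function name is hypothetical, and the node is assumed to be eight H100s at the $1.47/hr Vast.ai spot price quoted earlier, all PCIe-connected; real nodes and penalties will differ.

```python
def effective_hourly_cost(node_price_per_hour: float, throughput_loss: float) -> float:
    """Hourly cost per unit of full-speed work for a node that loses
    a fraction of its throughput to communication overhead."""
    if not 0.0 <= throughput_loss < 1.0:
        raise ValueError("throughput_loss must be in [0, 1)")
    return node_price_per_hour / (1.0 - throughput_loss)


# Assumption: 8 PCIe-connected H100s at the Vast.ai spot price from the article.
pcie_node = 8 * 1.47

for loss in (0.30, 0.50):
    cost = effective_hourly_cost(pcie_node, loss)
    print(f"{loss:.0%} throughput loss -> ${cost:.2f}/hr effective")
```

Under these assumptions the $11.76/hr node lands at roughly $16.80/hr to $23.52/hr of effective cost, which overlaps or exceeds the ~$16–$20/hr NVLink tier before interruption risk is even counted.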
For fine-tuning 7B–30B models, a single A100 or H100 with 80GB VRAM is sufficient for most use cases. Multi-GPU fine-tuning (FSDP, DeepSpeed ZeRO-3) is only necessary when the model doesn't fit in a single 80GB card, which is becoming rarer as quantization techniques improve.
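A back-of-the-envelope check of whether a model fits on one 80GB card can be sketched as below. It counts model states only and ignores activations, KV caches, and framework overhead, so treat the results as lower bounds. The 2 bytes per parameter figure is for fp16/bf16 weights; the 16 bytes per parameter figure is the commonly cited cost of mixed-precision Adam (fp16 weights and gradients plus fp32 master weights, momentum, and variance). The function name is hypothetical.

```python
FP16_WEIGHTS_ONLY = 2      # bytes per parameter: fp16/bf16 weights
MIXED_PRECISION_ADAM = 16  # fp16 weights + fp16 grads + fp32 master, momentum, variance


def model_state_gb(params_billion: float, bytes_per_param: float) -> float:
    """Model-state memory in GB (1 GB = 1e9 bytes), excluding activations."""
    return params_billion * 1e9 * bytes_per_param / 1e9


for size in (7, 30, 70):
    weights = model_state_gb(size, FP16_WEIGHTS_ONLY)
    full = model_state_gb(size, MIXED_PRECISION_ADAM)
    print(f"{size}B: weights {weights:.0f} GB, full Adam fine-tune {full:.0f} GB, "
          f"weights fit in 80 GB: {weights <= 80}")
```

The weights-only figure is the relevant floor for parameter-efficient or quantized fine-tuning, where a 30B model still fits on one card. Full fine-tuning with Adam needs far more memory than 80GB even at 7B, which is the case where FSDP or ZeRO-3 sharding becomes necessary.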
All pricing data in this article comes from RoofRun's continuous polling of provider APIs and pricing pages every 30 minutes. Prices shown are spot instance rates; on-demand pricing is typically 2–4× higher. The "current spot" column reflects the most recent available snapshot per GPU/provider; the "30-day avg" column is the mean across the trailing 30 days. GPU TFLOPS are NVIDIA-published FP16 Tensor Core specs. $/TFLOP is calculated as average spot price divided by FP16 TFLOPS. Raw data is available via the public JSON API.
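The column definitions above can be sketched as a small aggregation over price snapshots. This is an illustration under assumptions, not RoofRun's actual pipeline: the function name, the flat list-of-prices input format, the sample prices, and the 100-TFLOPS figure are all hypothetical.

```python
from statistics import mean


def summarize_snapshots(prices: list[float], fp16_tflops: float) -> dict[str, float]:
    """Summarize hourly spot prices ordered oldest to newest."""
    if not prices:
        raise ValueError("need at least one price snapshot")
    average = mean(prices)
    return {
        "current_spot": prices[-1],       # most recent snapshot
        "avg_30d": average,               # mean across the window
        "range_low": min(prices),
        "range_high": max(prices),
        "dollars_per_tflop": average / fp16_tflops,
    }


# Hypothetical snapshots and a hypothetical 100-TFLOPS GPU, for illustration only.
summary = summarize_snapshots([1.19, 1.21, 1.23, 1.21], fp16_tflops=100.0)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```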