GPU Price Comparison Tool
Compare cloud GPU pricing across Vast.ai, RunPod, Lambda Labs, AWS, GCP, Azure, and other providers. Find H100, A100, L40S, and RTX 4090 rates with yield score analysis and cost calculations.
Real-Time GPU Pricing
Hourly rates for H100, A100, L40S, RTX 4090, and other GPUs across decentralized marketplaces and hyperscale clouds. Dashboard refreshes from provider APIs automatically.
Infrastructure Cost Calculator
Estimate cost per 1M tokens, per training epoch, or per month based on your workload. Calculate total cost of ownership including egress, storage, and compute time.
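As a rough illustration of the arithmetic behind these estimates, the sketch below computes cost per 1M tokens and a simple monthly total; the throughput, egress, and storage rates shown are placeholder assumptions, not live provider data.

```python
# A minimal sketch of the calculator's core arithmetic. All rates and
# throughput figures below are illustrative assumptions, not live pricing.

def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """USD to generate 1M tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

def monthly_total(hourly_rate_usd: float, gpu_hours: float,
                  egress_gb: float = 0.0, egress_rate: float = 0.09,
                  storage_gb: float = 0.0, storage_rate: float = 0.10) -> float:
    """Compute + egress + storage for one month (ancillary rates are assumptions)."""
    return (hourly_rate_usd * gpu_hours
            + egress_gb * egress_rate
            + storage_gb * storage_rate)

# Example: an A100 80GB at $1.50/hr sustaining ~1,000 tokens/s
print(cost_per_million_tokens(1.50, 1_000))                                  # ≈ $0.42 per 1M tokens
print(monthly_total(1.50, gpu_hours=720, egress_gb=500, storage_gb=1_000))   # ≈ $1,225/month
```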
Performance-Per-Dollar Analysis
Yield score methodology ranks GPUs by computational efficiency per dollar. Compare normalized performance indices across different GPU architectures and price points.
How to Use This Tool
Review Provider Types
Compare hyperscale, specialized, and decentralized providers below to understand availability vs. cost trade-offs
Select Your GPU Model
Use our workload framework to match GPU specs (VRAM, throughput) to your model size and inference requirements
Check Live Dashboard
Click "View Live Pricing Dashboard" above to filter by region, compare yield scores, and calculate total costs
GPU Pricing Market Snapshot (Recent Market Range)
Key Price Observations
- H100 pricing ranges from $2.50/hr (decentralized) to $12.29/hr (AWS P5), representing a 5x spread
- A100 80GB availability has improved, with rates stabilizing at $0.90-4.10/hr depending on provider type
- Consumer-grade RTX 4090 instances remain the most cost-efficient for smaller models at $0.20-0.44/hr
Market Dynamics
- Decentralized networks (Vast.ai, RunPod) expanded supply by aggregating consumer and enterprise hardware
- Specialized providers (Lambda Labs, CoreWeave) offer middle-ground pricing with better availability than decentralized
- Hyperscale clouds maintain premium pricing but provide enterprise SLAs and global availability
GPU Provider Price Comparison
Note: The ranges below are directional and may drift based on region, availability, host quality, and instance configuration.
Use the live dashboard for exact, current pricing before making infrastructure decisions.
Ranges are not quotes; verify pricing with the provider before purchase.
| GPU Model | Decentralized (Vast.ai, RunPod) | Specialized (Lambda, CoreWeave) | Hyperscale (AWS, GCP, Azure) | Typical Use Case |
|---|---|---|---|---|
| H100 80GB | $2.50-3.89/hr | $3.50-5.00/hr | $8.00-12.29/hr | Large-scale training (70B+ models), high-throughput production inference |
| A100 80GB | $0.90-1.89/hr | $1.10-2.20/hr | $3.20-4.10/hr | Fine-tuning mid-size models (13B-70B), batch inference, research |
| A100 40GB | $0.70-1.20/hr | $0.90-1.80/hr | $2.40-3.20/hr | Development, smaller fine-tuning tasks (7B-30B models) |
| L40S 48GB | $0.19-0.55/hr | $0.60-1.20/hr | Limited | Cost-sensitive inference, multi-modal workloads |
| RTX 4090 24GB | $0.20-0.44/hr | Limited | N/A | Development, small model inference (7B-13B), testing |
Understanding the Price-Performance Trade-off
Decentralized providers aggregate compute from consumer hardware and smaller data centers, enabling significantly lower hourly rates. Trade-offs include variable availability (commonly observed at 70-85%), potential instance interruptions, and limited enterprise support.
Specialized providers operate dedicated GPU infrastructure with institutional-grade availability (95-98%) and better networking, priced between decentralized and hyperscale options.
Hyperscale clouds provide enterprise SLAs (99.9%+ uptime), global availability, and comprehensive support at premium pricing. Optimal for production services requiring reliability guarantees.
Most users do this next: Estimate your complete AI infrastructure costs including training, inference, data labeling, and hidden operational expenses.
Calculate Your AI TCO
GPU Provider Infrastructure Models
Hyperscale Cloud Providers (AWS, GCP, Azure)
Infrastructure Characteristics
- Enterprise SLAs with 99.9%+ uptime commitments and financial penalties for breaches
- Global availability across 20+ regions with consistent performance characteristics
- Purpose-built data centers optimized for GPU workloads with liquid cooling and high-bandwidth networking
Pricing and Economics
- H100 instances: $8-12/hr on-demand, $5-8/hr with 1-3yr commitments
- A100 instances: $3-4/hr on-demand, $2-3/hr with reserved capacity
- Spot instances offer 50-70% discounts but with potential interruption
- Additional costs: data egress ($0.08-0.12/GB), storage, load balancers
Optimal for: Production services with high-reliability needs, compliance requirements, or multi-region deployments
Specialized GPU Providers (Lambda Labs, CoreWeave, Paperspace)
Infrastructure Characteristics
- Focused GPU infrastructure with 95-98% availability but limited formal SLA guarantees
- Regional availability typically concentrated in 3-5 major data center hubs
- Purpose-built for ML workloads with optimized networking and storage configurations
Pricing and Economics
- H100 instances: $3.50-5.00/hr with consistent availability
- A100 instances: $1.10-2.20/hr, often with volume discounts for 10+ GPUs
- Flat-rate pricing with fewer hidden costs than hyperscale clouds
- Some providers offer reserved capacity at a 15-25% discount
Optimal for: ML teams that need better economics than hyperscale clouds and more reliability than decentralized marketplaces
Decentralized GPU Marketplaces (Vast.ai, RunPod)
Infrastructure Characteristics
- Aggregate supply from consumer hardware, small data centers, and individual operators
- Variable availability commonly observed at 70-85%, with potential for instance interruption or maintenance
- Heterogeneous infrastructure quality, networking speeds, and storage configurations
Pricing and Economics
- H100 instances: often available at 50-80% lower rates than hyperscale (commonly $2.50-3.89/hr)
- A100 instances: $0.90-1.89/hr, with variation by host quality
- RTX consumer GPUs: $0.20-0.44/hr for the 4090, 3090, and similar cards
- Transparent base pricing, often with lower ancillary fees than hyperscale clouds
Optimal for: Development, experimentation, batch workloads, cost-sensitive applications with fault tolerance
GPU Selection Framework by Workload
Training Large Language Models (30B-70B+ Parameters)
Recommended GPU: H100 80GB or A100 80GB (multi-GPU setup for 70B+)
Estimated Cost: $2.50-12/hr per GPU × training duration (typically 100-500 GPU-hours for fine-tuning)
Provider Recommendation: Specialized (Lambda, CoreWeave) for reliability; Decentralized (Vast.ai) for cost savings if checkpointing frequently
80GB VRAM required for models above 30B parameters without advanced quantization. Multi-GPU setups with NVLink or InfiniBand needed for 70B+ models. Consider checkpoint frequency and fault tolerance when using lower-cost providers.
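To make that range concrete, here is a back-of-the-envelope estimate; the rate, GPU count, and duration are placeholder assumptions within the ranges above, not quotes.

```python
# Illustrative training-run estimate; the rate, GPU count, and duration
# are placeholder assumptions within the ranges above, not quotes.
gpu_hourly_rate = 3.50        # e.g., H100 on a specialized provider, USD/hr
num_gpus = 4                  # multi-GPU setup for a 70B-class fine-tune
wall_clock_hours = 60

gpu_hours = num_gpus * wall_clock_hours                    # 240 GPU-hours
compute_cost = gpu_hours * gpu_hourly_rate
print(f"{gpu_hours} GPU-hours -> ${compute_cost:,.0f}")    # 240 GPU-hours -> $840
```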
Production Inference for Customer-Facing Applications
Recommended GPU: H100 for high-throughput; A100 or L40S for cost-balanced inference
Estimated Cost: $0.50-2.00 per 1M tokens depending on model size and GPU choice
Provider Recommendation: Hyperscale (AWS, GCP) or Specialized (Lambda, CoreWeave) for SLA requirements
Production inference requires high availability (99.9%+), consistent latency, and autoscaling capabilities. Avoid decentralized providers unless implementing multi-provider failover. Factor in egress costs for high-volume applications.
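If you do blend lower-cost providers into a production path, a multi-provider failover wrapper is the usual pattern. The sketch below is a minimal version; the endpoint URLs and response field are hypothetical placeholders standing in for your actual serving API.

```python
# Minimal multi-provider failover sketch for inference requests.
# The endpoint URLs and response shape are hypothetical placeholders;
# substitute your actual serving endpoints and schema.
import requests

ENDPOINTS = [
    "https://primary.example.com/v1/generate",    # specialized provider (hypothetical)
    "https://fallback.example.com/v1/generate",   # decentralized host (hypothetical)
]

def generate(prompt: str, timeout: float = 10.0) -> str:
    last_error: Exception | None = None
    for url in ENDPOINTS:
        try:
            resp = requests.post(url, json={"prompt": prompt}, timeout=timeout)
            resp.raise_for_status()
            return resp.json()["text"]            # field name is an assumption
        except requests.RequestException as err:
            last_error = err                      # fall through to the next provider
    raise RuntimeError(f"all providers failed: {last_error}")
```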
Fine-Tuning Small to Medium Models (7B-13B Parameters)
Recommended GPU: RTX 4090, L40S, or A100 40GB
Estimated Cost: $10-100 per fine-tuning run (5-50 GPU-hours typical)
Provider Recommendation: Decentralized (Vast.ai, RunPod) optimal for cost efficiency
24-48GB VRAM sufficient for most 7B-13B model fine-tuning with LoRA or QLoRA. Decentralized providers offer substantial savings with acceptable interruption risk for non-production workloads. Implement checkpointing every 10-20% of training.
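A minimal sketch of that checkpointing pattern follows, using a toy PyTorch model and random data as stand-ins for a real fine-tuning job.

```python
# Minimal sketch of periodic checkpointing on an interruptible instance.
# The tiny model and random data are stand-ins for your real fine-tuning job.
import torch

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

total_steps = 100
checkpoint_every = max(1, total_steps // 10)   # save every ~10% of training

for step in range(1, total_steps + 1):
    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % checkpoint_every == 0:
        torch.save(
            {"step": step,
             "model": model.state_dict(),
             "optimizer": optimizer.state_dict()},
            f"checkpoint_{step:06d}.pt",       # write to durable/remote storage
        )
```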
Development, Experimentation, and Testing
Recommended GPU: RTX 4090, RTX 3090, or L40S
Estimated Cost: $0.20-0.60/hr for iterative development
Provider Recommendation: Decentralized providers (Vast.ai) for maximum cost efficiency
Consumer-grade GPUs provide excellent performance for development workflows, prompt engineering, and model evaluation. Lower hourly rates enable longer experimentation cycles. Interruption risk is acceptable for non-critical development work.
Batch Processing and Offline Inference
Recommended GPU: A100 40/80GB for throughput; RTX 4090 for cost efficiency
Estimated Cost: $0.50-3.00 per 1M tokens processed depending on model and GPU
Provider Recommendation: Decentralized or spot instances for lowest cost; can tolerate interruption
Batch workloads with flexible deadlines can leverage spot pricing or decentralized providers for 60-80% cost savings. Implement job queuing and automatic retry logic to handle potential interruptions. Consider multi-provider strategies to maximize GPU availability.
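A minimal sketch of that queue-and-retry pattern follows; the job function and its simulated interruptions are placeholders for your actual batch inference tasks.

```python
# Minimal sketch of a batch queue with automatic retries, for spot or
# decentralized instances that may be interrupted mid-job. The job function
# and its random failures are placeholders for real batch inference tasks.
import random
import time
from collections import deque

def run_job(job_id: int) -> str:
    # Stand-in for a batch inference task; randomly "fails" to mimic interruption.
    if random.random() < 0.3:
        raise RuntimeError("instance interrupted")
    return f"job {job_id} done"

MAX_RETRIES = 3
queue = deque((job_id, 0) for job_id in range(5))    # (job, attempts so far)

while queue:
    job_id, attempts = queue.popleft()
    try:
        print(run_job(job_id))
    except RuntimeError as err:
        if attempts + 1 < MAX_RETRIES:
            time.sleep(1)                            # simple backoff
            queue.append((job_id, attempts + 1))     # requeue for retry
        else:
            print(f"job {job_id} failed after {MAX_RETRIES} attempts: {err}")
```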
Understanding GPU Yield Score Methodology
Calculation Framework
Yield Score = Performance Index / Hourly Price
Higher scores indicate more computational output per dollar spent
Performance Index Components
- Compute throughput: TFLOPS for FP16/BF16 operations (primary metric for transformer models)
- Memory bandwidth: GB/s for memory-bound inference workloads
- Memory capacity: Total VRAM available for model weights and KV cache
- Tensor Core generation: Architectural efficiency for matrix operations
Normalized Benchmarking
- Performance indices normalized to H100 baseline (100)
- Based on MLPerf inference benchmarks and manufacturer specifications
- Actual performance varies by model architecture, batch size, and optimization
- Indices reviewed periodically to reflect driver, framework, and benchmarking updates
Example 1: H100 at $2.50/hr → Yield score 40.0 (100 performance index ÷ $2.50 = 40)
Example 2: A100 at $1.50/hr → Yield score 38.0 (57 performance index ÷ $1.50 = 38)
Example 3: RTX 4090 at $0.20/hr → Yield score 115.0 (23 performance index ÷ $0.20 = 115)
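The snippet below reproduces those three examples; the performance indices and hourly prices are the illustrative figures shown above, not live data.

```python
# Yield score = performance index / hourly price (higher = more output per dollar).
# Indices are normalized to H100 = 100; prices are the illustrative example
# figures from above, not live quotes.
examples = [
    ("H100", 100, 2.50),
    ("A100", 57, 1.50),
    ("RTX 4090", 23, 0.20),
]
for name, perf_index, hourly_price in examples:
    score = perf_index / hourly_price
    print(f"{name}: {score:.1f}")   # H100: 40.0, A100: 38.0, RTX 4090: 115.0
```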
When Yield Score Matters (and When It Doesn't)
Yield score is most valuable when: You're running 8B-13B inference workloads where multiple GPU options (RTX 4090, L40S, A100 40GB) all meet VRAM requirements. In these cases, yield score helps identify the most cost-efficient choice—often favoring 4090/L40S.
Yield score becomes secondary when: Your model requires 80GB VRAM (narrowing options to A100 80GB or H100), you need guaranteed 99.9% uptime for production, or latency requirements demand the fastest possible GPU regardless of cost.
Practical example: For running LLaMA 70B, an H100 at $2.50/hr (yield score: 40) may actually be more cost-effective than an A100 at $1.20/hr (yield score: 47) due to H100's 2-3x higher throughput reducing total job completion time.
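A rough version of that comparison, assuming the H100 completes the same job about 2.5x faster (a placeholder within the 2-3x range cited above):

```python
# Total job cost = hourly rate x wall-clock hours. The throughput ratio is an
# assumption within the 2-3x range cited above, not a benchmark result.
a100_rate, h100_rate = 1.20, 2.50     # USD/hr, from the example above
a100_hours = 100                      # hypothetical 70B batch job on the A100
h100_hours = a100_hours / 2.5         # H100 assumed ~2.5x faster

print(f"A100 total: ${a100_rate * a100_hours:,.0f}")   # $120
print(f"H100 total: ${h100_rate * h100_hours:,.0f}")   # $100
```

Under these assumptions the H100 finishes the job for less total spend despite its lower yield score.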
Important context: Yield score optimizes for cost efficiency, not absolute performance. A high yield score indicates excellent value but may not be appropriate for latency-sensitive or high-throughput production workloads. Use yield score in combination with workload requirements, SLA needs, and availability constraints.
Frequently Asked Questions
What is the cheapest GPU for AI inference?
Based on current market rates, RTX 4090 instances typically range from $0.20-0.44/hour and L40S from $0.19-0.55/hour on decentralized providers. For larger models requiring 80GB VRAM, A100 80GB pricing commonly starts around $0.90-1.20/hour on decentralized networks compared to $3-4/hour on hyperscale clouds. Actual rates vary by provider, region, and availability.
How much does an H100 GPU cost per hour?
H100 GPU hourly rates typically range from approximately $2.50-3.89/hour on decentralized providers to $8-12/hour on hyperscale clouds like AWS and GCP. Specialized providers generally price between these ranges at $3.50-5.00/hour. Price differences reflect variations in availability guarantees, support levels, and SLA commitments. Use our live dashboard for current provider-specific rates.
What is GPU yield score?
GPU yield score is a performance-per-dollar metric calculated as Performance Index / Hourly Price. Higher scores indicate more computational output per dollar. This metric helps identify cost-efficient options but should be considered alongside workload requirements, availability needs, and performance constraints.
How do decentralized providers compare to AWS, GCP, or Azure?
Decentralized GPU marketplaces often offer meaningfully lower hourly rates (sometimes 50%+ savings) than hyperscale clouds by aggregating supply from diverse sources. Trade-offs commonly include lower availability guarantees (often cited as 70-85% vs 99.9%), no enterprise SLAs, and variable network performance. Decentralized providers work well for development, batch workloads, and cost-sensitive applications, while production services requiring high reliability generally favor hyperscale or specialized providers.
What is the cost per 1M tokens for different GPUs?
Cost per 1M tokens depends on GPU throughput, hourly price, model architecture, and optimization. As an example, RTX 4090 instances running smaller models may achieve costs in the $0.50-1.00 per 1M tokens range, while H100 instances can be more efficient for larger models despite higher hourly rates due to superior throughput. Use our calculator with your specific model parameters for accurate estimates.
Is it cheaper to rent or buy GPUs for AI?
For workloads with consistent 24/7 utilization at 80%+ capacity, purchasing hardware may break even in approximately 8-12 months compared to cloud rental. However, ownership requires significant upfront capital, ongoing power and cooling costs, maintenance, and lacks flexibility for scaling or GPU type changes. Cloud rental generally proves more cost-effective for variable workloads, short-term projects, or when testing different GPU configurations.
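A back-of-the-envelope version of that break-even math; the hardware price, power cost, and utilization figures are assumptions for illustration, not quotes.

```python
# Break-even months for buying vs renting, ignoring resale value and ops labor.
# Hardware price, rental rate, power cost, and utilization are assumptions.
hardware_cost = 30_000        # e.g., one H100 GPU, USD (assumption)
rental_rate = 4.50            # USD/hr for a comparable rented GPU (assumption)
utilization = 0.80            # fraction of each month the GPU is busy
power_cost_per_hour = 0.15    # electricity + cooling, USD/hr (assumption)

hours_per_month = 730 * utilization
monthly_rental = rental_rate * hours_per_month
monthly_ownership_opex = power_cost_per_hour * hours_per_month
breakeven_months = hardware_cost / (monthly_rental - monthly_ownership_opex)
print(f"{breakeven_months:.1f} months")   # ≈ 11.8 months under these assumptions
```

Under these assumptions the purchase pays for itself in roughly a year, consistent with the 8-12 month range above; resale value, operations labor, and hardware failure risk would shift the result.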
Related AI Infrastructure Research
AI Infrastructure Investment Guide →
Comprehensive analysis of compute infrastructure economics, tokenized compute markets, and data center investment opportunities in the AI buildout.
GPU Hardware ROI Analysis (H100 vs H200 vs MI300X) →
Financial modeling for GPU capital expenditure decisions: comparing NVIDIA H100, H200, and AMD MI300X for compute infrastructure investment.
AI Infrastructure Portfolio Allocation Framework →
Strategic framework for institutional investors building exposure to AI compute infrastructure across public equities, private funds, and direct investments.
Data Center Investment Analysis Framework →
Evaluation methodology for GPU-focused data center operators: analyzing infrastructure quality, power economics, and compute density metrics.
AI Infrastructure & Compute Category →
Complete research coverage of AI compute infrastructure as an alternative asset class: market analysis, provider reviews, and investment frameworks.
Decentralized GPU Marketplace Analysis →
Deep-dive research on decentralized compute networks: market structure, pricing dynamics, and investment opportunities in distributed GPU infrastructure.
AI Infrastructure Risk Management & Hedging →
Risk analysis for compute infrastructure investments: power cost hedging, capacity planning, and portfolio construction strategies for institutional allocators.
Data Center Infrastructure Failure Case Study (CME) →
Analysis of the 2024 CME data center cooling failure: lessons for GPU infrastructure reliability, redundancy requirements, and operational risk in compute investments.
Compare Live GPU Pricing
Access real-time GPU pricing across 12+ providers, calculate workload costs with our scenario builder, and identify optimal performance-per-dollar GPUs for your infrastructure.
Launch GPU Pricing Dashboard
GPU price comparison tool for H100, A100, L40S, and RTX 4090 pricing across cloud providers. Compare GPU rental costs, calculate inference expenses, and evaluate performance-per-dollar for AI infrastructure.
Provider coverage: Vast.ai GPU pricing, RunPod pricing comparison, Lambda Labs rates, AWS P5 instances, GCP A3 instances, Azure ND series. Workload analysis: LLaMA inference costs, fine-tuning expenses, batch processing economics.