GPU Price Comparison Tool

Compare cloud GPU pricing across Vast.ai, RunPod, Lambda Labs, AWS, GCP, Azure, and other providers. Find H100, A100, L40S, and RTX 4090 rates with yield score analysis and cost calculations.

Pricing refreshed regularly • Live dashboard shows current rates

Real-Time GPU Pricing

Hourly rates for H100, A100, L40S, RTX 4090, and other GPUs across decentralized marketplaces and hyperscale clouds. Dashboard refreshes from provider APIs automatically.

Infrastructure Cost Calculator

Estimate cost per 1M tokens, per training epoch, or per month based on your workload. Calculate total cost of ownership including egress, storage, and compute time.
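
For illustration, the arithmetic behind these estimates reduces to a few lines. The Python sketch below is not dashboard output; the throughput, egress, and storage figures are hypothetical assumptions.

```python
# Illustrative cost arithmetic; throughput and ancillary figures are assumptions, not tool data.

def cost_per_million_tokens(hourly_rate: float, tokens_per_second: float) -> float:
    """Cost to process 1M tokens at a given hourly GPU rate and sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate / tokens_per_hour * 1_000_000

def monthly_tco(hourly_rate: float, hours: float, egress_gb: float = 0.0,
                egress_per_gb: float = 0.10, storage_monthly: float = 0.0) -> float:
    """Compute + egress + storage for one month of usage."""
    return hourly_rate * hours + egress_gb * egress_per_gb + storage_monthly

# A100 80GB at $1.50/hr sustaining ~1,200 tokens/s (assumed throughput)
print(f"${cost_per_million_tokens(1.50, 1200):.2f} per 1M tokens")                     # ~$0.35
# 720 GPU-hours/month plus 500 GB egress at $0.10/GB and $50/month storage
print(f"${monthly_tco(1.50, 720, egress_gb=500, storage_monthly=50):,.2f} per month")  # $1,180.00
```

Use the live calculator for model-specific throughput; these helpers only show how the pieces combine.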

Performance-Per-Dollar Analysis

Yield score methodology ranks GPUs by computational efficiency per dollar. Compare normalized performance indices across different GPU architectures and price points.

How to Use This Tool

1. Review Provider Types

Compare hyperscale, specialized, and decentralized providers below to understand availability vs. cost trade-offs.

2. Select Your GPU Model

Use our workload framework to match GPU specs (VRAM, throughput) to your model size and inference requirements.

3. Check Live Dashboard

Click "View Live Pricing Dashboard" above to filter by region, compare yield scores, and calculate total costs.

GPU Pricing Market Snapshot (Recent Market Range)

Key Price Observations

  • H100 pricing ranges from $2.50/hr (decentralized) to $12.29/hr (AWS P5), representing a 5x spread
  • A100 80GB availability has improved, with rates stabilizing at $0.90-4.10/hr depending on provider type
  • Consumer-grade RTX 4090 instances remain the most cost-efficient for smaller models at $0.20-0.44/hr

Market Dynamics

  • Decentralized networks (Vast.ai, RunPod) expanded supply by aggregating consumer and enterprise hardware
  • Specialized providers (Lambda Labs, CoreWeave) offer middle-ground pricing with better availability than decentralized
  • Hyperscale clouds maintain premium pricing but provide enterprise SLAs and global availability

GPU Provider Price Comparison

Note: The ranges below are directional and may drift with region, availability, host quality, and instance configuration. They are not quotes; verify current rates on the live dashboard or with the provider before making infrastructure decisions.

| GPU Model | Decentralized (Vast.ai, RunPod) | Specialized (Lambda, CoreWeave) | Hyperscale (AWS, GCP, Azure) | Typical Use Case |
|---|---|---|---|---|
| H100 80GB | $2.50-3.89/hr | $3.50-5.00/hr | $8.00-12.29/hr | Large-scale training (70B+ models), high-throughput production inference |
| A100 80GB | $0.90-1.89/hr | $1.10-2.20/hr | $3.20-4.10/hr | Fine-tuning mid-size models (13B-70B), batch inference, research |
| A100 40GB | $0.70-1.20/hr | $0.90-1.80/hr | $2.40-3.20/hr | Development, smaller fine-tuning tasks (7B-30B models) |
| L40S 48GB | $0.19-0.55/hr | $0.60-1.20/hr | Limited | Cost-sensitive inference, multi-modal workloads |
| RTX 4090 24GB | $0.20-0.44/hr | Limited | N/A | Development, small model inference (7B-13B), testing |

Understanding the Price-Performance Trade-off

Decentralized providers aggregate compute from consumer hardware and smaller data centers, enabling significantly lower hourly rates. Trade-offs include variable availability (commonly observed at 70-85%), potential instance interruptions, and limited enterprise support.

Specialized providers operate dedicated GPU infrastructure with institutional-grade availability (95-98%) and better networking, priced between decentralized and hyperscale options.

Hyperscale clouds provide enterprise SLAs (99.9%+ uptime), global availability, and comprehensive support at premium pricing. Optimal for production services requiring reliability guarantees.

Most users do this next: Estimate your complete AI infrastructure costs including training, inference, data labeling, and hidden operational expenses.

Calculate Your AI TCO

GPU Provider Infrastructure Models

Hyperscale Cloud Providers (AWS, GCP, Azure)

Infrastructure Characteristics

  • Enterprise SLAs with 99.9%+ uptime commitments and financial penalties for breaches
  • Global availability across 20+ regions with consistent performance characteristics
  • Purpose-built data centers optimized for GPU workloads with liquid cooling and high-bandwidth networking

Pricing and Economics

  • H100 instances: $8-12/hr on-demand, $5-8/hr with 1-3yr commitments
  • A100 instances: $3-4/hr on-demand, $2-3/hr with reserved capacity
  • Spot instances offer 50-70% discounts but with potential interruption
  • Additional costs: data egress ($0.08-0.12/GB), storage, load balancers

Optimal for: Production services requiring high reliability, compliance requirements, multi-region deployment

Specialized GPU Providers (Lambda Labs, CoreWeave, Paperspace)

Infrastructure Characteristics

  • Focused GPU infrastructure with 95-98% availability but limited formal SLA guarantees
  • Regional availability typically concentrated in 3-5 major data center hubs
  • Purpose-built for ML workloads with optimized networking and storage configurations

Pricing and Economics

  • H100 instances: $3.50-5.00/hr with consistent availability
  • A100 instances: $1.10-2.20/hr, often with volume discounts for 10+ GPUs
  • Flat-rate pricing with fewer hidden costs than hyperscale clouds
  • Some providers offer reserved capacity at 15-25% discount

Optimal for: ML teams requiring better economics than hyperscale with more reliability than decentralized

Decentralized GPU Marketplaces (Vast.ai, RunPod)

Infrastructure Characteristics

  • Aggregate supply from consumer hardware, small data centers, and individual operators
  • Variable availability commonly observed at 70-85%, with potential for instance interruption or maintenance
  • Heterogeneous infrastructure quality, networking speeds, and storage configurations

Pricing and Economics

  • H100 instances: Often available at 50-80% lower rates vs hyperscale (commonly $2.50-3.89/hr)
  • A100 instances: $0.90-1.89/hr with variation by host quality
  • RTX consumer GPUs: $0.20-0.44/hr for 4090, 3090, etc.
  • Transparent base pricing, often with lower ancillary fees than hyperscale clouds

Optimal for: Development, experimentation, batch workloads, cost-sensitive applications with fault tolerance

GPU Selection Framework by Workload

Training Large Language Models (30B-70B+ Parameters)

Recommended GPU

H100 80GB or A100 80GB (multi-GPU setup for 70B+)

Estimated Cost

$2.50-12/hr per GPU × training duration (typically 100-500 GPU-hours for fine-tuning)

Provider Recommendation

Specialized (Lambda, CoreWeave) for reliability; Decentralized (Vast.ai) for cost savings if checkpointing frequently

80GB VRAM required for models above 30B parameters without advanced quantization. Multi-GPU setups with NVLink or InfiniBand needed for 70B+ models. Consider checkpoint frequency and fault tolerance when using lower-cost providers.
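
The cost estimate above is hourly rate × total GPU-hours. A minimal sketch using this section's figures as inputs:

```python
# Fine-tuning cost range from this section: hourly rate x total GPU-hours.
def training_cost(hourly_rate: float, gpu_hours: float) -> float:
    """Total run cost for gpu_hours of GPU time billed at hourly_rate per GPU-hour."""
    return hourly_rate * gpu_hours

# 100-500 GPU-hours at decentralized vs hyperscale H100 rates
for rate in (2.50, 12.00):
    print(f"${training_cost(rate, 100):,.0f} - ${training_cost(rate, 500):,.0f} at ${rate:.2f}/hr")
    # $250 - $1,250 at $2.50/hr
    # $1,200 - $6,000 at $12.00/hr
```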

Production Inference for Customer-Facing Applications

Recommended GPU

H100 for high-throughput; A100 or L40S for cost-balanced inference

Estimated Cost

$0.50-2.00 per 1M tokens depending on model size and GPU choice

Provider Recommendation

Hyperscale (AWS, GCP) or Specialized (Lambda, CoreWeave) for SLA requirements

Production inference requires high availability (99.9%+), consistent latency, and autoscaling capabilities. Avoid decentralized providers unless implementing multi-provider failover. Factor in egress costs for high-volume applications.

Fine-Tuning Small to Medium Models (7B-13B Parameters)

Recommended GPU

RTX 4090, L40S, or A100 40GB

Estimated Cost

$10-100 per fine-tuning run (5-50 GPU-hours typical)

Provider Recommendation

Decentralized (Vast.ai, RunPod) optimal for cost efficiency

24-48GB VRAM sufficient for most 7B-13B model fine-tuning with LoRA or QLoRA. Decentralized providers offer substantial savings with acceptable interruption risk for non-production workloads. Implement checkpointing every 10-20% of training.
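
A minimal sketch of the checkpointing guideline, assuming a hypothetical save_checkpoint function and step count:

```python
# Save a checkpoint roughly every 10-20% of training steps (this section's guideline).
def should_checkpoint(step: int, total_steps: int, fraction: float = 0.10) -> bool:
    interval = max(1, int(total_steps * fraction))
    return step % interval == 0

# In a hypothetical 5,000-step QLoRA run, this fires every 500 steps:
# for step in range(1, 5001):
#     ...training step...
#     if should_checkpoint(step, 5000):
#         save_checkpoint(step)  # hypothetical save function
```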

Development, Experimentation, and Testing

Recommended GPU

RTX 4090, RTX 3090, or L40S

Estimated Cost

$0.20-0.60/hr for iterative development

Provider Recommendation

Decentralized providers (Vast.ai) for maximum cost efficiency

Consumer-grade GPUs provide excellent performance for development workflows, prompt engineering, and model evaluation. Lower hourly rates enable longer experimentation cycles. Interruption risk is acceptable for non-critical development work.

Batch Processing and Offline Inference

Recommended GPU

A100 40/80GB for throughput; RTX 4090 for cost efficiency

Estimated Cost

$0.50-3.00 per 1M tokens processed depending on model and GPU

Provider Recommendation

Decentralized or spot instances for lowest cost; can tolerate interruption

Batch workloads with flexible deadlines can leverage spot pricing or decentralized providers for 60-80% cost savings. Implement job queuing and automatic retry logic to handle potential interruptions. Consider multi-provider strategies to maximize GPU availability.
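
A minimal sketch of the retry pattern described above, assuming a hypothetical interruptible job function that resumes from its last checkpoint:

```python
import time

def run_with_retries(job, max_attempts: int = 5, backoff_s: float = 30.0):
    """Retry an interruptible batch job with linear backoff; re-raise after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except RuntimeError as err:  # e.g. instance preempted mid-run
            if attempt == max_attempts:
                raise
            wait = backoff_s * attempt
            print(f"Attempt {attempt} failed ({err}); retrying in {wait:.0f}s")
            time.sleep(wait)

# Usage with a hypothetical job that resumes from its last checkpoint:
# result = run_with_retries(lambda: run_batch_job(resume_from="latest"))
```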

Understanding GPU Yield Score Methodology

Calculation Framework

Yield Score = Performance Index / Hourly Price

Higher scores indicate more computational output per dollar spent

Performance Index Components

  • Compute throughput: TFLOPS for FP16/BF16 operations (primary metric for transformer models)
  • Memory bandwidth: GB/s for memory-bound inference workloads
  • Memory capacity: Total VRAM available for model weights and KV cache
  • Tensor Core generation: Architectural efficiency for matrix operations

Normalized Benchmarking

  • Performance indices normalized to H100 baseline (100)
  • Based on MLPerf inference benchmarks and manufacturer specifications
  • Actual performance varies by model architecture, batch size, and optimization
  • Indices reviewed periodically to reflect driver, framework, and benchmarking updates

Example 1

H100 at $2.50/hr

Score: 40.0

100 performance index ÷ $2.50 = 40

Example 2

A100 at $1.50/hr

Score: 38.0

57 performance index ÷ $1.50 = 38

Example 3

RTX 4090 at $0.20/hr

Score: 115.0

23 performance index ÷ $0.20 = 115
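
The three examples reduce to a single division. A short sketch using the normalized indices quoted above:

```python
# Yield score = performance index (H100 = 100) / hourly price, per the examples above.
PERFORMANCE_INDEX = {"H100": 100, "A100": 57, "RTX 4090": 23}

def yield_score(gpu: str, hourly_price: float) -> float:
    return PERFORMANCE_INDEX[gpu] / hourly_price

print(round(yield_score("H100", 2.50), 1))      # 40.0
print(round(yield_score("A100", 1.50), 1))      # 38.0
print(round(yield_score("RTX 4090", 0.20), 1))  # 115.0
```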

When Yield Score Matters (and When It Doesn't)

Yield score is most valuable when: You're running 8B-13B inference workloads where multiple GPU options (RTX 4090, L40S, A100 40GB) all meet VRAM requirements. In these cases, yield score helps identify the most cost-efficient choice—often favoring 4090/L40S.

Yield score becomes secondary when: Your model requires 80GB VRAM (narrowing options to A100 80GB or H100), you need guaranteed 99.9% uptime for production, or latency requirements demand the fastest possible GPU regardless of cost.

Practical example: For running LLaMA 70B, an H100 at $2.50/hr (yield score: 40) may actually be more cost-effective than an A100 at $1.20/hr (yield score: 47) due to H100's 2-3x higher throughput reducing total job completion time.
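
A quick sketch of that comparison; the 2-3x throughput advantage is the assumption carried over from the example (2.5x used here), and the A100 wall-clock time is illustrative:

```python
# Total job cost = hourly price x wall-clock hours; a faster GPU can undercut a cheaper one.
def job_cost(hourly_price: float, wall_clock_hours: float) -> float:
    return hourly_price * wall_clock_hours

a100_hours = 10.0              # assumed wall-clock time for the job on an A100
h100_hours = a100_hours / 2.5  # assuming ~2.5x higher H100 throughput
print(f"A100 @ $1.20/hr: ${job_cost(1.20, a100_hours):.2f}")  # $12.00
print(f"H100 @ $2.50/hr: ${job_cost(2.50, h100_hours):.2f}")  # $10.00
```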

Important context: Yield score optimizes for cost efficiency, not absolute performance. A high yield score indicates excellent value but may not be appropriate for latency-sensitive or high-throughput production workloads. Use yield score in combination with workload requirements, SLA needs, and availability constraints.

Frequently Asked Questions

What is the cheapest GPU for AI inference?

Based on current market rates, RTX 4090 instances typically range from $0.20-0.44/hour and L40S from $0.19-0.55/hour on decentralized providers. For larger models requiring 80GB VRAM, A100 80GB pricing commonly starts around $0.90-1.20/hour on decentralized networks compared to $3-4/hour on hyperscale clouds. Actual rates vary by provider, region, and availability.

How much does an H100 GPU cost per hour?

H100 GPU hourly rates typically range from approximately $2.50-3.89/hour on decentralized providers to $8-12/hour on hyperscale clouds like AWS and GCP. Specialized providers generally price between these ranges at $3.50-5.00/hour. Price differences reflect variations in availability guarantees, support levels, and SLA commitments. Use our live dashboard for current provider-specific rates.

What is GPU yield score?

GPU yield score is a performance-per-dollar metric calculated as Performance Index / Hourly Price. Higher scores indicate more computational output per dollar. This metric helps identify cost-efficient options but should be considered alongside workload requirements, availability needs, and performance constraints.

How do decentralized providers compare to AWS, GCP, or Azure?

Decentralized GPU marketplaces often offer meaningfully lower hourly rates (sometimes 50%+ savings) than hyperscale clouds by aggregating supply from diverse sources. Trade-offs commonly include lower availability guarantees (often cited as 70-85% vs 99.9%), no enterprise SLAs, and variable network performance. Decentralized providers work well for development, batch workloads, and cost-sensitive applications, while production services requiring high reliability generally favor hyperscale or specialized providers.

What is the cost per 1M tokens for different GPUs?

Cost per 1M tokens depends on GPU throughput, hourly price, model architecture, and optimization. As an example, RTX 4090 instances running smaller models may achieve costs in the $0.50-1.00 per 1M tokens range, while H100 instances can be more efficient for larger models despite higher hourly rates due to superior throughput. Use our calculator with your specific model parameters for accurate estimates.

Is it cheaper to rent or buy GPUs for AI?

For workloads with consistent 24/7 utilization at 80%+ capacity, purchasing hardware may break even in approximately 8-12 months compared to cloud rental. However, ownership requires significant upfront capital, ongoing power and cooling costs, maintenance, and lacks flexibility for scaling or GPU type changes. Cloud rental generally proves more cost-effective for variable workloads, short-term projects, or when testing different GPU configurations.
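
A rough sketch of the break-even arithmetic behind that 8-12 month estimate; the hardware price, power cost, and utilization figures are illustrative assumptions:

```python
# Break-even months for buying vs renting; all figures are illustrative assumptions.
def breakeven_months(purchase_price: float, rental_rate_hr: float,
                     utilization: float = 0.80, power_cooling_hr: float = 0.10) -> float:
    hours_per_month = 730 * utilization
    monthly_rental = rental_rate_hr * hours_per_month
    monthly_ownership_opex = power_cooling_hr * hours_per_month  # power/cooling only
    return purchase_price / (monthly_rental - monthly_ownership_opex)

# e.g. a $25,000 GPU vs renting at $4.50/hr, 80% utilization
print(f"{breakeven_months(25_000, 4.50):.1f} months")  # ~9.7 months
```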

Related AI Infrastructure Research

AI Infrastructure Investment Guide →

Comprehensive analysis of compute infrastructure economics, tokenized compute markets, and data center investment opportunities in the AI buildout.

GPU Hardware ROI Analysis (H100 vs H200 vs MI300X) →

Financial modeling for GPU capital expenditure decisions: comparing NVIDIA H100, H200, and AMD MI300X for compute infrastructure investment.

AI Infrastructure Portfolio Allocation Framework →

Strategic framework for institutional investors building exposure to AI compute infrastructure across public equities, private funds, and direct investments.

Data Center Investment Analysis Framework →

Evaluation methodology for GPU-focused data center operators: analyzing infrastructure quality, power economics, and compute density metrics.

AI Infrastructure & Compute Category →

Complete research coverage of AI compute infrastructure as an alternative asset class: market analysis, provider reviews, and investment frameworks.

Decentralized GPU Marketplace Analysis →

Deep-dive research on decentralized compute networks: market structure, pricing dynamics, and investment opportunities in distributed GPU infrastructure.

AI Infrastructure Risk Management & Hedging →

Risk analysis for compute infrastructure investments: power cost hedging, capacity planning, and portfolio construction strategies for institutional allocators.

Data Center Infrastructure Failure Case Study (CME) →

Analysis of the 2024 CME data center cooling failure: lessons for GPU infrastructure reliability, redundancy requirements, and operational risk in compute investments.

Compare Live GPU Pricing

Access real-time GPU pricing across 12+ providers, calculate workload costs with our scenario builder, and identify optimal performance-per-dollar GPUs for your infrastructure.

Launch GPU Pricing Dashboard

GPU price comparison tool for H100, A100, L40S, RTX 4090 pricing across cloud providers. Compare GPU rental costs, calculate inference expenses, and evaluate performance-per-dollar for AI infrastructure.

Provider coverage: Vast.ai GPU pricing, RunPod pricing comparison, Lambda Labs rates, AWS P5 instances, GCP A3 instances, Azure ND series. Workload analysis: LLaMA inference costs, fine-tuning expenses, batch processing economics.