The Cloud Repatriation Debate: When AWS Costs 10-100x More Than It Should [2025 Platform Engineering Guide]

· 30 min read

Quick Answer (TL;DR)

Problem: AWS and the major cloud providers charge 7-18x markups on compute, and hidden egress fees plus unpredictable billing push total cost of ownership to 10-100x bare metal alternatives for many workloads.

Movement: 86% of CIOs planned cloud repatriation in 2025 (up from 43% in 2020), but only 8-9% pursue full exit—most do selective workload optimization.

Real Savings:

  • 37signals: $2M/year saved, $10M projected over 5 years
  • Dropbox: $74.6M saved over 2 years moving 90% of data
  • Typical enterprises: 15-30% infrastructure cost reduction

Key Trade-offs:

  • Bare Metal Wins: Predictable workloads at scale (>50 servers, >12 months stable), high bandwidth needs, limited managed service usage
  • Cloud Wins: True burst scaling requirements, pre-PMF startups, global multi-region presence, heavy managed service reliance, compliance constraints

Timeline: Break-even at 50-100 sustained servers or 12-24 months of stable usage patterns.

Decision Framework: Evaluate actual elasticity needs, egress costs, managed service dependency, team expertise, and total cost of ownership—not just sticker price.

Full analysis below ↓

🎙️ Listen to the podcast episode: The Cloud Repatriation Debate - Jordan and Alex discuss real companies saving millions by leaving the cloud, expose hidden costs like egress fees and NAT gateways, and debate when cloud makes sense versus when it's "highway robbery."

🎥 Watch on YouTube


The Repatriation Wave Reaches Critical Mass

David Heinemeier Hansson (DHH) from 37signals stood before a $3.2 million annual AWS bill in 2022 and asked the question that would spark a movement: "Why are we paying this much?"

By 2024, 37signals had moved its compute workloads off AWS (the S3 exit followed in 2025). Their new bill: $1.3 million annually—saving nearly $2 million per year, with projections exceeding $10 million in savings over five years.

They're not alone. 86% of CIOs planned to repatriate at least some cloud workloads in 2025—the highest on record in Barclays' CIO Survey, nearly double the 43% who said the same in late 2020.

But here's what the headlines don't tell you: this isn't a wholesale rejection of cloud computing. Only 8-9% of companies plan full workload repatriation. The real story is nuanced, complex, and far more interesting than "cloud bad, bare metal good."

Let's examine the economics, break down the 10-100x markup claim with real data, and build decision frameworks platform engineering teams can actually use.


Key Statistics (2024-2025 Data)

| Statistic | Value | Source | Context |
|---|---|---|---|
| CIOs planning repatriation | 86% in 2025 (vs 43% in 2020) | Barclays CIO Survey 2024 | Highest on record, but most selective, not wholesale |
| Full workload repatriation | Only 8-9% of companies | IDC 2024 | Most do selective optimization, not complete exit |
| Actual workloads repatriated | 21% of workloads and data | IDC Server and Storage Survey | Net cloud growth continues despite exits |
| AWS compute markup | 7-18x vs bare metal | Hetzner vs AWS pricing comparison | 80-core server: €190/mo vs $2,500-$3,500/mo |
| 37signals savings | $2M annually, $10M+ over 5 years | 37signals public disclosure | $700K hardware investment, existing team |
| Dropbox savings | $74.6M over 2 years | Dropbox S-1 filing 2018 | Moved 90% of data, custom infrastructure |
| AWS egress pricing | $0.09/GB after 100GB free | AWS pricing 2025 | Free tier increased from 1GB but still a major cost |
| Hetzner included traffic | 20TB free with entry plans | Hetzner pricing | No egress charges vs AWS $900 for 10TB transfer |
| Cloud waste | 15-30% of cloud spend | FinOps Foundation 2024 | Idle resources, oversized instances, unused services |
| Break-even point | 50-100 sustained servers | Industry analysis | Or 12-24 months of stable workload patterns |
| Infrastructure team cost | $2,500-$6,000/month overhead | 10-20 hrs/week at $60-75/hr | Hidden cost of bare metal management |
| Typical ROI threshold | 40-60% cost reduction at scale | Multiple case studies | Above 200 servers with predictable usage |

The 10-100x Markup Reality

The Bold Claim

Víctor Martínez's viral article made a provocative assertion: cloud providers charge 10-100x what infrastructure actually costs. Is this hyperbole or documented reality?

Let's break down the numbers with specific examples.

Real Cost Comparison: AWS vs Hetzner (Bare Metal)

Equivalent 80-Core Configuration:

| Provider | Specs | Monthly Cost | Markup vs Bare Metal |
|---|---|---|---|
| Hetzner Bare Metal | 80 cores, dedicated hardware | €190 (~$207) | Baseline (1x) |
| AWS EC2 On-Demand | Comparable C5/C6 instances | $2,500-$3,500 | 13-18x |
| AWS Reserved (3-year) | Same instances, 3-year commit | ~$1,300/mo + $46K upfront | 7x (plus lock-in) |

Reality Check: The 7-18x markup claim is verifiable and conservative for compute-intensive workloads.

💡 Key Takeaway

AWS compute costs 7-18x more than bare metal for equivalent resources. Reserved instances reduce this to 7x but require $46K upfront and 3-year commitment. The markup is real, not marketing hype.

VPS Middle Ground: AWS vs DigitalOcean vs Hetzner

For teams not ready for bare metal, VPS providers offer middle ground:

| Provider | 8-core, 32GB RAM Config | Monthly Cost | Bandwidth Included | Egress Overage |
|---|---|---|---|---|
| Hetzner VPS | 8 vCPU, 32GB RAM, 160GB SSD | €50 (~$55) | 20TB | Free beyond 20TB |
| DigitalOcean | 8 vCPU, 32GB RAM, 200GB SSD | ~$120 | Shared pool | $0.01/GB |
| AWS EC2 | m5.2xlarge (8 vCPU, 32GB) | ~$280/mo | 100GB | $0.09/GB after 100GB |

Markup Analysis:

  • AWS costs 5x more than Hetzner for equivalent VPS
  • DigitalOcean sits at 2.2x Hetzner pricing
  • Bandwidth costs can double or triple total bills for data-heavy workloads

The Egress Fee Multiplier

Compute markup is only half the story. Egress fees can turn a manageable cloud bill into a financial nightmare.

Scenario: 10TB Monthly Data Transfer

| Provider | Egress Cost Formula | Monthly Egress Cost | Annual Impact |
|---|---|---|---|
| AWS | 100GB free + 9,900GB × $0.09 | $891/month | $10,692/year |
| DigitalOcean | Shared pool + $0.01/GB overage | ~$100/month | $1,200/year |
| Hetzner | 20TB included free | $0 | $0 |

For a workload transferring 10TB/month, AWS charges $10,692 annually just for bandwidth that Hetzner includes free.
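
The arithmetic behind that table is simple enough to script and reuse with your own transfer volumes. A minimal sketch in Python, using the list prices cited in this article (actual rates vary by region and negotiated discount):

```python
# Egress cost sanity check using the list prices cited above.
# Rates vary by region and contract; treat this as a rough estimator.

def aws_egress_monthly(gb_out: float, free_gb: float = 100, rate: float = 0.09) -> float:
    """AWS internet egress: first 100GB/month free, then $0.09/GB."""
    return max(gb_out - free_gb, 0) * rate

def hetzner_egress_monthly(gb_out: float, included_gb: float = 20_000) -> float:
    """Hetzner entry plans include 20TB; overage beyond that is not modeled here."""
    return 0.0 if gb_out <= included_gb else float("nan")

monthly_gb = 10_000  # 10TB/month
aws = aws_egress_monthly(monthly_gb)
print(f"AWS:     ${aws:,.0f}/month, ${aws * 12:,.0f}/year")          # $891/month, $10,692/year
print(f"Hetzner: ${hetzner_egress_monthly(monthly_gb):,.0f}/month")  # $0/month
```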

💡 Key Takeaway

Egress fees are the "hidden tax" that pushes cloud costs from 7-18x markup to 10-100x total cost of ownership for data-intensive workloads. A single 10TB/month transfer costs $10,692/year on AWS versus $0 on Hetzner.

Storage: The S3 Lock-In Tax

Scenario: 18 Petabytes of Storage (37signals' use case)

| Provider | Storage Model | Annual Cost | Egress Penalty |
|---|---|---|---|
| AWS S3 | $0.023/GB storage + egress | ~$1.5M/year | +$0.09/GB out |
| Pure Storage (on-prem) | Upfront hardware + maintenance | Hardware CAPEX | No egress fees |

37signals' S3 bill alone: $1.5 million annually. AWS waived $250K in egress fees for their migration—a telling admission that exit costs are punitive.

The 100x Cases: Where It Actually Happens

The 100x markup isn't typical, but it does occur in specific scenarios:

100x Markup Scenarios:

  1. Managed Service Abuse: AWS Elasticsearch vs self-hosted on bare metal (markup 40-80x)
  2. Function-as-a-Service Overuse: Lambda costs vs containerized equivalents on owned hardware (50-100x at scale)
  3. Serverless Databases: Aurora/DynamoDB vs PostgreSQL on bare metal for predictable workloads (30-80x)
  4. Data Transfer Heavy: Video streaming/CDN workloads with high egress (10-100x when egress dominates)

Where Markup is Lower (2-5x):

  • Managed Kubernetes (EKS) vs self-hosted k8s on VPS
  • Basic compute instances vs VPS (especially with reserved instances)
  • Blob storage vs cold storage arrays (when egress is minimal)

The Counterargument: What You're Actually Paying For

Cloud defenders argue the markup pays for:

  1. Elasticity: Scale from 10 to 10,000 servers in minutes
  2. Managed Services: RDS, Lambda, Kinesis—services that require expertise to build
  3. Global Footprint: 30+ regions, edge locations, compliance certifications
  4. Operational Burden: No hardware failures, no datacenter contracts, no capacity planning
  5. Innovation Velocity: New services quarterly, no infrastructure blocking product development

These are legitimate value propositions—for workloads that actually need them.

The problem? Most workloads don't need them, yet pay the 10-100x markup anyway.


The Case Studies: Who Left and What They Saved

37signals: The Poster Child of Cloud Exit

Timeline:

  • 2022: DHH discovers $3.2M annual AWS bill, begins exit planning
  • 2023: Migrates compute workloads, saves $1M in first year
  • 2024: First "clean year" post-migration, saves $2M annually
  • 2025: Completes S3 exit (18 petabytes), deletes AWS account entirely

Investment vs Savings:

  • Hardware purchase: $700K (Dell systems)
  • Annual savings: $2M/year
  • Payback period: 4.2 months
  • 5-year projection: $10M+ saved
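
Those figures are easy to verify from 37signals' publicly disclosed numbers. A quick check:

```python
# Quick check of the publicly disclosed 37signals figures cited above.
hardware_investment = 700_000   # one-time Dell hardware purchase
annual_savings = 2_000_000      # reported savings per "clean year" post-migration

payback_months = hardware_investment / annual_savings * 12
five_year_gross = annual_savings * 5

print(f"Payback period: {payback_months:.1f} months")        # 4.2 months
print(f"5-year gross savings: ${five_year_gross:,.0f}")      # $10,000,000
print(f"5-year net of hardware: ${five_year_gross - hardware_investment:,.0f}")  # $9,300,000
```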

Team Impact:

  • Zero team expansion required
  • Same DevOps team manages infrastructure
  • "No hidden dragons" of operational burden

Key Quote (DHH):

"We've been pleasantly surprised that savings have been even better than originally estimated. The team managing everything is still the same—there were no hidden workloads that required us to balloon the team."

What 37signals Got Right:

  1. Workload Fit: Predictable SaaS traffic patterns, not spiky consumer apps
  2. Minimal Managed Services: Limited AWS service lock-in beyond compute/storage
  3. Existing Expertise: Team already capable of infrastructure management
  4. Executive Commitment: DHH personally championed the migration
  5. Data-Driven: Measured costs rigorously before and after

💡 Key Takeaway

37signals' success hinged on predictable workloads, minimal managed service dependency, existing infrastructure expertise, and executive commitment. They paid back $700K hardware investment in 4.2 months and project $10M+ savings over 5 years.

Dropbox: The $75 Million Migration

Timeline:

  • 2015: Begins "Infrastructure Optimization" project
  • 2016: Moves 90% of user data off AWS to custom "Magic Pocket" infrastructure
  • 2017-2018: Completes migration, goes public

Financial Impact:

  • 2016 savings: $39.5M decrease in infrastructure costs
  • 2017 savings: $35.1M decrease in infrastructure costs
  • Total 2-year savings: $74.6M
  • Hardware investment: $53M+
  • Gross margin improvement: 33% to 67% (2015-2017)

Infrastructure Details:

  • Three colocated datacenters: California, Virginia, Texas
  • Custom-built "Magic Pocket" storage system
  • Custom hardware and software design
  • Retained 10% of workloads on AWS for flexibility

Critical Context: Dropbox is an outlier, not a template. Their entire business model is storage—they were competing with AWS's core service. Most companies don't have:

  • Deep storage engineering expertise
  • Scale to justify $53M+ hardware investment
  • Business model centered on infrastructure efficiency

What Made Dropbox Special:

  • Competing directly with AWS S3 (negative margin paying AWS)
  • Sufficient scale (~500M users) to justify custom datacenter investment
  • World-class infrastructure engineering team
  • Storage workload perfectly suited to owned hardware

💡 Key Takeaway

Dropbox saved $74.6M over 2 years but invested $53M+ in custom infrastructure with world-class engineering teams. They're an outlier—most companies lack the scale, expertise, or business model alignment to replicate this success.

GEICO: The Cloud Migration Regret Story

The Cautionary Tale:

  • Spent a decade migrating 600+ applications to public cloud
  • Cloud costs increased 2.5x after migration
  • Now repatriating workloads to private cloud (OpenStack/Kubernetes)
  • Investing in on-premises infrastructure to optimize costs

What Went Wrong:

  1. Lift-and-Shift Mentality: Moved apps without redesign for cloud efficiency
  2. Lack of Cost Visibility: Didn't monitor per-app cloud costs during migration
  3. Over-reliance on Managed Services: Locked into expensive AWS-specific services
  4. No FinOps Practice: No cost optimization culture during migration phase

The Lesson: A bad cloud migration is worse than staying on-premises. GEICO's story isn't "cloud failed"—it's "unoptimized cloud migration failed."

The 86% Who Aren't Making Headlines

While 37signals and Dropbox dominate the headlines, the 86% of CIOs planning repatriation in 2025 mostly aren't pursuing full exits:

Typical Repatriation Patterns:

  1. Selective Workload Optimization: Move predictable baseline workloads to bare metal, keep burst capacity in cloud
  2. Cold Storage Exit: Archive data to cheaper on-prem/colocation storage, keep hot data in cloud
  3. Dev/Test Environment Repatriation: Move non-production environments to Hetzner/DigitalOcean
  4. Database Repatriation: Self-host PostgreSQL/MySQL on bare metal, keep stateless compute in cloud
  5. Hybrid Cloud Architecture: Strategic placement of workloads based on economics, not ideology

Why Partial Repatriation Dominates:

  • Lower risk than full migration
  • Preserves cloud benefits for workloads that need them
  • Incremental cost savings without operational upheaval
  • Easier to justify to risk-averse executives

The Decision Framework: When Cloud Makes Sense vs When It Doesn't

The Core Question

The cloud vs bare metal debate isn't "which is better?"—it's "which workloads belong where?"

Platform engineering teams need frameworks to answer this question without ideology.

Decision Matrix: Cloud vs Bare Metal

| Factor | Stay in Cloud | Consider Bare Metal/VPS | Weight |
|---|---|---|---|
| Workload Elasticity | Traffic spikes 10x+ within hours | Predictable baseline with <2x variation | ⭐⭐⭐ Critical |
| Scale | <50 sustained servers | >100 sustained servers, stable 12+ months | ⭐⭐⭐ Critical |
| Managed Service Dependency | Heavy use of RDS, Lambda, managed k8s | Primarily compute/storage, self-managed services | ⭐⭐⭐ Critical |
| Geographic Distribution | Multi-region presence required | Single region or 2-3 strategic locations | ⭐⭐ High |
| Team Expertise | No infrastructure specialists on staff | 3+ engineers with datacenter/bare metal experience | ⭐⭐ High |
| Egress Requirements | Low data transfer (<1TB/month) | High bandwidth (>10TB/month outbound) | ⭐⭐ High |
| Compliance Needs | Require SOC2/HIPAA/FedRAMP certifications | Standard security, no specialized compliance | ⭐ Medium |
| Growth Stage | Pre-PMF, unpredictable growth trajectory | Post-PMF, predictable growth patterns | ⭐ Medium |
| Capital Availability | Cannot commit $100K+ upfront for hardware | Can invest 6-12 months OpEx upfront for CAPEX | ⭐ Medium |
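
One way to make this matrix operational is to turn it into a weighted score your platform team can rerun as conditions change. A minimal sketch below; the weights mirror the table (Critical = 3, High = 2, Medium = 1), while the scoring threshold is an illustrative assumption, not an industry standard:

```python
# Weighted scoring sketch for the cloud-vs-bare-metal decision matrix above.
# Each factor contributes +weight if it favors bare metal, -weight if it favors cloud.
# Weights follow the table (Critical=3, High=2, Medium=1); the threshold below is
# an illustrative assumption, not a published benchmark.

WEIGHTS = {
    "workload_elasticity": 3, "scale": 3, "managed_service_dependency": 3,
    "geographic_distribution": 2, "team_expertise": 2, "egress_requirements": 2,
    "compliance_needs": 1, "growth_stage": 1, "capital_availability": 1,
}

def repatriation_score(favors_bare_metal: dict) -> int:
    """Sum +weight for factors favoring bare metal, -weight for those favoring cloud."""
    return sum(w if favors_bare_metal[factor] else -w for factor, w in WEIGHTS.items())

# Example: roughly the mature B2B SaaS profile described later in this section.
profile = {
    "workload_elasticity": True,         # <2x traffic variation
    "scale": True,                       # 150 sustained servers, 12+ months stable
    "managed_service_dependency": True,  # self-hosted PostgreSQL, Kafka, Redis
    "geographic_distribution": True,     # single primary region
    "team_expertise": True,              # 12-person platform team
    "egress_requirements": True,         # egress-heavy AWS bill
    "compliance_needs": False,           # SOC2 required
    "growth_stage": True,                # post-PMF, predictable growth
    "capital_availability": True,        # can fund hardware upfront
}

score = repatriation_score(profile)
print(score, "→ evaluate repatriation" if score >= 6 else "→ stay in cloud / go hybrid")  # 16
```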

The "Stay in Cloud" Profile

You should probably stay in AWS/GCP/Azure if:

  1. Startup, Pre-Product Market Fit

    • Unpredictable scaling needs
    • Team focused on product, not infrastructure
    • Runway concerns make OpEx flexibility critical
    • May pivot or fail—don't want hardware commitments
  2. True Burst Scaling Requirements

    • Black Friday-like traffic spikes (10-100x)
    • Event-driven workloads (launches, campaigns)
    • Geographic traffic shifting (follow-the-sun)
    • Need to scale from 50 to 5,000 instances in minutes
  3. Heavy Managed Service Users

    • Core business logic runs on Lambda, Step Functions
    • Rely on managed ML services (SageMaker, Vertex AI)
    • Use proprietary services (DynamoDB, Aurora Serverless)
    • Migration would require app rewrites (6-12 month distraction)
  4. Global, Compliance-Heavy Operations

    • Need presence in 10+ geographic regions
    • Require specific compliance certifications (FedRAMP, HIPAA)
    • Customers demand cloud-native architectures
    • Multi-cloud strategy for risk management
  5. Small Engineering Teams (<100 total engineers)

    • No dedicated infrastructure specialists
    • Platform team smaller than 3 FTE
    • Limited ops expertise in-house
    • Can't spare 10-20 hours/week for infrastructure management

Example Profile: E-commerce startup (Series A)

  • 30 engineers, 20 in product
  • Traffic varies 3-50x (normal vs sales events)
  • Heavy AWS managed services (RDS, Lambda, S3)
  • Global customer base (US, EU, APAC)
  • Verdict: Stay in cloud (markup justified by elasticity needs)

The "Consider Repatriation" Profile

You should evaluate bare metal/VPS alternatives if:

  1. Predictable SaaS or Internal Tooling

    • Traffic patterns stable within 2x variation
    • 50-100+ sustained baseline servers
    • 12+ months of workload history showing consistency
    • Capacity planning is feasible
  2. High Bandwidth Requirements

    • Egress costs >$5K/month
    • Video streaming, file transfer, CDN origin
    • Data analytics with frequent large exports
    • Backup/DR with multi-TB daily transfers
  3. Minimal Managed Service Lock-in

    • Primarily compute and block storage users
    • Self-host databases (PostgreSQL, MySQL, Redis)
    • Kubernetes workloads (portable across infrastructure)
    • Use open-source tools (Prometheus, Grafana, etc.)
  4. Mature Engineering Organization

    • 100+ total engineers
    • Dedicated infrastructure/platform team (3+ FTE)
    • Existing ops expertise (Linux, networking, storage)
    • Culture of infrastructure ownership
  5. Cost Optimization Mandate

    • Cloud bill >$500K annually with 70%+ fixed workloads
    • Executive pressure to reduce infrastructure spend
    • Comfortable with 6-12 month migration timeline
    • Can commit $100K-$1M upfront for hardware/migration

Example Profile: Mature B2B SaaS (Series C+)

  • 200 engineers, 12-person platform team
  • 150 sustained EC2 instances, spikes to 180 in peak hours
  • Self-hosted PostgreSQL, Kafka, Redis on EC2
  • $1.2M annual AWS bill, mostly compute and egress
  • Verdict: Strong repatriation candidate (save 40-60% = $480-720K/year)

The Hybrid Approach: Best of Both Worlds

Most sophisticated platform teams don't choose cloud XOR bare metal—they architect hybrid systems optimizing cost and capability.

Hybrid Architecture Pattern:

  1. Baseline Workload → Bare Metal/Colocation

    • Predictable compute: self-hosted k8s cluster on Hetzner bare metal
    • Databases: PostgreSQL on dedicated servers (high I/O performance)
    • Object storage: MinIO or Ceph on-prem for bulk storage
    • Cost: 40-60% lower than cloud equivalent
  2. Burst Capacity → Cloud

    • Auto-scaling application tier in AWS (scale 1.5-10x on demand)
    • Spot instances for batch processing
    • Lambda for event-driven workloads
    • Cost: Pay cloud premium only for actual burst usage
  3. Managed Services → Cloud (Selectively)

    • Keep high-value managed services (e.g., Route53, CloudFront CDN)
    • Use cloud for capabilities you can't replicate (e.g., SageMaker for ML)
    • Avoid managed services you can self-host efficiently (RDS → self-hosted PostgreSQL)
  4. DR/Backup → Cheap Cloud Storage

    • Glacier/Deep Archive for cold backups
    • Cross-region replication for critical data
    • Geographic diversity without operating multiple datacenters

Real-World Hybrid Example:

Company: Mid-size SaaS, 500 employees, 200 engineers

  • Before (all AWS): $2.4M/year
  • After (hybrid):
    • Bare metal (Hetzner): 100 servers baseline = $600K/year
    • AWS burst capacity: 20-50 instances on-demand = $400K/year
    • AWS managed services: Route53, CloudFront, S3 (hot data) = $300K/year
    • Total: $1.3M/year
  • Savings: $1.1M annually (46% reduction)
  • Team overhead: +10 hours/week infrastructure management
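
The arithmetic behind this example is worth capturing once so you can substitute your own numbers. A minimal sketch using the illustrative figures above:

```python
# Hybrid TCO comparison using the illustrative figures above; plug in your own costs.
all_cloud_annual = 2_400_000

hybrid_annual = {
    "bare metal baseline (Hetzner, ~100 servers)": 600_000,
    "AWS burst capacity (20-50 on-demand instances)": 400_000,
    "AWS managed services (Route53, CloudFront, hot S3)": 300_000,
}

hybrid_total = sum(hybrid_annual.values())
savings = all_cloud_annual - hybrid_total

print(f"Hybrid total: ${hybrid_total:,.0f}/year")                                # $1,300,000/year
print(f"Savings:      ${savings:,.0f}/year ({savings / all_cloud_annual:.0%})")  # $1,100,000/year (46%)
```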

💡 Key Takeaway

The optimal architecture isn't cloud OR bare metal—it's strategically hybrid. Place predictable baseline workloads on owned infrastructure (40-60% savings), reserve cloud for burst capacity and high-value managed services. Most mature platforms save 30-50% with hybrid approaches.


The Hidden Costs Nobody Talks About

Cloud's Hidden Costs

The sticker shock of AWS bills is obvious. Less obvious: the hidden costs that inflate true cloud spending.

1. Egress Fees (The Bandwidth Tax)

  • Often 20-40% of total bill for data-heavy workloads
  • Difficult to predict or forecast accurately
  • Creates lock-in (expensive to move data out)
  • Impact: 10TB/month of egress ≈ $891/month, or $10,692/year, at $0.09/GB after the 100GB free tier

2. Cross-AZ and Cross-Region Traffic

  • $0.01/GB for data between availability zones
  • $0.02/GB for cross-region transfers
  • Chatty microservices architecture can generate massive internal traffic
  • Impact: High-frequency trading firm spent $40K/month on internal AWS traffic

3. NAT Gateway and Network Costs

  • NAT Gateway: $0.045/hour + $0.045/GB processed = ~$33/month per gateway plus data charges
  • Transit Gateway: $0.05/hour + $0.02/GB = ~$36/month + data charges per attachment
  • Load balancers: ALB/NLB at ~$20-30/month + LCU charges
  • Impact: Network infrastructure can cost $5-15K/month before any compute

4. Managed Service Lock-in Premium

  • RDS costs 2-3x self-hosted PostgreSQL on EC2
  • Aurora costs 4-5x self-hosted PostgreSQL on bare metal
  • Elasticsearch Service costs 3-4x self-hosted on EC2
  • Impact: a $500/month self-hosted DB becomes $1,000-$1,500/month on RDS or $2,000-$2,500/month on Aurora

5. Idle Resource Waste

  • FinOps Foundation: 15-30% of cloud spend is waste
  • Orphaned EBS volumes, unused load balancers, forgotten test environments
  • Dev/staging environments running 24/7 (used 40 hours/week)
  • Impact: $2M cloud bill → $300-600K pure waste

6. Reserved Instance and Savings Plan Complexity

  • Require accurate 1-3 year capacity forecasting
  • Wrong guess = wasted prepayment or continued on-demand premium
  • Management overhead: which instances to reserve, when to renew
  • Impact: 10-20 hours/quarter reserved instance optimization

7. Multi-Account and Organizational Complexity

  • Dozens or hundreds of AWS accounts for isolation
  • Centralized billing, IAM complexity, cross-account access
  • Security and compliance overhead
  • Impact: 1-2 FTE dedicated to cloud account management at enterprise scale

Bare Metal's Hidden Costs

Bare metal advocates downplay the real operational burden of self-managed infrastructure.

1. Upfront Capital Expenditure

  • Hardware purchase: $10-30K per server (enterprise-grade)
  • Datacenter setup or colocation contracts
  • Network equipment, switches, firewalls
  • Impact: $700K (37signals), $53M+ (Dropbox) upfront investment

2. Infrastructure Team Overhead

  • 10-20 hours/week minimum for infrastructure management
  • At $60-75/hour senior DevOps rates: $2,500-$6,000/month
  • Scales with infrastructure complexity
  • Impact: $30-72K annually in hidden labor cost

3. Capacity Planning Risk

  • Over-provision → wasted hardware investment
  • Under-provision → emergency hardware procurement (weeks of lead time)
  • Hardware refresh cycles (3-5 years)
  • Impact: 20-30% over-provisioning typical to avoid capacity emergencies

4. Hardware Failure and Redundancy

  • Servers fail: plan for 2-5% annual failure rate
  • Need N+1 or N+2 redundancy for HA
  • Spare parts inventory and RMA processes
  • Impact: 15-25% additional hardware for redundancy

5. Datacenter and Power Costs

  • Colocation: $1,000-$5,000/month per rack
  • Power: $0.10-$0.30/kWh (80-150W per server × 24/7)
  • Cooling: adds 30-50% to power costs
  • Impact: 200 servers = $4-8K/month power + $10-30K/month colocation

6. Network Bandwidth and Transit Costs

  • Datacenter bandwidth not always "free"
  • Transit providers charge for high bandwidth (95th percentile billing)
  • DDoS protection and network security
  • Impact: 10Gbps commit = $2-5K/month bandwidth costs at colocation

7. Compliance and Security Overhead

  • Physical security: datacenter access controls
  • Compliance audits: SOC2, ISO27001 for self-managed infrastructure
  • Security patching: OS, firmware, hardware vulnerabilities
  • Impact: SOC2 audit: $50-150K annually, ongoing compliance overhead

8. Opportunity Cost and Team Distraction

  • Engineering time spent on infrastructure ≠ time spent on product
  • 6-12 month migration timeline with team focus shift
  • Delayed feature development during migration
  • Impact: Hard to quantify but potentially millions in delayed revenue

💡 Key Takeaway

Both cloud and bare metal have hidden costs. Cloud hides costs in egress fees, cross-AZ traffic, managed service premiums, and waste (15-30% of spend). Bare metal hides costs in upfront CAPEX, infrastructure team overhead ($30-72K/year), capacity planning risk, and opportunity cost of team distraction. Calculate total cost of ownership, not just sticker price.


The FinOps Response: Optimizing Cloud Before Exiting

Before committing to cloud repatriation, platform engineering teams should exhaust cloud cost optimization strategies. Many organizations discover 30-50% savings are possible without leaving AWS.

The FinOps Maturity Model

Crawl Phase (0-6 months):

  • Establish cost visibility: tag all resources, enable Cost Explorer
  • Identify waste: unused resources, orphaned volumes, idle instances
  • Quick wins: rightsize obvious over-provisioned instances
  • Typical savings: 15-20% of cloud spend

Walk Phase (6-18 months):

  • Automate waste cleanup: scheduled shutdown of dev/staging environments
  • Reserved Instance / Savings Plan strategy for predictable workloads
  • Implement FinOps policies: budgets, alerts, approval workflows
  • Typical savings: 25-35% of cloud spend

Run Phase (18+ months):

  • FinOps as Code: policy-driven cost optimization in CI/CD
  • Unit economics: cost per customer, per transaction, per feature
  • Culture shift: engineers own cost as quality metric
  • Typical savings: 35-50% of cloud spend

Top 10 Cloud Cost Optimization Strategies

1. Eliminate Waste (Quickest ROI)

  • Action: Identify and terminate idle resources (unused instances, orphaned volumes, forgotten load balancers)
  • Typical Savings: 10-15% of total cloud spend
  • Tools: AWS Trusted Advisor, CloudHealth, Spot.io
  • Example: $2M cloud bill → $200-300K saved by deleting waste

2. Rightsize Over-Provisioned Instances

  • Action: Match instance types to actual CPU/memory utilization
  • Typical Savings: 20-30% on compute spend
  • Tools: AWS Compute Optimizer, CloudWatch metrics analysis
  • Example: m5.4xlarge (16 vCPU) at 30% utilization → m5.2xlarge (8 vCPU) saves 50%

3. Reserved Instances and Savings Plans

  • Action: Commit 1-3 years for predictable baseline workloads
  • Typical Savings: 30-70% vs on-demand pricing
  • Risk: Wrong forecast = wasted prepayment or continued on-demand premium
  • Best Practice: Reserve 60-70% of baseline, keep 30-40% flexible on-demand

4. Spot Instances for Fault-Tolerant Workloads

  • Action: Use spot instances (up to 90% off) for batch jobs, CI/CD, dev/test
  • Typical Savings: 60-90% on applicable workloads
  • Limitation: Can be terminated with 2-minute notice
  • Use Cases: Data processing, ML training, rendering, test environments

5. Auto-Scaling and Scheduled Shutdown

  • Action: Scale down non-production environments outside business hours
  • Typical Savings: 50-70% on dev/staging costs (running 40hrs/week vs 168hrs/week)
  • Tools: AWS Instance Scheduler, custom Lambda functions
  • Example: 50 dev/test instances × $100/mo = $5K → $1.75K (65% savings)
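
As a concrete example of the scheduled-shutdown pattern, here is a minimal Lambda-style handler sketch that stops EC2 instances tagged as development outside business hours. The `Environment=dev` tag and the evening EventBridge trigger are assumptions for illustration; AWS's packaged Instance Scheduler covers the same ground with more options:

```python
# Minimal sketch: stop running EC2 instances tagged Environment=dev.
# Intended to run from an evening EventBridge schedule, with a matching
# "start" job each morning. Tag key/value and schedule are assumptions.
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[
            {"Name": "tag:Environment", "Values": ["dev"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return {"stopped": instance_ids}
```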

6. Storage Lifecycle Policies

  • Action: Auto-transition infrequent-access data to cheaper storage tiers
  • Typical Savings: 40-90% on storage costs
  • Strategy: S3 Standard → S3-IA (30 days) → Glacier (90 days) → Deep Archive (365 days)
  • Example: 100TB in S3 Standard costs $2,300/mo; aging 80TB into Glacier ($320/mo) while 20TB stays in Standard ($460/mo) brings the bill to ~$780/mo—roughly $1,500/mo saved
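
The tiering strategy above maps directly onto an S3 lifecycle configuration. A minimal boto3 sketch follows; the bucket name and prefix are placeholders, and the transition days should be checked against your actual access patterns and Glacier retrieval costs before applying:

```python
# Minimal sketch of the Standard → IA → Glacier → Deep Archive tiering described above.
# Bucket and prefix are placeholders; tune transition days to real access patterns.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-archive",        # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "exports/"},  # placeholder prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```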

7. Egress Cost Reduction

  • Action: Use CloudFront CDN (free egress to CloudFront), cache aggressively, compress data
  • Typical Savings: 30-50% on data transfer costs
  • Strategy: CloudFront ↔ S3 egress free, CloudFront → internet cheaper than S3 → internet
  • Example: 10TB/month of direct S3 egress (~$900) drops to roughly $600 when served through CloudFront, and to ~$180 when compression and caching shrink billable transfer to ~3TB

8. Commitment to Architecture Optimization

  • Action: Refactor chatty microservices, reduce cross-AZ traffic, optimize database queries
  • Typical Savings: 20-40% on network and data transfer
  • Investment: Requires engineering time, not just configuration
  • Example: Colocate services in a single AZ (trading redundancy for cost where acceptable)

9. Managed Service Alternatives

  • Action: Replace expensive managed services with self-hosted equivalents
  • Typical Savings: 50-70% on database and service costs
  • Trade-off: Operational burden increases
  • Example: RDS PostgreSQL ($500/mo) → PostgreSQL on EC2 ($150/mo instance + management overhead)

10. FinOps Culture and Accountability

  • Action: Make cost visibility real-time for engineers, assign budgets per team/product
  • Typical Savings: 10-20% through behavior change
  • Tools: CloudZero, Vantage, Kubecost (for Kubernetes)
  • Strategy: Unit economics (cost per customer), cost as quality metric, showback/chargeback

💡 Key Takeaway

Before repatriating to bare metal, exhaust cloud optimization strategies. Most organizations achieve 30-50% cloud cost reduction through waste elimination, rightsizing, reserved instances, auto-scaling, and storage lifecycle policies—without the operational complexity of leaving cloud.

When Optimization Isn't Enough

You've optimized cloud and still need repatriation if:

  1. Post-optimization costs still 3-5x bare metal equivalent

    • After removing waste, rightsizing, and commitments, still paying massive premium
    • Example: $1.2M optimized AWS bill vs $400K bare metal equivalent
  2. Egress costs dominate and can't be reduced

    • High bandwidth workloads (video, large file transfers) with unavoidable egress
    • Example: $300K/year egress fees for data analytics platform (Hetzner = $0)
  3. Managed services provide minimal value

    • Using AWS primarily for compute and block storage
    • Self-hosting PostgreSQL, Redis, Kafka already (no managed service value)
    • Kubernetes portable across infrastructure
  4. Predictable workload eliminates elasticity value

    • 12+ months data shows <2x traffic variation
    • Capacity planning is feasible and accurate
    • Don't need cloud's burst scaling capabilities
  5. Team has infrastructure expertise and capacity

    • Dedicated platform team (3+ FTE) with datacenter experience
    • Bandwidth to manage 10-20 hours/week infrastructure overhead
    • Culture of infrastructure ownership

Decision Point: If you've optimized cloud spend by 30-50% and still meet criteria above, repatriation economics likely favor bare metal. If optimization closed the cost gap significantly or you rely on managed services, stay in cloud.


Platform Engineering Team Recommendations

For Startups (<100 Engineers, Pre-PMF)

Recommendation: Stay in cloud

Rationale:

  • Unpredictable scaling needs
  • Team should focus on product, not infrastructure
  • Runway concerns make OpEx flexibility critical
  • Cloud markup is "insurance premium" for flexibility

Cost Optimization Focus:

  1. Aggressive waste cleanup (unused resources)
  2. Schedule dev/staging environment shutdown (nights/weekends)
  3. Use spot instances for CI/CD, batch jobs
  4. Don't over-engineer: default to smallest instances that work

When to Revisit:

  • Reach 50-100 sustained servers with predictable patterns
  • 12+ months of stable workload data
  • Cloud bill exceeds $500K annually
  • Post-PMF with clear growth trajectory

For Growth Companies (100-500 Engineers, Series B-C)

Recommendation: Evaluate hybrid architecture

Rationale:

  • Sufficient scale to justify infrastructure investment
  • Likely have predictable baseline workload
  • Platform team exists or can be built
  • Cost savings materially impact burn rate

Evaluation Checklist:

  • Cloud bill >$500K annually with 70%+ predictable workload
  • Minimal managed service dependency (or can self-host equivalents)
  • 3+ engineers with infrastructure expertise
  • Executive support for 6-12 month migration
  • Can commit $100K-$500K upfront for hardware/migration

Recommended Approach:

  1. Phase 1 (Months 1-3): Optimize cloud spend (target 30% reduction)
  2. Phase 2 (Months 4-6): Migrate dev/test environments to Hetzner/DigitalOcean (low risk)
  3. Phase 3 (Months 7-12): Migrate baseline production workloads to bare metal/colocation
  4. Phase 4 (Ongoing): Hybrid architecture—bare metal baseline, cloud burst capacity

Expected Savings: 30-40% total infrastructure cost

For Enterprises (500+ Engineers, Series D+)

Recommendation: Strategic hybrid with selective repatriation

Rationale:

  • Massive scale justifies infrastructure investment
  • Likely already have infrastructure specialists
  • Cost optimization is board-level priority
  • Risk tolerance for multi-datacenter operations

Strategic Framework:

Workload Classification:

  1. Class 1: Repatriate to Bare Metal

    • Predictable baseline compute (web servers, API servers)
    • Self-hosted databases (PostgreSQL, MySQL, Redis, Kafka)
    • Batch processing and analytics
    • Target: 40-60% of compute workload
  2. Class 2: Keep in Cloud

    • Burst capacity for traffic spikes
    • Global multi-region presence
    • Managed services providing high value (e.g., SageMaker, Kinesis)
    • Target: 20-30% of compute workload
  3. Class 3: Move to Cheap Cloud (DigitalOcean, Hetzner)

    • Dev, staging, QA environments
    • CI/CD infrastructure
    • Internal tooling
    • Target: 20-30% of compute workload

Expected Savings: 40-60% total infrastructure cost

Investment Required:

  • Hardware: $1-5M upfront (depends on scale)
  • Migration team: 4-8 engineers for 12-18 months
  • Platform team expansion: +2-4 FTE for ongoing management

Break-Even Timeline: 12-24 months

The "Never Repatriate" Scenarios

Don't repatriate if:

  1. Heavy AWS Managed Service Lock-In

    • Core business logic in Lambda, Step Functions, proprietary services
    • Migration requires 6-12+ month app rewrites
    • Alternative: Optimize managed service usage, negotiate enterprise discounts
  2. True Burst Scaling Requirements

    • E-commerce with Black Friday-like spikes (10-100x traffic)
    • News/media sites with viral traffic unpredictability
    • Alternative: Hybrid with bare metal baseline + cloud burst capacity
  3. Global Multi-Region Compliance

    • Must operate in 10+ geographic regions
    • Compliance certifications require cloud infrastructure
    • Alternative: Negotiate volume discounts, optimize within cloud
  4. Small Team Without Infrastructure Expertise

    • No dedicated platform team (<3 FTE)
    • No infrastructure specialists on staff
    • Alternative: Aggressive cloud cost optimization, consider managed Kubernetes
  5. Fast-Growing, Unpredictable Scaling

    • 2-5x YoY growth with unpredictable patterns
    • Risk of under-provisioning bare metal capacity
    • Alternative: Stay in cloud until growth stabilizes, then revisit

The 2025-2027 Outlook: Regulatory and Market Forces

Regulatory Pressure on Egress Fees

EU Data Act (Effective September 2025):

  • Targets "unfair contractual terms" in cloud contracts
  • Aims to reduce switching barriers between cloud providers
  • Bans profit-generating egress fees by January 12, 2027
  • Impact: AWS, Azure, GCP must eliminate or reduce egress charges in EU

AWS Response (as of 2025):

  • Increased free egress tier from 1GB to 100GB/month
  • Waives egress fees for time-bound migrations (60-day credits with approval)
  • Interpretation: Regulatory pressure is working, but the changes are incremental so far

What This Means for Platform Teams:

  • Egress fee reduction likely continues through 2027
  • EU-based operations may see significant egress savings
  • Cloud lock-in concerns diminishing (easier to migrate data out)
  • Strategy: Monitor regulatory developments, plan migrations for post-egress-fee era

Market Dynamics: Cloud Growth Despite Repatriation

The Paradox: 86% of CIOs plan repatriation, yet cloud spending grows 21.5% annually.

Explanation:

  1. Selective Repatriation: 21% of workloads repatriated, while 30%+ of new workloads still go to cloud
  2. Net Cloud Growth: Migration to cloud outpaces repatriation for most enterprises
  3. Hybrid Strategies: Companies optimize workload placement, not binary cloud exit

Gartner Forecast:

  • 2024 cloud spending: $595.7 billion
  • 2025 cloud spending: $723.4 billion (21.5% growth)
  • Cloud remains dominant despite repatriation trend

What This Tells Us:

  • Repatriation is workload optimization, not wholesale cloud rejection
  • Cloud will remain dominant for appropriate use cases
  • Sophisticated teams optimize placement, don't pick ideological sides

The Rise of "Bare Metal Cloud" and Hybrid Solutions

New Market Entrants:

  1. Hetzner: Traditional bare metal provider, now offering cloud flexibility
  2. Latitude.sh: Bare Metal as a Service (BMaaS) with Terraform/API provisioning
  3. OpenMetal: On-demand private clouds with cloud-like provisioning
  4. Vultr, Linode (Akamai): VPS providers offering bare metal options

What's Changing:

  • Bare metal now has cloud-like provisioning (Terraform, APIs, automation)
  • "Physical servers as easily as VMs" (platform engineering integration)
  • Hybrid architectures become operationally feasible

Platform Engineering Impact:

  • IaC (Terraform, Pulumi) works across cloud and bare metal
  • Kubernetes portability enables seamless hybrid deployments
  • CI/CD pipelines provision bare metal as easily as AWS EC2

The Future: Workload placement becomes a continuous optimization problem, not a one-time migration decision.


Conclusion: Optimize for Economics, Not Ideology

The cloud repatriation debate is polarizing, but the data is clear:

The Truths:

  1. ✅ AWS charges 7-18x markup on compute vs bare metal (verifiable)
  2. ✅ Egress fees add 10-100x total cost for data-intensive workloads (real at scale)
  3. ✅ 86% of CIOs plan some repatriation (highest on record, up from 43% in 2020)
  4. ✅ Real companies save millions (37signals: $2M/year, Dropbox: $75M over 2 years)
  5. ✅ Most repatriation is selective, not wholesale (21% of workloads, not 100%)
  6. ✅ Cloud spending still grows 21.5% annually despite repatriation trend
  7. ✅ Bare metal has hidden costs: CAPEX, team overhead, capacity planning risk
  8. ✅ Cloud has hidden costs: egress, cross-AZ traffic, managed service premiums, 15-30% waste

The Framework:

Stay in Cloud If:

  • Startup pre-PMF with unpredictable growth
  • True burst scaling requirements (10x+ spikes)
  • Heavy managed service dependency
  • Small team (<100 engineers, <3 infrastructure FTE)
  • Global multi-region compliance requirements

Consider Repatriation If:

  • Predictable workload (>50 servers sustained 12+ months)
  • High egress costs (>$5K/month)
  • Minimal managed service lock-in
  • Mature engineering org (100+ engineers, 3+ infrastructure FTE)
  • Post-optimization cloud bill still 3-5x bare metal equivalent

Optimal Strategy for Most: Hybrid architecture

  • Bare metal for predictable baseline workloads (40-60% cost savings)
  • Cloud for burst capacity and high-value managed services
  • Cheap VPS (Hetzner, DigitalOcean) for dev/test environments
  • Result: 30-50% total infrastructure cost reduction without sacrificing flexibility

📚 Learning Resources

📖 Essential Cost Comparison Data

📝 Real-World Case Studies

🎥 Platform Engineering Perspectives

📚 FinOps and Cost Optimization

🛠️ Cost Calculation and Decision Tools

🌐 Community and Discussion


📡 Stay Updated

Cloud Provider Pricing: AWS Pricing · GCP Pricing · Azure Pricing

Alternative Providers: Hetzner · DigitalOcean · Vultr · Linode

Industry Analysis: FinOps Foundation Blog · The Register Cloud Coverage · The New Stack Infrastructure

Cost Optimization Tools: nOps · CloudZero · Vantage · Ternary