The Hidden Costs of Cloud Migration
Cloud migration can unlock flexibility, but hidden costs are everywhere. From data transfers to vendor lock-in with Terraform, unexpected expenses can stack up. This article explores lessons learned from migrating an on-prem OpenShift cluster to AKS and how to manage costs while staying compliant.
Cloud migration is all the rage: it promises flexibility, scalability, and potentially lower operational costs. But there are a few sneaky costs hiding in the shadows, and I learned this first-hand during our move from OpenShift 3.11 (50+ nodes) to AKS (4-5 nodes). Let’s dive into what I discovered, from data transfer gotchas to surprise downtime along the way.
Data Transfer Costs
Ah, the fun part! Moving data from our in-house infrastructure to AKS was pricier than expected. Not only did we underestimate the volume of data syncs between regions, but we also got hit with the everyday running costs of services chatting with storage. Take Redis. Seems simple, right? Not when it’s using Blob Storage over a secure Private Endpoint, where every read and write is a billed transaction! It wasn’t until we switched to local disks that we realised it’s way cheaper to slap a 4 GB disk on those disk-happy services (like Redis and Fluentd).
Key takeaway: Understand how chatty your apps are with storage, what type of storage is required (blob storage isn't suitable for everything) and plan accordingly. Secure traffic? Yes, please. But budget for those costs!
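To make that takeaway concrete, here is a minimal back-of-the-envelope sketch for comparing the two options. Everything in it is an assumption for illustration: the StorageProfile fields, the function names, and above all the prices, which are placeholders you would replace with the current figures from your own Azure price sheet. It is not how we actually did the maths, just one way to do it.

```python
from dataclasses import dataclass


@dataclass
class StorageProfile:
    """Monthly I/O estimate for one service (hypothetical structure)."""
    transactions: int       # read/write operations per month
    gb_transferred: float   # GB moved through the Private Endpoint per month


def blob_monthly_cost(profile, price_per_10k_tx, price_per_gb_processed, endpoint_fee):
    """Transaction fees + per-GB endpoint processing + fixed endpoint fee."""
    tx_cost = profile.transactions / 10_000 * price_per_10k_tx
    data_cost = profile.gb_transferred * price_per_gb_processed
    return tx_cost + data_cost + endpoint_fee


def cheaper_option(profile, disk_monthly_price, **blob_prices):
    """Return (name, monthly_cost) of the cheaper storage choice."""
    blob = blob_monthly_cost(profile, **blob_prices)
    if disk_monthly_price < blob:
        return ("local disk", disk_monthly_price)
    return ("blob storage", blob)


if __name__ == "__main__":
    # Placeholder numbers only: NOT real Azure prices or our real traffic.
    redis_like = StorageProfile(transactions=50_000_000, gb_transferred=200.0)
    choice = cheaper_option(
        redis_like,
        disk_monthly_price=1.0,
        price_per_10k_tx=0.05,
        price_per_gb_processed=0.01,
        endpoint_fee=7.0,
    )
    print(choice)
```

The point of the sketch is the shape of the bill: a small fixed-price disk costs the same no matter how chatty the service is, while the blob route scales with every operation.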
Platform Adaptation and Pipeline Updates
Let’s talk about the real heavy-lifting. Moving from OpenShift meant saying goodbye to OpenShift templates and hello to Helm charts tailored for AKS. Not only did we have to get those Helm charts right, but we also had to adjust our CI/CD pipelines to fit into this new world. And while we thought we were prepared, running both OpenShift and AKS side by side during the transition definitely doubled the workload and hit us with some hidden infrastructure costs.
Key takeaway: Helm charts are your best friend, but be ready for a lot of tweaking. Running both platforms together will keep you sane, but your wallet might groan a bit.
Cloud Optimisation and Monitoring
Here’s the deal: we passed on AKS’s built-in autoscaling because we wanted full control. Instead, we used Prometheus and Grafana to keep a close eye on usage and costs. This let us spot some sneaky ballooning storage costs, especially on Blob Storage. Redis and Fluentd were talking a lot to storage, so we switched them to local disks, and guess what? It was way cheaper!
Key takeaway: Use custom monitoring setups for real control over costs. The switch from Blob to local storage saved us loads without a performance hit.
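If you want a simple signal for "ballooning", one option is a week-over-week comparison on whatever daily figure you export from your monitoring (cost, transactions, GB stored). The sketch below is an illustration under that assumption, not our actual Grafana setup; the function names and the 25% threshold are made up for the example.

```python
def week_over_week_growth(daily_values):
    """Fractional growth of the last 7 days' total over the 7 days before."""
    if len(daily_values) < 14:
        raise ValueError("need at least 14 daily samples")
    previous = sum(daily_values[-14:-7])
    latest = sum(daily_values[-7:])
    if previous == 0:
        # No baseline: any usage at all counts as unbounded growth.
        return float("inf") if latest > 0 else 0.0
    return (latest - previous) / previous


def is_ballooning(daily_values, threshold=0.25):
    """True when the latest week grew by more than `threshold` (hypothetical default)."""
    return week_over_week_growth(daily_values) > threshold


if __name__ == "__main__":
    # Made-up daily storage costs: flat for a week, then 50% higher.
    samples = [10] * 7 + [15] * 7
    print(week_over_week_growth(samples), is_ballooning(samples))
```

A check like this is crude, but it is cheap to run daily and catches the kind of slow creep that is easy to miss on a dashboard.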
Over-provisioning Resources
Cloud newbies often over-provision, and we weren’t immune. We gave some apps way more CPU and storage than they needed, just to be safe, but ended up paying for capacity we didn’t even use. Over time, we tuned things down using data from Prometheus and Grafana, getting the apps to fit neatly into what they actually needed. Huge savings, same performance.
Key takeaway: Don’t be afraid to downscale once you know how much your apps really need. It’ll save you money without hurting performance.
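For a feel of what "tuning down from the data" can look like, here is a minimal sketch that turns observed CPU usage samples into a suggested request. The approach (a high percentile plus some headroom) is a common rule of thumb rather than the exact method we used, and the function names, the 95th percentile, and the 20% headroom are assumptions for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_cpu_request(usage_millicores, pct=95, headroom_pct=20):
    """Suggested CPU request in millicores: percentile of usage plus headroom."""
    base = percentile(usage_millicores, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)


if __name__ == "__main__":
    # Made-up usage samples in millicores, e.g. exported from Prometheus.
    usage = list(range(1, 101))
    print(suggest_cpu_request(usage))
```

Compare the suggestion with what the app is currently given; a large gap is a hint that you are paying for capacity nobody uses.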
Compliance and Security
During our POC phase, Bitdefender’s security dashboard was a lifesaver. It showed us where our security gaps were and helped us restrict deployments to trusted images from trusted registries. We also made sure traffic between Azure Blob Storage and apps was secure using Private Endpoints, but man, those ingress and egress charges added up fast.
Key takeaway: Early security investments during POC pay off big time. Watch out for those ingress, egress, and storage transaction fees; they will creep up on you.
Vendor Lock-In and Migration Complexities
Terraform? Yeah, we love it. But here’s the catch: our Terraform code is heavily tied to Azure’s provider and modules. Want to switch cloud vendors? Get ready to rework your entire infrastructure code. On the plus side, our Ansible playbooks for managing DBaaS and ProxySQL are vendor-agnostic, and our Helm charts are so flexible they even worked on a manually deployed kubeadm cluster.
Key takeaway: Vendor lock-in is real, especially in provider-specific Terraform code. Keep your Helm charts and Ansible playbooks flexible so you don’t end up stuck.
Unexpected Downtime and Service Disruptions
Even with the best-laid plans, there’s always a hiccup. We had a few surprise disruptions during the migration—mostly caused by subtle differences between OpenShift and AKS configurations (hello, networking issues!). And remember, downtime means more than just annoyed users; it means dollars lost.
Key takeaway: Expect the unexpected. Have backup plans for downtime, and test your hybrid setups thoroughly.
Conclusion
Cloud migration isn’t just about moving things over; it’s about understanding how each change can add to the bill. From hidden data transfer costs to surprise downtime, it pays to be vigilant. Oh, and a shout-out to Microsoft Billing for helping us track down a microservice issue that appeared after an AKS upgrade and made our Blob Storage bills skyrocket. Team effort for the win!
Final thought: Cloud adoption is worth it, but only if you stay on top of the little things that can turn into big costs later on.