Reducing costs by leveraging GKE Optimizations and GCP Preemptible VMs

  • August 25, 2020

In this article, we walk through an example of our thought process when we tackle a resource optimization problem.

First, what is Kubernetes? Kubernetes is the world’s most popular container orchestration tool: an open-source system for automating the deployment, scaling, and management of containerized applications.

Nowadays, all major public-cloud providers offer a managed version of Kubernetes, such as Google Kubernetes Engine (GKE) or Amazon’s Elastic Kubernetes Service (EKS), to name just a few. GKE is very flexible and a great platform on which to deploy (almost) any type of application, but where it really shines is with containerized, cloud-native and/or microservice-centric applications.

What did we do?

At Flugel.it, we have been using Preemptible VMs with several clients. Today, we’ll talk about our experience with one of them, in the e-commerce SaaS space. Before the optimization process, during a busy week, our nodes could easily exceed 100 vCPUs and 300 GB of total memory, yet the application could not fully use the resources available on every instance in the Kubernetes cluster. As a result, the cluster provisioned more capacity than it needed, which became an unnecessary cost overhead.

What we noticed was that on each node (VM instance), after scheduling all the Pods that fit, some resources were left over. We adjusted the memory size of the nodes so that, once all Pods were deployed, no memory was left unused. Even though this dramatically reduced infrastructure expenditure, our goal was to reduce it even further.
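The sizing reasoning above can be sketched with a little arithmetic. The following Python snippet is only an illustration: the node sizes and the per-Pod memory request are hypothetical figures (not our client's), Pods are assumed to be identical, and the sketch ignores CPU as well as the memory GKE reserves on each node for system components.

```python
def leftover_memory_gb(node_allocatable_gb, pod_request_gb):
    """Pack identical Pods onto one node by memory only.

    Returns (pods_per_node, leftover_gb). Both inputs are hypothetical
    figures supplied by the caller.
    """
    if pod_request_gb <= 0:
        raise ValueError("pod_request_gb must be positive")
    pods = int(node_allocatable_gb // pod_request_gb)
    leftover = node_allocatable_gb - pods * pod_request_gb
    return pods, leftover


def wasted_fraction(node_allocatable_gb, pod_request_gb):
    """Fraction of the node's memory that no Pod can use."""
    _, leftover = leftover_memory_gb(node_allocatable_gb, pod_request_gb)
    return leftover / node_allocatable_gb


# Hypothetical example: 5.5 GB Pods on a 26 GB node, then on a 22 GB node.
before = leftover_memory_gb(26.0, 5.5)  # 4 Pods fit, 4.0 GB stays idle
after = leftover_memory_gb(22.0, 5.5)   # 4 Pods fit, nothing stays idle
```

With these made-up numbers the same four Pods fit on both node sizes, so the smaller node does the same work while paying for less memory.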

Once the size optimization path was exhausted, we started to look at more innovative approaches. One potential solution was Preemptible VMs since, at least in theory, a Kubernetes (GKE) cluster can be composed of Preemptible VMs, either entirely or in good part. But before we continue, let’s address what Preemptible VMs are and what their limitations are:

What is a Preemptible VM?

A Preemptible VM is a type of Compute Engine instance that can be created and run at a much lower cost, up to 70% less than on-demand instances.

Do Preemptible VMs have cons or limitations?

For most purposes, Preemptible VMs function like normal instances, but yes, they have cons and limitations:

● Preemptible VMs can be terminated (preempted) by Compute Engine if it requires those resources for other (on-demand) instances.
● Preemptible VMs might not always be available because they use excess Compute Engine capacity, so their availability varies with overall usage.
● Compute Engine might terminate Preemptible VMs at any time due to system events. The probability that Compute Engine will terminate a Preemptible VM for a system event is generally low, but it may change from day to day and from zone to zone depending on current conditions.
● Compute Engine always terminates Preemptible VMs after they have been running for 24 hours. That said, certain actions reset this 24-hour counter.
● Preemptible VMs are finite Compute Engine resources, so they might not always be available.
● Preemptible VMs can’t be migrated to a regular VM instance.
● Preemptible VMs can’t be set to automatically restart when there is a maintenance event.
● Google Cloud Free Tier credits for Compute Engine do not apply to Preemptible VMs.

Because of these limitations, Preemptible VMs are not covered by any Service Level Agreement; they are explicitly excluded from the Compute Engine SLA.

The final solution

We had used Preemptible VMs in other production environments in the past, so we knew they might be a viable option. We only needed to confirm that the application could withstand the sudden elimination of a complete cluster node.

Over the course of a few days, we ran exhaustive tests and determined that the application was good to go with Preemptible VMs. We then progressively rolled out the new solution to more and more critical environments, gaining confidence that, in this particular case, the application showed no undesirable behaviour.
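The property we needed, surviving the sudden loss of any single node, can be reasoned about on paper before testing it on a real cluster. The Python sketch below is not the test procedure we ran; it is a simplified model in which the node names, service names, replica counts and minimum-availability thresholds are all hypothetical. It checks whether removing any one node would leave some service with fewer replicas than its required minimum.

```python
from collections import Counter


def single_node_loss_violations(placement, min_available):
    """Find (node, service) pairs where losing that node breaks the minimum.

    placement:     {node_name: {service_name: replicas_on_that_node}}
    min_available: {service_name: minimum replicas that must keep running}
    All names and numbers are hypothetical inputs for this model.
    """
    # Total replicas per service across the whole cluster.
    totals = Counter()
    for pods in placement.values():
        totals.update(pods)

    violations = []
    for node, pods in placement.items():
        for service, minimum in min_available.items():
            remaining = totals[service] - pods.get(service, 0)
            if remaining < minimum:
                violations.append((node, service))
    return violations


# Hypothetical layout: replicas spread so that no single node is critical.
spread = {
    "node-a": {"web": 2, "worker": 1},
    "node-b": {"web": 1, "worker": 1},
    "node-c": {"web": 1},
}
# Hypothetical layout: most "web" replicas are concentrated on one node.
concentrated = {
    "node-a": {"web": 3},
    "node-b": {"web": 1},
}
minimums = {"web": 2, "worker": 1}

ok = single_node_loss_violations(spread, minimums)            # no violations
risky = single_node_loss_violations(concentrated, {"web": 2})  # node-a is critical
```

In this model, an empty result means every service keeps its minimum number of replicas no matter which single node is preempted. A real validation also has to account for startup time, in-flight requests and stateful workloads, which this sketch does not model.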

Conclusion

Even though we are not at liberty to share the actual numbers, we can confidently say that the investment made to investigate, validate and implement both the VM size adjustment and, later, Preemptible VMs has paid off, and will continue to pay off massively for our client. Before this optimization, around 15% of the memory deployed in each VM instance went unused. By leveraging Preemptible VMs, we reduced the average cost of each Kubernetes cluster by over 60%.
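To give a feel for how a discount on part of a cluster translates into an overall saving, here is a small worked calculation in Python. It is a sketch under simplifying assumptions, not our client's figures: it considers compute cost only, treats the "up to 70%" discount mentioned earlier as a flat rate, and the function names are ours, introduced just for this illustration.

```python
def blended_savings(preemptible_share, preemptible_discount):
    """Fractional saving versus an all on-demand cluster.

    preemptible_share:    fraction of on-demand-equivalent compute cost
                          moved to Preemptible VMs (0.0 to 1.0)
    preemptible_discount: assumed flat discount on that share (0.0 to 1.0)
    """
    for value in (preemptible_share, preemptible_discount):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares and discounts must be between 0 and 1")
    return preemptible_share * preemptible_discount


def share_needed(target_savings, preemptible_discount):
    """Share of compute that must be preemptible to reach a target saving."""
    if not 0.0 < preemptible_discount <= 1.0:
        raise ValueError("preemptible_discount must be in (0, 1]")
    if not 0.0 <= target_savings <= preemptible_discount:
        raise ValueError("target cannot exceed the discount itself")
    return target_savings / preemptible_discount


# Assumed flat 70% discount:
half_cluster = blended_savings(0.5, 0.7)  # half the cluster preemptible
full_cluster = blended_savings(1.0, 0.7)  # the whole cluster preemptible
needed = share_needed(0.6, 0.7)           # share required for a 60% saving
```

Under these assumptions, moving half of the compute to Preemptible VMs saves 35%, and a 60% saving from the discount alone would require roughly 86% of the compute to be preemptible. Savings from right-sizing the nodes come on top of that.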

At Flugel.it, we strive to lower costs for all our clients, sometimes by leveraging tools and resources like GKE Preemptible VMs or AWS Spot Instances, but most often by iterating over the current infrastructure, and sometimes by replacing old solutions with newer, more refined versions. Although not all applications are eligible for Preemptible VMs, they are a great tool to reduce costs when applicable.

It is important to point out that GCP Preemptible VMs are production-ready, and we have been using them successfully in live production environments. A second article will be published with a more in-depth technical description of how we achieved this solution. If you find this or any of our other blog articles interesting, please subscribe to our Blog and Newsletter so you’ll get notified when we release a new piece.

Thanks for reading, and have a great day!

Credits
Written by: Gabriel Vasquez
English language corrections: Jesica Greco and Rebekah Wildman
General corrections and editing: Luis Vinay