When, why and how to run databases in Kubernetes

“Should I run my database in Kubernetes?” It’s a simple question without a simple answer because the honest answer is: “it depends.” And while there may be extensive benefits, there are also trade-offs. But any decision hinges on what is right for your use case.

Kubernetes is the platform of choice for managing containerized workloads and services. Most executives and developers now agree that the benefits far outweigh the challenges. And even the largest enterprises are using the platform to run stateless and stateful applications on-premises or as hybrid cloud deployments in production.

But matters become more complicated when we think about data and the Kubernetes ecosystem. Stateful applications demand a new database architecture that takes into account the scale, latency, availability and security needs of applications. How do you know which database architecture is best equipped to handle these challenges?

In this article, we’ll discuss the benefits and potential trade-offs of running a database in Kubernetes and explore how many of these trade-offs can be mitigated. Let’s start with the benefits:

Better resource utilization

The mass adoption of microservices architecture leads to a lot of relatively small databases with a finite number of nodes. This creates significant management challenges, and companies often struggle to optimally allocate their databases. But running Kubernetes provides an infrastructure-as-code approach to these challenges. This makes it easy to handle multiple microservices deployments at scale, while optimizing resource utilization on the available nodes.

This is one of the strongest arguments for Kubernetes. When several databases run in a multitenant environment, packing them onto shared nodes reduces the number of nodes required and, with it, the cost.
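To make the node-count argument concrete, here is a minimal sketch that packs several small databases onto shared nodes with a first-fit-decreasing heuristic. It is not how the Kubernetes scheduler is implemented; the database names, CPU requests and node size are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with a fixed CPU capacity (in cores)."""
    capacity: float
    databases: list = field(default_factory=list)

    def free(self) -> float:
        # Capacity left after subtracting the CPU already requested on this node.
        return self.capacity - sum(cpu for _, cpu in self.databases)


def pack_databases(requests: dict, node_capacity: float) -> list:
    """Place each database on the first node with room (first-fit decreasing)."""
    nodes = []
    # Largest requests first, so big databases are not stranded at the end.
    for name, cpu in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        if cpu > node_capacity:
            raise ValueError(f"{name} needs more CPU than one node offers")
        target = next((n for n in nodes if n.free() >= cpu), None)
        if target is None:
            target = Node(capacity=node_capacity)
            nodes.append(target)
        target.databases.append((name, cpu))
    return nodes


# Hypothetical CPU requests (cores) for six small microservice databases.
requests = {"orders": 3, "users": 2, "billing": 2, "search": 3, "audit": 1, "cart": 1}
nodes = pack_databases(requests, node_capacity=4)
print(f"{len(requests)} databases fit on {len(nodes)} nodes")
```

With one dedicated node per database, this hypothetical workload would need six nodes; packed onto shared four-core nodes it needs three.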

Dynamic, elastic scaling of pod resources 

Kubernetes can adjust the memory, CPU and disk allocated to database pods as workload demands change. The ability to scale up automatically without incurring downtime is invaluable to large organizations that regularly experience demand spikes.
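As a rough illustration of the kind of rule that could drive such scaling, the sketch below sizes a pod's memory request so that observed usage sits near a target utilization. The function name, the 70% target and the bounds are assumptions made for this example; this is not the algorithm used by any Kubernetes autoscaler.

```python
def recommend_memory_mib(
    observed_usage_mib: float,
    target_utilization: float = 0.7,  # hypothetical target: usage at 70% of the request
    min_mib: int = 512,               # hypothetical lower bound
    max_mib: int = 16384,             # hypothetical upper bound
) -> int:
    """Suggest a memory request (MiB) that puts observed usage near the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    desired = observed_usage_mib / target_utilization
    # Clamp so a spike or an idle period cannot produce an extreme request.
    return round(min(max(desired, min_mib), max_mib))


print(recommend_memory_mib(2800))    # usage spike: request grows
print(recommend_memory_mib(100))     # idle database: clamped to the lower bound
print(recommend_memory_mib(100000))  # runaway usage: clamped to the upper bound
```

The clamping is the part worth keeping in any real policy: it bounds how far an automated resize can move a database in either direction.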

Consistency and portability between clouds, on-premises, and edge

Companies want to build, deploy and manage workloads consistently regardless of location. Furthermore, they want the ability to move workloads from one cloud to another. The trouble is that most organizations still run at least some legacy code on-premises that they would like to move to the cloud.

Kubernetes allows organizations to deploy infrastructure as code consistently, regardless of location. So, if the development team can write a bit of code describing the resource requirements, the platform will take care of it. This provides the same level of control in the cloud that one would previously have had on bare metal servers.

Out-of-the-box infrastructure orchestration

In Kubernetes, pods can be started anywhere because the platform can schedule them onto any suitable node and move them between nodes. For a stateless workload, it does not matter if a pod goes down or moves to a different node, because the pod holds no state. Databases are stateful workloads, so this becomes a bigger issue and requires setting up specific policies in Kubernetes. However, a few simple policies (e.g., anti-affinity) allow your system to suffer a hardware failure without bringing multiple copies of the database instance down.
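An anti-affinity policy of this kind can be sketched as follows. The snippet builds the `affinity` stanza of a pod spec as a Python dictionary and prints it as JSON; the field names follow the Kubernetes pod spec, while the `app` label and its value `example-db` are hypothetical.

```python
import json


def anti_affinity(app_label: str) -> dict:
    """Build a pod-spec `affinity` stanza that keeps replicas on separate nodes."""
    return {
        "podAntiAffinity": {
            # "required": the scheduler will not co-locate two matching pods.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    # Pods carrying this label repel each other...
                    "labelSelector": {"matchLabels": {"app": app_label}},
                    # ...at the granularity of a single node (hostname).
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


# "example-db" is a hypothetical label value for the database pods.
affinity = anti_affinity("example-db")
print(json.dumps({"affinity": affinity}, indent=2))
```

With this rule in place, no two replicas labeled `app: example-db` are scheduled onto the same node, so losing one machine takes down at most one copy.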

Automated day-2 operations

Periodic backups and software upgrades are critical, but they are also time-consuming. Fortunately, Kubernetes automates most day-2 operations, and it can apply them across an entire cluster. For example, a patch for a security vulnerability can be rolled out across the whole cluster rather than applied instance by instance.

It is important to note, however, that automated day-2 operations can be complicated for a traditional relational database management system (RDBMS). When using a traditional RDBMS with Kubernetes, you typically have multiple copies of data, so when you lose a pod there’s another copy elsewhere. Even so, the user is still responsible for migrating data between pods and resyncing.

When migrating data manually, one would check that the cluster isn’t under heavy load, wait until the load subsides, then move the data to another node. However, if you’re migrating data automatically, you need to build in those checks. Additionally, if you take down a primary copy of data under heavy load, your replica may think it has the data when it really doesn’t.
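The manual routine described above (check the load, wait, then move) can be built into automation along these lines. This is a minimal sketch under stated assumptions: `get_load` and `move_data` are hypothetical callables standing in for a real metrics query and a real migration step, and the 0.6 threshold is arbitrary.

```python
import time
from typing import Callable


def migrate_when_idle(
    get_load: Callable[[], float],
    move_data: Callable[[], None],
    max_load: float = 0.6,        # hypothetical threshold (fraction of capacity)
    poll_seconds: float = 30.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run move_data only once the cluster load is at or below max_load."""
    for attempt in range(max_attempts):
        if get_load() <= max_load:
            move_data()
            return True
        if attempt < max_attempts - 1:
            sleep(poll_seconds)  # wait for the load to subside, then re-check
    return False  # still busy: leave the data where it is


# Simulated load readings; a real check would query the cluster's metrics.
readings = iter([0.9, 0.8, 0.4])
moved = []
ok = migrate_when_idle(
    get_load=lambda: next(readings),
    move_data=lambda: moved.append("shard-1"),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(ok, moved)
```

Returning `False` instead of forcing the move keeps the decision with the caller, which matters when taking down a primary under load could leave a replica behind.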

Important trade-offs and how to mitigate them

For all the advantages of running databases in Kubernetes, there are trade-offs to keep in mind. For starters, there is an increased possibility of pod crashes. Pods may crash because of process affinity, and if the process that starts a pod goes down, the entire pod could disappear.

There are also often issues related to local storage vs external persistent storage. Locally attached disks provide fast performance, but they can also create complications because, when you move a pod around, the storage doesn’t go with it. Meanwhile, external persistent storage provides a network-attached form of storage with a logical view of drives.

Organizations should also understand the possible complications that come from networking restrictions in Kubernetes clusters. If an application runs outside the cluster that hosts the database, then a load balancer may be required. And network complexities, sometimes related to the geographical location of the cluster, can introduce further issues.

Finally, one must keep an eye open for operational “gotchas” since building in-house Kubernetes expertise takes time. To get the most out of database deployments, organizations will need to:

  • Define anti-affinity and what constitutes a pod disruption
  • Understand the concept of sidecars
  • Build in observability with a tool such as Prometheus
  • Create troubleshooting cookbooks
  • Define private image registries and pull secrets
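Two of the items above translate directly into Kubernetes objects. The sketch below builds a PodDisruptionBudget and adds an image pull secret to a pod spec, both as Python dictionaries printed as JSON. The field names follow the Kubernetes API; the object names, label, image and secret name are hypothetical.

```python
import json


def pod_disruption_budget(name: str, app_label: str, max_unavailable: int = 1) -> dict:
    """Limit how many matching pods a voluntary disruption may take down at once."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def with_pull_secret(pod_spec: dict, secret_name: str) -> dict:
    """Return a copy of pod_spec that pulls images using the named secret."""
    return {**pod_spec, "imagePullSecrets": [{"name": secret_name}]}


# Hypothetical names for illustration only.
pdb = pod_disruption_budget("example-db-pdb", "example-db")
pod_spec = with_pull_secret(
    {"containers": [{"name": "db", "image": "registry.example.com/db:1.0"}]},
    "example-registry-credentials",
)
print(json.dumps(pdb, indent=2))
print(json.dumps(pod_spec, indent=2))
```

A budget of `maxUnavailable: 1` means planned operations such as node drains evict at most one database pod at a time; it does not protect against unplanned failures.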

The benefits of running a database in Kubernetes are clear. There are roadblocks and trade-offs, but there are also ways around them.

Karthik Ranganathan is CTO and cofounder of Yugabyte.

