A practical introduction to Kubernetes Persistent Volumes and Init Containers
When you start learning Kubernetes, you first come across fundamental concepts such as Pods, Deployments, and Services. These objects enable you to deploy stateless applications and microservices. However, as soon as you delve deeper into Kubernetes, you'll encounter things such as Persistent Volumes and multi-container Pod design patterns. In this post I'll guide you through deploying your first Persistent Volume and Init Container.
You can find the code for this walkthrough in this repository.
Prerequisites
For this walkthrough, make sure to have the following:
- Basic understanding of Kubernetes.
- You’ll need a Kubernetes cluster. Running it locally with Docker Desktop is fine.
- You must have the Express.js application containerized and available in a container registry. You can simply use the Dockerfile from the aforementioned repo to create this Docker image. In this walkthrough we’ll make use of AWS ECR to store the Docker image. This means that you’ll need an AWS account. Alternatively, you can also use a container registry elsewhere, but in that case you’ll need to update the Kubernetes manifest files accordingly.
So what are persistent volumes and multi-container pods?
A Persistent Volume (PV) enables you to keep state outside of your Pods, which means that your applications won’t lose valuable data when a Pod fails or even your entire cluster.
The extent of this fault tolerance depends on the PV type. For example, when you use the hostPath type, the PV uses a directory or file on the Node to emulate network-attached storage. This means that your storage would persist if a Pod fails. However, if the cluster fails, you'll lose the stored data. On the other hand, you also have cloud provider-backed solutions, such as AWS EBS or Azure Disk storage. With such solutions, your data would persist even if the cluster crashes, because the storage exists outside of the Kubernetes cluster.
To make use of a PV in a Pod (and thus your application), you would have to create a Persistent Volume Claim (PVC). After creating the PV and PVC, Kubernetes looks for a PV that meets the requirements and StorageClass name defined in the PVC. If it finds an appropriate PV, it will bind the claim to the Volume.
The idea of multi-container Pods is another useful tool to put in your toolbox. When you use multiple containers in a single Pod, you can run one or more specialized containers next to your application.
This would, for example, enable you to ensure that required services are up and running before your application starts. As a result, you can guarantee that all external dependencies are in place before your application processes any incoming requests. This functionality can be implemented with Init Containers.
The solution: Discover Weekly
We're going to build a simple solution which consists of two services:
- discover-weekly. This is the main application of this solution which will query a database for personal song suggestions. To make sure that the database is up and running before the application starts, we’ll implement the init container pattern.
- mysql-service. This is the MySQL database service of which the data is stored in a Persistent Volume. For simplicity, we’ll make use of a hostPath PV type.
In the diagram below, you can see the architecture of the two services in a single namespace which is called discover-weekly.
A few key points:
- For the discover-weekly service, we’ll use a NodePort Service. This will give us the ability to reach the application from our local environment. If you want to use this solution in a cloud environment, feel free to use the LoadBalancer variant instead.
- As mentioned earlier, we’ll make use of Init Containers to ensure that the MySQL service is up and running. We’ll also make use of liveness probes. This will help us to identify Pods of which the application entered a broken state. As a result, Kubernetes will be able to understand that the application is not healthy anymore and that it should replace the Pod.
Generic Kubernetes objects
Before we can create any service, we must set up some generic resources. Specifically, we need the discover-weekly Namespace and database-credentials Secret. The Secret will contain the base64 encoded username and password for the root and admin accounts. Base64 encoded strings can be generated with the command:
echo -n "stringtoencode" | base64
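For reference, the generic manifests could look something like the sketch below. The key names under `data` are assumptions for this walkthrough; match them to whatever keys your Deployment manifests reference, and note that the encoded values here are simple demo credentials.

```yaml
# namespace.yml -- the Namespace that holds all solution resources
apiVersion: v1
kind: Namespace
metadata:
  name: discover-weekly
---
# secret.yml -- base64-encoded demo credentials (use strong values yourself)
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
  namespace: discover-weekly
type: Opaque
data:
  # echo -n "rootpassword" | base64
  mysql-root-password: cm9vdHBhc3N3b3Jk
  # echo -n "admin" | base64
  mysql-user: YWRtaW4=
  # echo -n "adminpassword" | base64
  mysql-password: YWRtaW5wYXNzd29yZA==
```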
The secret values used in this walkthrough are simple values for demonstration purposes. However, for your own secrets, make sure to use strong values.
Deploy the generic resources with the command:
kubectl apply -f manifests/generic/
To deploy the discover-weekly application, you have to add another Secret to your cluster. This Secret will contain the information required to access ECR. To create it, you can use the following command. Make sure to replace the values account_id and region with the values that are relevant for you.
kubectl create secret docker-registry registrycredentials \
--docker-server=account_id.dkr.ecr.region.amazonaws.com \
--docker-username=AWS \
--docker-password=$(aws ecr get-login-password) \
-n discover-weekly
The MySQL service
Now that we have the generic resources available in the cluster, we can start with setting up the MySQL service. For this we’ll create four objects: a PV, a PVC, a Deployment, and a Service.
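A minimal sketch of the PV and PVC manifests is shown below. The object names, the `manual` StorageClass name, and the 1Gi capacity are illustrative assumptions; the repo's actual manifests may use different values, but the binding mechanics are the same.

```yaml
# persistent-volume.yml -- hostPath-backed storage on the Node
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    # data lives in this directory on the Node
    path: /mnt/data
---
# persistent-volume-claim.yml -- claims storage from a matching PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
  namespace: discover-weekly
spec:
  # must match the PV's storageClassName for the claim to bind
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```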
Both objects have the same storageClassName, which is used to bind the two objects to each other. They also share the same access mode, ReadWriteOnce, which means that only one Node can mount the volume as read-write.
In addition, Kubernetes also checks whether the requested storage capacity (which is defined by the PVC) can be satisfied by one of the available PVs. If a PV with the right StorageClass name, access mode, and size is found, the PVC will be bound to that PV. As specified in the manifest, the PV stores its data at the /mnt/data path on the cluster's Node.
At this point we've created the PV and PVC, so it's time to actually make use of the storage. As you can see in the code below, we've got a Deployment which makes use of the mysql:5.6 container image. We specify the root password, MySQL user, and MySQL password as environment variables, so that we can log in using these credentials. In the volumes section, we state that we want to make use of the PVC that we've created and we call the volume "mysql-persistent-storage". Finally, we refer to this volume in the volumeMounts section by the name we just declared and mount it at the path /var/lib/mysql inside the container. This means that the MySQL Pod can use the /var/lib/mysql path to read and write data to and from the persistent volume.
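The Deployment and its companion Service might look roughly like this. The labels, the claim name, and the Secret key names are assumptions chosen to match the rest of this walkthrough; only the volume name mysql-persistent-storage and the /var/lib/mysql mount path come directly from the description above.

```yaml
# deployment.yml -- MySQL backed by the PVC
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  namespace: discover-weekly
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.6
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: mysql-root-password
            - name: MYSQL_USER
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: mysql-user
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: mysql-password
          ports:
            - containerPort: 3306
          volumeMounts:
            # mount the persistent volume at MySQL's data directory
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
      volumes:
        # bind the PVC into the Pod under the declared volume name
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pvc
---
# service.yml -- stable DNS name for the database
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
  namespace: discover-weekly
spec:
  selector:
    app: mysql
  ports:
    - port: 3306
```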
We can deploy the mysql-service objects with the command:
kubectl apply -f manifests/database/
The Discover Weekly service
At this point, we have the generic resources and the mysql-service available on our cluster. The last step is to create an application that will use the database service. In this walkthrough we'll make use of an Express.js API, which will (1) insert test data into the database when the server starts up and (2) return the test data from the database when you send a GET request to http://localhost:30000.
Some key points:
- To grant the application access to the database, we’ll pass the credentials as environment variables.
- To ensure the application is healthy, we'll create a simple liveness probe. This probe will send an HTTP GET request to the /health endpoint every 15 seconds.
- We'll also add an Init Container, which will ensure that the mysql-service is up and running before the Express.js API starts.
- We'll create a NodePort Service, which allows us to access the discover-weekly Service without requiring us to set up an Ingress and/or LoadBalancer solution.
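The key pieces above could be sketched as follows. The container port 3000 and the busybox-based wait loop are assumptions (a common way to implement this kind of Init Container); the image reference, Secret name, probe path, probe interval, and NodePort 30000 follow the walkthrough.

```yaml
# deployment.yml -- the Express.js API with an Init Container and liveness probe
apiVersion: apps/v1
kind: Deployment
metadata:
  name: discover-weekly
  namespace: discover-weekly
spec:
  replicas: 1
  selector:
    matchLabels:
      app: discover-weekly
  template:
    metadata:
      labels:
        app: discover-weekly
    spec:
      imagePullSecrets:
        - name: registrycredentials
      initContainers:
        # block application start-up until mysql-service is resolvable
        - name: wait-for-mysql
          image: busybox:1.34
          command: ['sh', '-c', 'until nslookup mysql-service; do sleep 2; done']
      containers:
        - name: discover-weekly
          image: account_id.dkr.ecr.region.amazonaws.com/discover-weekly:latest
          ports:
            - containerPort: 3000
          livenessProbe:
            # replace the Pod if /health stops responding
            httpGet:
              path: /health
              port: 3000
            periodSeconds: 15
---
# service.yml -- NodePort so the API is reachable on localhost:30000
apiVersion: v1
kind: Service
metadata:
  name: discover-weekly
  namespace: discover-weekly
spec:
  type: NodePort
  selector:
    app: discover-weekly
  ports:
    - port: 3000
      targetPort: 3000
      nodePort: 30000
```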
Before applying the manifests for the application, make sure to update the deployment.yml file, such that the image location is set properly. Currently it's configured as:
account_id.dkr.ecr.region.amazonaws.com/discover-weekly:latest
We can deploy the discover-weekly objects with the command:
kubectl apply -f manifests/application/
Alright, at this point we should have the discover-weekly solution up and running!
Testing the solution
To test the solution, you can simply query http://localhost:30000 in your browser or Postman.
The response should show a single record, indicating that we have successfully created one record in the MySQL database, which happens when the discover-weekly application starts up.
Are we really capable of persisting data when Pods crash?
Let’s test if the data actually persists when a Pod fails by removing both discover-weekly and mysql-service.
kubectl delete -f manifests/application
kubectl delete -f manifests/database
Let’s verify that the resources are entirely removed:
kubectl get all -n discover-weekly
Response: No resources found in discover-weekly namespace.
kubectl get pv
Response: No resources found
At this point we've confirmed that the PV and PVC objects have been removed. Now, let's create the database service again:
kubectl apply -f manifests/database/
Now, if you run kubectl get pv again, you should see that we have a PV object. Let's take the next step, which is recreating the discover-weekly service:
kubectl apply -f manifests/application/
If we call http://localhost:30000 again, we not only see that the data is still there, but, in fact, we now have two records. This is because:
- We successfully created a PV, which means that the data that we initially created persisted although we simulated a Pod failure.
- The JS application has intentionally been written such that it creates the MySQL table without declaring a uniqueness constraint. This means that we can add duplicate records to the database. Usually this would be undesirable, but in this case it's actually useful: we're not only able to prove that we have a persistent volume, but we've also proved that we can keep writing (static) data to this volume even after a Pod failure.
And do the Init Containers actually work?
We can simply try this out by scaling the discover-weekly application up to e.g. 10 replicas. Let’s do that with the command:
kubectl scale --replicas=10 -f manifests/application/deployment.yml && clear && kubectl get all -n discover-weekly
The command should print something like:
Here you can see that the first Pod has the status Init:0/1. This means that 0 out of 1 Init Containers have completed successfully; in other words, it's still running. For most of the other Pods, you can see that the Init Container has already finished and that the Pod is initializing. If you run the get all command again, you will most likely see that all 10 Pods are running. Consequently, we can state that the Init Containers are in fact running.
So we know that the Init Containers run and pass when the MySQL service is there, but what happens if the MySQL service doesn't exist yet? To test this, we have to remove the discover-weekly and mysql-service objects again. Then, let's recreate only the discover-weekly service. When you get all resources, you'll see that the status of the container is Init:0/1.
However, this time, if you wait 5 minutes, you'll see that it's still in the same state. In other words, the Init Container is not able to succeed, because the dependency isn't in place. As a result, we've also proved that the Init Container really does check whether the Service exists.
Moreover, if you create the MySQL service again, you'll see that within a short time the Pod is in the Running state again. This proves that the Init Containers keep running until the MySQL Service is up and running.
As you can see in the status of the first Pod, we only had 1 Init Container. However, you can actually have multiple of them, so in general the status would look like Init:N/M. Having multiple Init Containers is useful if you have multiple dependencies which must be in place, such as multiple data sources.
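For example, a Pod that depends on both a database and a cache could declare two Init Containers. They run sequentially, each to completion, before the application container starts; the Redis service in this sketch is hypothetical.

```yaml
spec:
  initContainers:
    # runs first; Pod status shows Init:0/2 while this one is running
    - name: wait-for-mysql
      image: busybox:1.34
      command: ['sh', '-c', 'until nslookup mysql-service; do sleep 2; done']
    # runs second; Pod status shows Init:1/2 while this one is running
    - name: wait-for-redis
      image: busybox:1.34
      command: ['sh', '-c', 'until nslookup redis-service; do sleep 2; done']
```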
Conclusion
In this walkthrough we have deployed the discover-weekly solution and tested if (1) we can actually persist data in a PV when a Pod fails and (2) whether the Init Containers actually work.
If you want to experiment further with these concepts, you could, for example:
- Create a PV where the data is stored in AWS EBS or Azure Disk Storage. Having this PV, you can do an experiment where you simulate a cluster failure by removing your entire cluster and test if the stored data is actually being persisted.
- Try out having multiple Init Containers in a single Pod.
- Explore other multi-container Pod patterns, such as the Sidecar, Adapter, or Ambassador pattern.