[Kubernetes] – P.5 – Scale

If a Deployment runs only one Pod for our application, we will need to scale the application when traffic increases to keep up with user demand.

Scaling is accomplished by changing the number of replicas in a Deployment.

Scale Overview

Scaling out a Deployment will ensure new Pods are created and scheduled to Nodes with available resources. Scaling will increase the number of Pods to the new desired state. Scaling to zero is also possible, and it will terminate all Pods of the specified Deployment.

Running multiple instances of an application requires a way to distribute traffic across all of them. Services have an integrated load balancer that distributes network traffic to all Pods of an exposed Deployment. Services continuously monitor the running Pods using endpoints, so that traffic is sent only to available Pods.

Once you have multiple instances of an application running, you can perform rolling updates without downtime. We’ll talk about that in the next module.

[Figure: the Deployment after scaling]


Step 1: Scaling a deployment

To list your deployments use the command:

kubectl get deployments

To view the details of your deployments:

kubectl describe deployments

Find the “Replicas” attribute. You will see something like:

“Replicas:  1 desired | 1 updated | 1 total | 1 available | 0 unavailable”

The "desired" value is the configured number of replicas.

The "total" value shows how many replicas are running now.

The "updated" value is the number of replicas that have been updated to match the desired (configured) state.

The "available" value shows how many replicas are actually available to users.
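
If you want these counts without reading the full describe output, they can also be pulled from the Deployment object directly. The sketch below is one way to do it, assuming kubectl is already configured for your cluster. The function name replica_summary is made up for illustration, and the Deployment name is passed as an argument.

```shell
# Print the desired and available replica counts for a Deployment.
# replica_summary is a hypothetical helper name, not a kubectl feature.
replica_summary() {
  deployment="$1"
  # .spec.replicas holds the configured (desired) count
  desired=$(kubectl get "deployments/$deployment" -o jsonpath='{.spec.replicas}')
  # .status.availableReplicas may be absent while no replica is available,
  # so an empty result is reported as 0
  available=$(kubectl get "deployments/$deployment" -o jsonpath='{.status.availableReplicas}')
  echo "desired=${desired} available=${available:-0}"
}
```

For example, replica_summary kubernetes-bootcamp prints one line with both counts.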

To scale the Deployment to 4 replicas, use this command:

kubectl scale deployments/kubernetes-bootcamp --replicas=4
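
New Pods take a moment to start, so the Deployment may not report all replicas as available right away. As a rough sketch, the scale command can be paired with kubectl rollout status, which blocks until the Deployment's replicas are available or the timeout expires. The helper name scale_and_wait and the 60-second timeout are assumptions chosen for illustration.

```shell
# Scale a Deployment, then wait for the new replicas to become available.
# scale_and_wait is a hypothetical helper; the 60s timeout is an arbitrary choice.
scale_and_wait() {
  deployment="$1"
  replicas="$2"
  # Only wait if the scale request itself succeeded
  kubectl scale "deployments/$deployment" --replicas="$replicas" &&
    kubectl rollout status "deployments/$deployment" --timeout=60s
}
```

For example, scale_and_wait kubernetes-bootcamp 4 returns once the Deployment reports its replicas as available.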

Run this command to check the Deployment again:

kubectl get deployments

There are 4 Pods now, each with its own IP address (you can confirm this with kubectl get pods -o wide). The change was also recorded in the Deployment's event log. To check that, use the describe command:

kubectl describe deployments/kubernetes-bootcamp

You will see something like:

“Replicas: 4 desired | 4 updated | 4 total | 4 available | 0 unavailable”

Step 2: Load Balancing

Use this command to find the exposed IP and port. Look for the IP and NodePort attributes in the output:

kubectl describe services/kubernetes-bootcamp

Alternatively, use the command below to store the node port in an environment variable:

export NODE_PORT=$(kubectl get services/kubernetes-bootcamp -o go-template='{{(index .spec.ports 0).nodePort}}')

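Next, we’ll send a request with curl to the exposed IP and port. Execute the command multiple times. Because the Service load-balances across the Pods, successive requests may be answered by different Pods: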

curl $(minikube ip):$NODE_PORT
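
Rather than re-running the command by hand, the requests can be sent in a loop. This is a minimal sketch: the helper name probe_service and the default of 5 requests are assumptions for illustration, and you will only see the load balancing if the application's response says which Pod served it.

```shell
# Send several requests to the same URL, one after another.
# probe_service is a hypothetical helper; the second argument defaults to 5.
probe_service() {
  url="$1"
  count="${2:-5}"
  i=1
  while [ "$i" -le "$count" ]; do
    # -s hides curl's progress meter so only the response body is printed
    curl -s "$url"
    i=$((i + 1))
  done
}
```

For example: probe_service "$(minikube ip):$NODE_PORT" 5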

Step 3: Scale Down

To scale the Deployment down to 2 replicas, run the scale command again:

kubectl scale deployments/kubernetes-bootcamp --replicas=2

List the Deployments to check whether the change was applied:

kubectl get deployments

The number of replicas has decreased to 2. List the Pods with:

kubectl get pods -o wide


