When administering your Kubernetes cluster, you will likely run into a situation where you need to delete pods from one of your nodes. You may need to debug issues with the node itself, upgrade the node, or simply scale down your cluster.
Deleting pods from a node is not very difficult; however, there are specific steps you should take to minimize disruption for your application. Be sure to fully read and understand each step before executing any commands, to ensure that no mistakes happen that could lead to downtime.
When the node in question does not have any stateful pods running on it, or any pods that are critical to your system, you can follow a very simple procedure to remove all pods from a node, and optionally remove the node from your cluster.
First, confirm the name of the node you want to remove using kubectl get nodes, and make sure that all of the pods on the node can be safely terminated without any special procedures.
kubectl get nodes
kubectl get pods --all-namespaces -o wide | grep <nodename>
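Alternatively, you can have the API server do the filtering with a field selector instead of piping through grep; <nodename> is a placeholder for your node's name:
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<nodename>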
Next, use the kubectl drain command to evict all user pods from the node. They will be rescheduled onto other nodes by their controller (Deployment, ReplicaSet, etc.). The command blocks and returns once the pods have been evicted. You can run multiple drain commands in separate shells to drain several nodes in parallel; concurrent drains will still respect any PodDisruptionBudgets you have configured.
kubectl drain <nodename>
You can verify that no user pods are running on the node in question using get pods again. Some pods may still be present on the node: static (mirror) pods and DaemonSet-managed pods are not evicted by drain. If the drain command fails, it is usually because some pods are not managed by a controller; running it again with the --force flag will delete those pods as well, and they will not be recreated elsewhere. If drain refuses to proceed because of DaemonSet pods, add the --ignore-daemonsets flag.
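For example, if the first drain attempt is blocked by DaemonSet pods or by pods with no controller, you can re-run it with the appropriate flags. Only use --force once you are sure the unmanaged pods are safe to lose, since they will not be recreated:
kubectl drain <nodename> --ignore-daemonsets
kubectl drain <nodename> --ignore-daemonsets --force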
At this point you can either troubleshoot the node or replace it completely. If you are replacing it, use the following command to delete it from the cluster, then spin up a new node in its place. If your nodes are in an Amazon EC2 Auto Scaling group, you can simply terminate the EC2 instance backing the node and it will be replaced after a few minutes.
kubectl delete node <nodename>
Many workloads will have at least one pod that cannot simply be terminated and replaced. If you have strict requirements about the number of replicas or stateful pods running, or do not want dozens of pods disappearing at once, a more deliberate approach can be used to move the pods and minimize the effect on your application.
First, confirm the name of the node you want to remove using kubectl get nodes, and get a list of all of the pods running on that node so you can identify which ones need to be moved carefully.
kubectl get nodes
kubectl get pods --all-namespaces -o wide | grep <nodename>
Mark the node as unschedulable using the kubectl cordon command. This ensures that no new pods will be scheduled onto the node while you are preparing it for removal or maintenance.
kubectl cordon <nodename>
Now you can work on moving pods off of the node. For any pods controlled by a Deployment or ReplicaSet (not a StatefulSet), you can increase the replica count by the number of pods about to be moved, and then manually delete the pods running on the node. If you are not concerned with temporarily running one fewer replica, you can skip the scaling step and just delete the necessary pods one at a time.
For example, if you had a Deployment called “web” with 3 replicas, and 1 of them is running on the node that will be deleted, and it is not acceptable to run only 2 replicas at any time, you would scale the Deployment up to 4 replicas, wait for the new pod to become Ready, and then delete the old pod using kubectl delete pod. You can then scale back down to 3 replicas to return to your original configuration.
kubectl scale deployment web --replicas=4
kubectl delete pod <podname>
kubectl scale deployment web --replicas=3
If your pods are controlled by a StatefulSet, first make sure that the pod in question can be safely deleted. How you do this depends on the pod and your application’s tolerance for one of the stateful pods becoming temporarily unavailable. For example, you might demote a MySQL or Redis writer to a read-only replica, temporarily update and release application code so that it no longer references the pod in question, or scale up the StatefulSet first to handle the extra traffic caused by one pod being unavailable. Once this is done, delete the pod and wait for its replacement to appear on another node.
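As a minimal sketch, assuming a StatefulSet pod named db-1 (a hypothetical name) is running on the cordoned node and you have confirmed it is safe to remove, you would delete it and then watch for its replacement to start on another node:
kubectl delete pod db-1
kubectl get pods -o wide -w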
Once you have finished maintenance on a node, you can use the kubectl uncordon command to allow scheduling on the node again. Then, as new pods need to be scheduled, they will appear on that node again.
kubectl uncordon <nodename>
If you have a recently uncordoned node or a brand-new node in your cluster, you may want to rebalance some pods onto it. Usually it is best to let the Kubernetes scheduler assign pods as needed during your next Deployment or StatefulSet update, but you can also force the process by using some of the earlier steps to delete pods and get them scheduled elsewhere.
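One simple way to trigger this, assuming your workload is the Deployment named “web” from the earlier example, is a rolling restart; the replacement pods will be spread across all schedulable nodes, including the one you just uncordoned:
kubectl rollout restart deployment web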
If you want finer control over pod scheduling, use node affinity and anti-affinity. Check out the Kubernetes documentation for more information on how to use them to help the scheduler place your workload the way you need.
As you're managing your cluster, make sure you don't get caught off guard by unforeseen errors or issues. Try Blue Matador free. It's like having a DevOps engineer that doesn't sleep.