Earlier, I wrote a post about how to troubleshoot errors in Kubernetes using a blocking command. That trick, however, only applies to CrashLoopBackOff errors. Today, I want to talk about how you can get back on your feet if your Kubernetes pods fail to start due to other errors or non-ready statuses, such as ErrImagePull, Pending, and 0/1 Ready. To do this, you can use the kubectl describe command.
Let’s take a quick look at what this command displays that makes it a useful troubleshooting tool.
The “kubectl describe” Command
The kubectl describe command is a close cousin of “kubectl get” in that both fetch information about a resource. However, while “kubectl get” with only a resource type (for example, “kubectl get pods”) returns a list of existing resources, kubectl describe outputs additional details about a specific resource, including its related events. Below is an example of running “kubectl describe” against a pod failing with a CrashLoopBackOff error. The events appear at the very end of the output:
$ kubectl describe pod $CRASHLOOPBACKOFF_POD
...
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  <unknown>         default-scheduler  Successfully assigned default/busybox-59b4cb4848-m8dsx to minikube
  Normal   Pulling    2s (x3 over 20s)  kubelet, minikube  Pulling image "docker.io/busybox:1.32"
  Normal   Pulled     2s (x3 over 19s)  kubelet, minikube  Successfully pulled image "docker.io/busybox:1.32"
  Normal   Created    2s (x3 over 19s)  kubelet, minikube  Created container busybox
  Normal   Started    2s (x3 over 19s)  kubelet, minikube  Started container busybox
  Warning  BackOff    1s (x3 over 17s)  kubelet, minikube  Back-off restarting failed container
See that last event? That’s telling you that your container is crashing. From there, you could use a blocking command to gain access to your pod’s terminal and find the root cause.
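If the full describe output is more than you need, the same events can be queried on their own. Here is a minimal sketch, assuming the pod lives in the default namespace unless you say otherwise; the function name pod_warnings is my own and not part of kubectl:

```shell
# List only the Warning events recorded for one pod, oldest first.
# Usage: pod_warnings <pod-name> [namespace]
# (pod_warnings is a hypothetical helper name, not a kubectl command.)
pod_warnings() {
  pod="$1"
  namespace="${2:-default}"
  kubectl get events \
    --namespace "$namespace" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=${pod},type=Warning" \
    --sort-by=.metadata.creationTimestamp
}
```

Calling pod_warnings busybox-59b4cb4848-m8dsx against the pod above should surface the BackOff warning without the surrounding Normal events.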
Let’s look at how “kubectl describe” can help solve other types of errors in Kubernetes.
Troubleshooting the “ErrImagePull” Status
As the name of this error status implies, ErrImagePull means that Kubernetes cannot pull the image you are trying to deploy. There are a variety of reasons why this might happen:
- You need to provide credentials
- A scanning tool is blocking your image
- A firewall is blocking the desired registry
By using the “kubectl describe” command, you can remove much of the guesswork and get right to the root cause. For example, imagine you wanted to pull this image from Red Hat:
registry.redhat.io/rhel8/httpd-24:1
After creating the deployment, you run “kubectl get” to see if the pod has started, but find that it reports an ErrImagePull status:
→ kubectl get pods
NAME                     READY   STATUS         RESTARTS   AGE
httpd-5d5c4dbb5b-jsj87   0/1     ErrImagePull   0          1s
You can use the “kubectl describe” command, shown below, to find any relevant events:
→ kubectl describe pod httpd-5d5c4dbb5b-jsj87
...
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  <unknown>              default-scheduler  Successfully assigned default/httpd-5d5c4dbb5b-jsj87 to minikube
  Warning  Failed     2m56s (x6 over 4m18s)  kubelet, minikube  Error: ImagePullBackOff
  Normal   Pulling    2m42s (x4 over 4m19s)  kubelet, minikube  Pulling image "registry.redhat.io/rhel8/httpd-24:1"
  Warning  Failed     2m42s (x4 over 4m18s)  kubelet, minikube  Failed to pull image "registry.redhat.io/rhel8/httpd-24:1": rpc error: code = Unknown desc = Error response from daemon: Get https://registry.redhat.io/v2/rhel8/httpd-24/manifests/1: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
  Warning  Failed     2m42s (x4 over 4m18s)  kubelet, minikube  Error: ErrImagePull
  Normal   BackOff    2m26s (x7 over 4m18s)  kubelet, minikube  Back-off pulling image "registry.redhat.io/rhel8/httpd-24:1"
Anything look off? If you look at the fourth event from the top, you’ll see that the image failed to pull because authentication is required. To resolve this issue, you can create a pull secret using the “kubectl create secret docker-registry” command and add it to your service account’s list of pull secrets or add it directly to the deployment using the “imagePullSecrets” list.
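As a rough sketch of the service account route, the two steps could be wrapped like this. The function name add_pull_secret and every argument value are placeholders of mine; substitute your own secret name, registry, and credentials:

```shell
# Create a registry pull secret and attach it to a service account.
# Usage: add_pull_secret <secret-name> <registry> <username> <password> [service-account]
# (add_pull_secret is a hypothetical helper name; all arguments are placeholders.)
add_pull_secret() {
  secret="$1"
  registry="$2"
  username="$3"
  password="$4"
  service_account="${5:-default}"

  # Step 1: store the registry credentials in a docker-registry secret.
  kubectl create secret docker-registry "$secret" \
    --docker-server="$registry" \
    --docker-username="$username" \
    --docker-password="$password" || return 1

  # Step 2: set the service account's imagePullSecrets to this secret.
  # Note: this patch sets the whole list, so include any secrets you
  # already rely on if the service account has some.
  kubectl patch serviceaccount "$service_account" \
    -p "{\"imagePullSecrets\": [{\"name\": \"${secret}\"}]}"
}
```

For the example above, that might be add_pull_secret redhat-pull registry.redhat.io "$RH_USER" "$RH_PASSWORD", where both variables are assumed to hold your Customer Portal credentials. Pull secrets from a service account are applied when a pod is created, so you would likely need to delete the failing pod and let the deployment recreate it.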
Let’s look at another type of error that “kubectl describe” can help you solve.
Troubleshooting the “Pending” Status
One frustrating error that Kubernetes users encounter is when a pod sits indefinitely in the “Pending” state. Pending means that your pod’s containers have not been created yet, usually because the pod is waiting for a specific condition to be satisfied before it can be scheduled onto a node. You can find what this condition is by using the “kubectl describe” command.
Consider a deployment that has the following node selector:
nodeSelector:
  role: invalid
This node selector means that the Kubernetes scheduler will only place pods on nodes with the “role: invalid” label. If you try to deploy this and use “kubectl get” to see your pod’s status, you’ll see the pod stuck in the Pending state (unless you actually have a node with this label):
→ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-679c6f46b5-949j8   0/1     Pending   0          11s
If you were unsure why this pod is pending, you could find out with the “kubectl describe” command:
→ kubectl describe pod nginx-679c6f46b5-949j8
...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.
The pod is pending because the node selector does not match any node: the event reports “0/1 nodes are available: 1 node(s) didn’t match node selector”.
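To confirm a selector mismatch like this one, you can ask the cluster which nodes actually carry the label. A small sketch follows; nodes_with_label is a name I made up for illustration:

```shell
# Show the nodes that carry a given label, along with all of their labels.
# An empty result is consistent with a "didn't match node selector" event.
# Usage: nodes_with_label <key=value>
# (nodes_with_label is a hypothetical helper name.)
nodes_with_label() {
  kubectl get nodes --selector "$1" --show-labels
}
```

Running nodes_with_label role=invalid on the cluster above would be expected to return no nodes. From there, you can either correct the nodeSelector in the deployment or, if the label really is intended, add it to a node with “kubectl label nodes”.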
You’ll also see the Pending state when you try to schedule a pod that requests more resources than any single node in your Kubernetes cluster has available. In this case, the “kubectl describe” command will reveal events like these:
→ kubectl describe pod nginx-77c59567c9-nj8qp
...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
As you can see, this particular pod requests more CPU than the cluster’s only node has available.
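To see how far apart the request and the node capacity are, you can print both side by side. The following is a sketch under the assumption that the pod is in your current namespace; pod_requests and node_allocatable are hypothetical helper names:

```shell
# Print the resource requests of each container in a pod.
# Usage: pod_requests <pod-name>
# (pod_requests is a hypothetical helper name.)
pod_requests() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'
}

# Print the allocatable CPU and memory of every node.
# (node_allocatable is a hypothetical helper name.)
node_allocatable() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```

Keep in mind that allocatable is the node’s total schedulable capacity, not what is currently free. Requests from pods already running on the node count against it, and “kubectl describe node” lists those under its allocated resources section.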
Let’s look at one last type of condition that “kubectl describe” can help resolve.
Troubleshooting the “0/1 Ready” Condition
The 0/1 Ready condition is when your pod remains stuck in an unready state. It isn’t an actual “status” like Pending or ErrImagePull, but it’s still something that often occurs when trying to deploy an app to Kubernetes. You can check if your pod is unready by using the “kubectl get pods” command and looking under the READY column.
→ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-5c56df8d7c-c86lw   0/1     Running   0          4s
Many times, this is normal. If your pod has a readiness probe defined, you can expect it to take some time before your pod becomes ready. Your pod, however, should not report 0/1 forever. If you find that your pod does not become ready in the time you expect, you can use “kubectl describe” to see if you have a failing readiness probe. Here’s an example:
→ kubectl describe pod nginx-5c56df8d7c-c86lw
...
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    <unknown>            default-scheduler  Successfully assigned default/nginx-5c56df8d7c-c86lw to minikube
  Warning  FailedMount  113s                 kubelet, minikube  MountVolume.SetUp failed for volume "default-token-tf6sf" : failed to sync secret cache: timed out waiting for the condition
  Normal   Pulled       112s                 kubelet, minikube  Container image "docker.io/nginx:1.19" already present on machine
  Normal   Created      112s                 kubelet, minikube  Created container nginx
  Normal   Started      112s                 kubelet, minikube  Started container nginx
  Warning  Unhealthy    91s (x21 over 111s)  kubelet, minikube  Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"invalid\": executable file not found in $PATH": unknown
As the last line indicates, this pod’s readiness probe keeps failing (21 times over the last 111 seconds), so it is not going to report as ready. This information would help you modify your readiness probe so that your pod can become available.
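Before editing the deployment, it can help to check exactly what probe the running pod was given. A minimal sketch, with readiness_probes being a helper name of my own:

```shell
# Print the readiness probe configured on each container of a pod.
# Containers without a readiness probe print an empty value.
# Usage: readiness_probes <pod-name>
# (readiness_probes is a hypothetical helper name.)
readiness_probes() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.readinessProbe}{"\n"}{end}'
}
```

For the pod above, readiness_probes nginx-5c56df8d7c-c86lw would presumably show an exec probe whose command is “invalid”, which matches the “executable file not found in $PATH” message in the events.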
Thanks for Reading!
The kubectl describe command is an excellent tool to have in your arsenal of Kubernetes debugging tools. If you encounter a status or error condition and are unsure of the root cause, using the “kubectl describe” command to review that resource’s events will often point you in the right direction. Note that while this post focused on Pods specifically, you can use this command for any kind of resource you believe is failing. Applying this to troubleshoot Pods, however, is the most common use case. Hopefully, this little trick helps you get back on your feet faster when you encounter errors in Kubernetes!