Is Your Kubernetes Pod Failing to Start? Here’s a Command to Get Back on Your Feet

Earlier, I wrote a post about how to troubleshoot errors in Kubernetes using a blocking command. That trick, however, only applies to pods in CrashLoopBackOff. Today, I want to talk about how you can get back on your feet when your Kubernetes pods fail to start because of other errors or non-ready statuses, such as ErrImagePull, Pending, and 0/1 Ready. To do this, you can use the kubectl describe command.

Let’s take a quick look at what this command displays that makes it a useful troubleshooting tool.

The “kubectl describe” Command

The kubectl describe command is a close cousin of “kubectl get” in that both fetch information about a resource. However, while “kubectl get” returns a summary list of existing resources by default, “kubectl describe” outputs additional details, such as the resource’s related events. Below is an example of running “kubectl describe” against a pod failing with a CrashLoopBackOff error. The events appear at the very end of the output:

$ kubectl describe pod $CRASHLOOPBACKOFF_POD
...
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  <unknown>         default-scheduler  Successfully assigned default/busybox-59b4cb4848-m8dsx to minikube
  Normal   Pulling    2s (x3 over 20s)  kubelet, minikube  Pulling image "docker.io/busybox:1.32"
  Normal   Pulled     2s (x3 over 19s)  kubelet, minikube  Successfully pulled image "docker.io/busybox:1.32"
  Normal   Created    2s (x3 over 19s)  kubelet, minikube  Created container busybox
  Normal   Started    2s (x3 over 19s)  kubelet, minikube  Started container busybox
  Warning  BackOff    1s (x3 over 17s)  kubelet, minikube  Back-off restarting failed container

See that last event? That’s telling you that your container is crashing. From there, you could use a blocking command to gain access to your pod’s terminal and find the root cause.
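
If you only care about the events, you can trim away the rest of the describe output. Here is a minimal sketch of one way to do that, assuming a POSIX shell with sed available. The function name pod_events is made up for illustration, and it relies on the Events section being the last section of the output, as in the example above.

```shell
# Hypothetical helper: print only the Events section of a pod's description.
# Assumes "Events:" starts a line and is the final section of the output.
pod_events() {
  kubectl describe pod "$1" | sed -n '/^Events:/,$p'
}

# Usage (the pod name is a placeholder):
#   pod_events busybox-59b4cb4848-m8dsx
```

The sed expression prints everything from the first line that starts with “Events:” through the end of the output, so the scheduling and container details above it are skipped.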

Let’s look at how “kubectl describe” can help solve other types of errors in Kubernetes.

Troubleshooting the “ErrImagePull” Status

As the name of this error status implies, ErrImagePull means that Kubernetes cannot pull the image you are trying to deploy. There are a variety of reasons why this might happen:

  • You need to provide credentials

  • A scanning tool is blocking your image

  • A firewall is blocking the desired registry

By using the “kubectl describe” command, you can remove much of the guesswork and get right to the root cause. For example, imagine you wanted to pull this image from Red Hat:

registry.redhat.io/rhel8/httpd-24:1

After creating the deployment, you run “kubectl get” to check whether the pod has started, and you see an ErrImagePull error:

→ kubectl get pods
NAME                     READY   STATUS         RESTARTS   AGE
httpd-5d5c4dbb5b-jsj87   0/1     ErrImagePull   0          1s

You can use the “kubectl describe” command, shown below, to find any relevant events:

→ kubectl describe pod httpd-5d5c4dbb5b-jsj87
...
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  <unknown>              default-scheduler  Successfully assigned default/httpd-5d5c4dbb5b-jsj87 to minikube
  Warning  Failed     2m56s (x6 over 4m18s)  kubelet, minikube  Error: ImagePullBackOff
  Normal   Pulling    2m42s (x4 over 4m19s)  kubelet, minikube  Pulling image "registry.redhat.io/rhel8/httpd-24:1"
  Warning  Failed     2m42s (x4 over 4m18s)  kubelet, minikube  Failed to pull image "registry.redhat.io/rhel8/httpd-24:1": rpc error: code = Unknown desc = Error response from daemon: Get https://registry.redhat.io/v2/rhel8/httpd-24/manifests/1: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
  Warning  Failed     2m42s (x4 over 4m18s)  kubelet, minikube  Error: ErrImagePull
  Normal   BackOff    2m26s (x7 over 4m18s)  kubelet, minikube  Back-off pulling image "registry.redhat.io/rhel8/httpd-24:1"

Anything look off? If you look at the fourth event from the top, you’ll see that the image failed to pull because authentication is required. To resolve this issue, you can create a pull secret using the “kubectl create secret docker-registry” command and then either add it to your service account’s list of pull secrets or add it directly to the deployment’s pod template using the “imagePullSecrets” list.
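
The service account route could look something like the sketch below. Treat it as an illustration rather than the one right way: the function name add_pull_secret and the secret name redhat-pull-secret are assumptions, and you would supply your own registry credentials.

```shell
# Hypothetical helper: create a registry pull secret and attach it to a
# service account. The secret name "redhat-pull-secret" is an assumption.
# Arguments: <username> <password> [service-account, defaults to "default"]
add_pull_secret() {
  kubectl create secret docker-registry redhat-pull-secret \
    --docker-server=registry.redhat.io \
    --docker-username="$1" \
    --docker-password="$2" &&
  kubectl patch serviceaccount "${3:-default}" \
    -p '{"imagePullSecrets": [{"name": "redhat-pull-secret"}]}'
}

# Usage (credentials are placeholders):
#   add_pull_secret "$REGISTRY_USER" "$REGISTRY_PASSWORD"
```

Pods pick up their service account’s pull secrets when they are created, so you would likely need to delete the failing pod and let the deployment recreate it before the new secret takes effect.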

Let’s look at another type of error that “kubectl describe” can help you solve.

Troubleshooting the “Pending” Status

One frustrating problem that Kubernetes users encounter is a pod that sits indefinitely in the “Pending” state. Pending means that the cluster has accepted your pod but its containers have not been created yet. When a pod stays Pending, it is usually because the scheduler is waiting for a specific condition to be satisfied before it can place the pod on a node. You can find out what this condition is by using the “kubectl describe” command.

Consider a deployment that has the following node selector:

nodeSelector:
  role: invalid

This node selector means that the Kubernetes scheduler will only place pods on nodes with the “role: invalid” label. If you try to deploy this and use “kubectl get” to see your pod’s status, you’ll see the pod stuck in the Pending state (unless you actually have a node with this label):

→ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-679c6f46b5-949j8   0/1     Pending   0          11s

If you are unsure why this pod is pending, you can find out with the “kubectl describe” command:

→ kubectl describe pod nginx-679c6f46b5-949j8
...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.

The pod is pending because its node selector does not match any node. The scheduler says so directly: “0/1 nodes are available: 1 node(s) didn’t match node selector.”
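
To confirm a node selector mismatch, you can compare the selector against the labels your nodes actually carry. The helpers below are a small sketch of that check. The function names are hypothetical, and “role=invalid” in the usage comment is simply the label from this example.

```shell
# Hypothetical helper: list the nodes that carry a given label.
# An empty result suggests the node selector cannot be satisfied.
nodes_with_label() {
  kubectl get nodes -l "$1"
}

# Hypothetical helper: show every node together with all of its labels.
node_labels() {
  kubectl get nodes --show-labels
}

# Usage:
#   nodes_with_label role=invalid
#   node_labels
```

From there, you would either correct the nodeSelector in your deployment or, if the selector is what you intended, add the label to a suitable node with “kubectl label nodes”.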

You’ll also see the Pending state when you try to schedule a pod that requests more resources than any single node in your Kubernetes cluster has available. In this case, the “kubectl describe” command will show events like these:

→ kubectl describe pod nginx-77c59567c9-nj8qp
...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.

As you can see, this particular pod requested more CPU than the cluster’s only node could provide.
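
If you want to see the numbers behind an “Insufficient cpu” message, one option is to print the pod’s resource requests next to each node’s allocatable capacity. The sketch below does this with kubectl’s jsonpath and custom-columns output formats. The function names are hypothetical.

```shell
# Hypothetical helper: print the resource requests of each container in a pod.
pod_requests() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'
}

# Hypothetical helper: print the allocatable CPU and memory of each node.
node_allocatable() {
  kubectl get nodes \
    -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
}

# Usage:
#   pod_requests nginx-77c59567c9-nj8qp
#   node_allocatable
```

Keep in mind that allocatable capacity is a node’s total, not what is currently free. Other pods on the node have already claimed part of it, and “kubectl describe node” shows those existing requests under “Allocated resources”.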

Let’s look at one last type of condition that “kubectl describe” can help resolve.

Troubleshooting the “0/1 Ready” Condition

The 0/1 Ready condition occurs when your pod remains stuck in an unready state. It isn’t an actual “status” like Pending or ErrImagePull, but it’s still something that often occurs when trying to deploy an app to Kubernetes. You can check if your pod is unready by using the “kubectl get pods” command and looking under the READY column.

→ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-5c56df8d7c-c86lw   0/1     Running   0          4s

Many times, this is normal. If your pod has a readiness probe defined, you can expect it to take some time before your pod becomes ready. Your pod, however, should not report 0/1 forever. If you find that your pod does not become ready in the time you expect, you can use “kubectl describe” to see if you have a failing readiness probe. Here’s an example:

→ kubectl describe pod nginx-5c56df8d7c-c86lw
...
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    <unknown>            default-scheduler  Successfully assigned default/nginx-5c56df8d7c-c86lw to minikube
  Warning  FailedMount  113s                 kubelet, minikube  MountVolume.SetUp failed for volume "default-token-tf6sf" : failed to sync secret cache: timed out waiting for the condition
  Normal   Pulled       112s                 kubelet, minikube  Container image "docker.io/nginx:1.19" already present on machine
  Normal   Created      112s                 kubelet, minikube  Created container nginx
  Normal   Started      112s                 kubelet, minikube  Started container nginx
  Warning  Unhealthy    91s (x21 over 111s)  kubelet, minikube  Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"invalid\": executable file not found in $PATH": unknown

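As the last line indicates, this pod’s readiness probe keeps failing (21 times over the last 111 seconds), so the pod is not going to report as ready. The message also tells you why: the probe tries to run a command named “invalid”, which does not exist in the container’s $PATH. With this information, you can fix the readiness probe so that your pod can become available.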
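
To see exactly what the failing probe is configured to do, you can print the probe definition from the pod spec. Here is a minimal sketch, using a hypothetical function name, that relies on kubectl’s jsonpath output:

```shell
# Hypothetical helper: print the readiness probe of each container in a pod.
pod_readiness_probes() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.readinessProbe}{"\n"}{end}'
}

# Usage:
#   pod_readiness_probes nginx-5c56df8d7c-c86lw
```

For the pod in this example, you would expect the output to include an exec probe whose command is “invalid”, matching the error in the events.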

Thanks for Reading!

The kubectl describe command is an excellent tool to have in your arsenal of Kubernetes debugging tools. If you encounter a status or error condition and are unsure of the root cause, using the “kubectl describe” command to review that resource’s events will often point you in the right direction. Note that while this post focused on Pods specifically, you can use this command for any kind of resource you believe is failing. Applying this to troubleshoot Pods, however, is the most common use case. Hopefully, this little trick helps you get back on your feet faster when you encounter errors in Kubernetes!
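
For resources other than pods, you can run “kubectl describe” with the resource kind and name, or query the events directly. The sketch below takes the second approach, filtering “kubectl get events” by the kind and name of the object involved. The function name resource_events is hypothetical.

```shell
# Hypothetical helper: list events for one resource, oldest first.
# Arguments: <kind> <name>, for example: Deployment nginx
resource_events() {
  kubectl get events \
    --field-selector "involvedObject.kind=$1,involvedObject.name=$2" \
    --sort-by=.lastTimestamp
}

# Usage:
#   resource_events Deployment nginx
#   resource_events Node minikube
```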

Austin Dewey

Austin Dewey is a DevOps engineer focused on delivering a streamlined developer experience on cloud and container technologies. Austin started his career with Red Hat’s consulting organization, where he helped drive success at many different Fortune 500 companies by automating deployments on Red Hat’s Kubernetes-based PaaS, OpenShift Container Platform. Currently, Austin works at fintech startup Prime Trust, building automation to scale financial infrastructure and support developers on Kubernetes and AWS. Austin is the author of "Learn Helm", a book focused on packaging and delivering applications to Kubernetes, and he enjoys writing about open source technologies at his blog in his free time, austindewey.com.
