How to Start Troubleshooting CrashLoopBackOff Errors in Kubernetes Using a Blocking Command


Everyone who has worked with Kubernetes has seen that awful status before: CrashLoopBackOff. A CrashLoopBackOff means that the container in your pod keeps starting and then exiting, so Kubernetes keeps restarting it. Your container's process could fail for a variety of reasons. Perhaps you are trying to run a server that cannot load its configuration file. Or maybe you are trying to deploy an application that fails because it cannot reach another service.

Kubernetes responds to these failures by restarting the container over and over, waiting a little longer before each attempt; that growing delay is the "back-off" in the status name. Often, though, something is fundamentally wrong with your process, and a restart alone will not fix it. In most cases, you need to correct something in your image or in the application you are trying to run.
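To make that delay concrete: according to the Kubernetes documentation, the kubelet's default restart back-off starts at 10 seconds, doubles after each restart, and is capped at five minutes, and it resets once a container runs for 10 minutes without failing. The short sketch below only prints that default schedule so you can see how quickly it reaches the cap; it does not query a cluster, and backoff_schedule is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the default kubelet restart back-off delays (in seconds) for the
# first N restarts. Assumes the documented defaults: start at 10s,
# double after each restart, never exceed 300s (five minutes).
backoff_schedule() {
  local restarts="$1"
  local delay=10
  local cap=300
  local out=()
  local i
  for ((i = 0; i < restarts; i++)); do
    out+=("$delay")
    # Double the delay for the next restart, but clamp it at the cap.
    delay=$((delay * 2))
    if ((delay > cap)); then
      delay=$cap
    fi
  done
  echo "${out[*]}"
}
```

Running backoff_schedule 7 prints 10 20 40 80 160 300 300, so after only a handful of crashes a crash-looping pod spends most of its time waiting rather than running.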

One quick way to begin troubleshooting a CrashLoopBackOff is to sidestep it in a separate deployment that uses a blocking command. The new deployment still uses your image, but you override the container's command with a blocking command such as sleep infinity. Because sleep never exits, the pod keeps running, and you can open a terminal in it to troubleshoot.

Imagine you were trying to deploy a WildFly instance but were getting a CrashLoopBackOff. You could create a deployment similar to the following to run a persistent WildFly pod for troubleshooting:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wildfly-test
  namespace: test-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wildfly-test
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: wildfly-test
    spec:
      containers:
        - image: docker.io/jboss/wildfly:20.0.0.Final
          name: wildfly
          command: ["sleep", "infinity"]

Notice the command field in the container spec above. This is the blocking command that keeps your container running and bypasses the CrashLoopBackOff. Because no args are set, it replaces both the image's default entrypoint and its default command. Once the pod is up and running, you can open a terminal in it with kubectl exec:

kubectl exec -it deploy/wildfly-test -n test-ns -- /bin/bash
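If you use this trick often, the manifest and exec steps above can be scripted for any image. The sketch below is one way to do it, assuming bash; render_debug_deployment is a hypothetical helper that prints the same kind of Deployment with the image, name, and namespace as parameters, and it assumes the image ships a sleep that accepts infinity (GNU coreutils sleep does; minimal images may not have sleep or a shell at all).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a throwaway "debug" Deployment that runs the
# given image with its command overridden by `sleep infinity`.
# Arguments: deployment name, namespace, image.
render_debug_deployment() {
  local name="$1" namespace="$2" image="$3"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
        - name: ${name}
          image: ${image}
          # No args are set, so this replaces the image's ENTRYPOINT and CMD.
          command: ["sleep", "infinity"]
EOF
}

# Example usage (requires kubectl access to a cluster):
#   render_debug_deployment wildfly-test test-ns docker.io/jboss/wildfly:20.0.0.Final | kubectl apply -f -
#   kubectl rollout status deploy/wildfly-test -n test-ns
#   kubectl exec -it deploy/wildfly-test -n test-ns -- /bin/bash
```

Waiting on kubectl rollout status before running kubectl exec avoids trying to attach before the pod is ready. When you are done, kubectl delete deploy/wildfly-test -n test-ns removes the debug deployment.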

Once you're in the terminal, you can begin troubleshooting. Here are some common issues I have solved with this method; they may apply to your situation and help you decide where to start:

  • Many errors I have experienced had to do with environment variables that were unset or incorrect. I often use the env command to inspect environment variables that my application or process expects and make sure that they are correct.

  • Sometimes an application cannot reach other services. If I know my application needs to call another service or endpoint and I suspect that call is failing, I try the request manually with curl from inside the pod. I usually run curl -v to get verbose output. When troubleshooting networking issues, I most often get either a timeout or an X.509 certificate verification error, and that error usually turns out to be the root cause.

  • An application may fail to start because it is misconfigured or because a configuration file is missing. I troubleshoot this by inspecting the locations where I expect my application's files to be with tools like ls, find, cat, and less. ls and find help confirm that a file exists, while cat and less let me read the file and check that it is configured correctly.
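The three checks above can be bundled into a few shell functions that you paste into the debug pod's terminal. This is a sketch, assuming the image has bash and curl; the function names are hypothetical, and the variable names, paths, and URLs you pass in are your own. For the network check, curl exits with code 28 on a timeout and 60 when it cannot verify the server's certificate, matching the two failures described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to run inside the debug pod.

# Report each named environment variable as set or missing (unset/empty).
# Values are not printed, to avoid leaking secrets. Returns 1 if any is missing.
check_env() {
  local var value missing=0
  for var in "$@"; do
    value="${!var}"
    if [ -z "$value" ]; then
      echo "MISSING env: ${var}"
      missing=1
    else
      echo "ok env: ${var} is set"
    fi
  done
  return "$missing"
}

# Report whether each expected path is a regular, readable file.
# Returns 1 if any is missing or unreadable.
check_files() {
  local f missing=0
  for f in "$@"; do
    if [ -f "$f" ] && [ -r "$f" ]; then
      echo "ok file: ${f}"
    else
      echo "MISSING or unreadable file: ${f}"
      missing=1
    fi
  done
  return "$missing"
}

# Call an endpoint the application depends on. -v shows the connection and
# TLS handshake; --max-time stops a silently dropped connection from hanging.
probe_endpoint() {
  curl -v --max-time 10 -o /dev/null "$1"
}
```

For example, check_env DB_HOST DB_PASSWORD flags any variable your application expects but did not receive, and check_files can be pointed at the configuration paths your application reads at startup.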

The application logs are often revealing when troubleshooting a CrashLoopBackOff, too. Use this command to check them (if the container has already restarted, add --previous to see the output of the last crashed instance):

kubectl logs -f deploy/$APPLICATION -n $NAMESPACE

Watch for errors, warnings, and stack traces, and take note of them so you can focus on those specific issues when you troubleshoot from inside your pod's terminal.
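Long logs can bury the few lines that matter. A small filter like the one below, a sketch where log_highlights is a hypothetical helper name and the patterns are only a starting point you should tune to your application's log format, keeps the lines that mention common error and warning levels, exceptions, and Java-style stack frames, each prefixed with its line number in the original log.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read log text on stdin and print only error/warning
# lines, exceptions, "Caused by" lines, and Java-style stack frames such as
# "\tat org.example.Foo.bar(Foo.java:12)". Output lines keep their original
# line numbers (grep -n). Matching is case-insensitive.
log_highlights() {
  grep -nEi 'error|warn|exception|fatal|caused by|^[[:space:]]+at [[:alnum:]_.$<>]+\('
}

# Example usage (requires kubectl access to a cluster):
#   kubectl logs deploy/$APPLICATION -n $NAMESPACE --previous | log_highlights
```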

Thanks for Reading!

Try this trick the next time you encounter a CrashLoopBackOff. Use a blocking command like sleep infinity to bypass the CrashLoopBackOff and get a shell in your pod. Once inside, you can inspect the pod in greater detail to help determine the root cause of the failure.

Austin Dewey

Austin Dewey is a DevOps engineer focused on delivering a streamlined developer experience on cloud and container technologies. Austin started his career with Red Hat's consulting organization, where he helped drive success at many different Fortune 500 companies by automating deployments on Red Hat's Kubernetes-based PaaS, OpenShift Container Platform. Currently, Austin works at fintech startup Prime Trust, building automation to scale financial infrastructure and support developers on Kubernetes and AWS. Austin is the author of "Learn Helm", a book focused on packaging and delivering applications to Kubernetes, and in his free time he enjoys writing about open source technologies on his blog, austindewey.com.
