There are many times in OpenShift where microservices need to talk to each other internally without exposing routes to the outside world. These microservices interact via the Kubernetes service API, which acts as a load balancer that resolves a set of pods to a single IP.
While OpenShift and the service API work together to abstract most of the complexity behind networking, problems can still arise when trying to interact with other microservices deployed on OpenShift. In this post we’ll talk about some of the most common problems around internal networking and how to troubleshoot them. For the examples, we’ll assume a basic app consisting of a single ClusterIP Service and Deployment, each called hello-world.
The following service will be used as an example for each issue described in this post.
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-world
      namespace: my-namespace
    spec:
      selector:
        app: hello-world
      ports:
        - name: https
          protocol: TCP
          port: 443
          targetPort: 8443
How to Begin Troubleshooting
The best way to begin troubleshooting is to oc rsh into the pod that is making the call to your target service. If a deployment called rhel is attempting to talk to another deployment called hello-world, access the rhel pod to begin troubleshooting the networking error. For example:
oc rsh rhel-2-fqjbw -n my-namespace
Once inside the pod, attempt running “curl -v” against the endpoint your pod is trying to reach. This will give verbose output and will often reveal the issue behind your networking error. The table below gives an overview of some sample curl -v output and some possible errors associated with it.
curl -v Output | Possible Errors |
---|---|
sh-4.2$ curl -v -k https://hello-world:443 Could not resolve host: hello-world; Unknown error Closing connection 0 curl: (6) Could not resolve host: hello-world; Unknown error | 1) Service does not exist 2) Target hostname is incorrect |
sh-4.2$ curl -v -k https://hello-world.my-namespace:443 About to connect() to hello-world.my-namespace port 443 (#0) Trying 172.30.209.206… Connection timed out Failed connect to hello-world.my-namespace:443; Connection timed out Closing connection 0 curl: (7) Failed connect to hello-world.my-namespace:443; Connection timed out | 1) Isolation policy is blocking traffic |
sh-4.2$ curl -v -k https://hello-world:443 About to connect() to hello-world port 443 (#0) Trying 172.30.250.96… No route to host Failed connect to hello-world:443; No route to host Closing connection 0 curl: (7) Failed connect to hello-world:443; No route to host | 1) Service selector is incorrect 2) Service port is incorrect 3) Service targetPort name is not specified on the deployment |
sh-4.2$ curl -v -k https://hello-world:443 About to connect() to hello-world port 443 (#0) Trying 10.128.2.44… Connection refused Failed connect to hello-world:443; Connection refused Closing connection 0 curl: (7) Failed connect to hello-world:443; Connection refused | 1) Service clusterIP is none 2) Service targetPort is incorrect 3) Container does not expose targetPort |
Note that this only covers some of the most common networking errors that I have observed and that your particular error may not be covered in this post.
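The table above can be condensed into a small lookup, which may be handy in a debugging script. This is only a sketch: likely_causes is a hypothetical helper name, and the matched phrases are taken from the sample output in this post, so other curl versions may word their errors differently.

```shell
# Map a curl error message to the likely causes listed in the table above.
# likely_causes is a hypothetical helper, not part of oc or curl.
likely_causes() {
  case "$1" in
    *"Could not resolve host"*)
      echo "Service does not exist, or target hostname is incorrect" ;;
    *"Connection timed out"*)
      echo "Isolation policy is blocking traffic" ;;
    *"No route to host"*)
      echo "Service selector, port, or named targetPort is incorrect" ;;
    *"Connection refused"*)
      echo "Service clusterIP is None, targetPort is incorrect, or container does not expose targetPort" ;;
    *)
      echo "Not covered by the table" ;;
  esac
}

# Example usage from inside the source pod (stderr is captured, the body is discarded):
#   likely_causes "$(curl -sS -k https://hello-world:443 2>&1 >/dev/null)"
```

The result is a starting point for the sections that follow, not a diagnosis.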
Let’s begin looking at some of the most common errors around OpenShift networking.
Service does not Exist
First things first, you should have a service object to be able to load-balance between instances of your application. You can quickly create a service with the “oc expose” command:
    oc expose deployment hello-world        # For Deployment objects
    oc expose deploymentconfig hello-world  # For Deployment Configs
Alternatively, you can use the hello-world service YAML at the beginning of this post as an example to help get started. Once you have written the YAML, create the service with:
oc apply -f $PATH_TO_SERVICE_YAML
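Before or after creating the service, it can help to confirm whether it exists in the namespace you expect. The sketch below wraps a plain "oc get svc" call; service_exists is a hypothetical helper name, and the service and namespace names come from the example in this post.

```shell
# service_exists is a hypothetical helper; it only wraps "oc get svc".
service_exists() {
  # $1 = service name, $2 = namespace
  if oc get svc "$1" -n "$2" >/dev/null 2>&1; then
    echo "service $1 exists in $2"
  else
    echo "service $1 not found in $2"
    return 1
  fi
}

# Example usage:
#   service_exists hello-world my-namespace
```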
Target Hostname is Incorrect
The most common networking issues are caused by attempting to reference an incorrect host name. The host name of an app will be determined by the name of its service. Depending on whether or not the source and target apps are in the same namespace, the target host name will be either <service-name> or <service-name>.<namespace-name>.
Source and Target Apps in the Same Namespace
If your source and target apps are in the same OpenShift namespace, then the target hostname will simply be the name of the target service. Using the hello-world service above as an example, any app trying to talk to the hello-world app would simply use the host name hello-world.
Source and Target Apps in Different Namespaces
The target host name will be a little different if the source and target apps live in different namespaces. In this case the target host name will be <service-name>.<namespace-name>. Using the hello-world service above as an example, any app trying to talk to the hello-world app from a different namespace would use the host name hello-world.my-namespace.
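The naming rule above is simple enough to express as a tiny function. This is an illustrative sketch only: target_host is a hypothetical helper, and other-namespace is a made-up namespace used to show the cross-namespace case.

```shell
# target_host is a hypothetical helper that applies the naming rule above.
target_host() {
  # $1 = service name, $2 = namespace of the service, $3 = namespace of the calling pod
  if [ "$2" = "$3" ]; then
    echo "$1"
  else
    echo "$1.$2"
  fi
}

target_host hello-world my-namespace my-namespace     # prints hello-world
target_host hello-world my-namespace other-namespace  # prints hello-world.my-namespace
```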
SDN Isolation Policy is Blocking Traffic
The OpenShift SDN supports three different modes for networking, with the default being network policy in OpenShift 4. Your mode’s isolation policy may not have been configured to allow traffic to reach your app.
If using network policy mode, ensure that a NetworkPolicy object has been created that allows traffic to reach your target app.
If using multitenant mode, ensure that your source and target apps’ namespaces have been joined together to allow network traffic.
Your pods should already be able to reach each other with subnet mode.
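For network policy mode, the following is a minimal sketch of a policy that lets pods in my-namespace reach the hello-world pods on the container port 8443. The helper name allow_same_namespace_policy and the policy name allow-hello-world are made up for this example. Traffic from a different namespace would need a namespaceSelector rule instead, which is not shown here.

```shell
# allow_same_namespace_policy is a hypothetical helper that prints a manifest.
# The policy name "allow-hello-world" is an assumption made for this example.
allow_same_namespace_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-hello-world
  namespace: my-namespace
spec:
  # Applies to the pods selected by the example hello-world service.
  podSelector:
    matchLabels:
      app: hello-world
  ingress:
    # An empty podSelector matches every pod in the policy's own namespace.
    - from:
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 8443
EOF
}

# Review the manifest, then apply it:
#   allow_same_namespace_policy | oc apply -f -
# In multitenant mode, projects are joined with a command of this shape
# ("other-namespace" is a placeholder):
#   oc adm pod-network join-projects --to=my-namespace other-namespace
```

Note that the policy names port 8443 rather than 443, because network policies are evaluated against the pod's port, not the service port.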
Service Selector is Incorrect
The most common way to route traffic with a service is to use a label selector that matches a label on the app’s pods. In the example service above, the hello-world service will route traffic to pods with a label app=hello-world. Make sure that the target Deployment or DeploymentConfig sets a label on each pod that matches the service selector and vice versa.
Here’s part of an example Deployment that sets the “app=hello-world” label that the service selector expects on each pod. Notice the “template.metadata.labels.app” value, which sets the pod “app=hello-world” label.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-world
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: hello-world
      template:
        metadata:
          labels:
            app: hello-world
        spec:
          containers:
            - name: hello-world
              ...
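One way to check whether the selector and the pod labels line up is to print them side by side along with the service's endpoints. The sketch below uses standard oc queries; show_selector_match is a hypothetical helper name. If no pod carries the expected label, the pod list comes back empty and the endpoints list shows no addresses.

```shell
# show_selector_match is a hypothetical helper built from standard oc queries.
show_selector_match() {
  # $1 = service name, $2 = namespace, $3 = selector in key=value form
  echo "Service selector:"
  oc get svc "$1" -n "$2" -o jsonpath='{.spec.selector}'
  echo
  echo "Pods matching $3:"
  oc get pods -n "$2" -l "$3" --show-labels
  echo "Endpoints behind the service:"
  oc get endpoints "$1" -n "$2"
}

# Example usage:
#   show_selector_match hello-world my-namespace app=hello-world
```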
Service clusterIP is None
Notice in the above hello-world service that the YAML specification lacks a clusterIP key-value pair. This means that OpenShift will automatically assign the hello-world service an IP address. Compare that to this modified service below, in which we set the clusterIP to “None”:
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-world
      namespace: my-namespace
    spec:
      selector:
        app: hello-world
      ports:
        - name: https
          protocol: TCP
          port: 443
          targetPort: 8443
      clusterIP: None
This is actually a headless service, meaning that the service is not assigned an IP address. Headless services have many different use cases, but if you’re running a simple architecture with the intention of a Deployment or DeploymentConfig being load-balanced by a service, you may have accidentally created a headless service. Check that the service has a clusterIP allocated with “oc get svc hello-world -o yaml”. If you can verify that the clusterIP is None, delete the service and apply it again without a clusterIP spec.
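Rather than reading the full YAML, the clusterIP field can be pulled out directly with a jsonpath query. This is a small sketch along those lines; is_headless is a hypothetical helper name.

```shell
# is_headless is a hypothetical helper around a standard jsonpath query.
is_headless() {
  # $1 = service name, $2 = namespace
  cluster_ip="$(oc get svc "$1" -n "$2" -o jsonpath='{.spec.clusterIP}')"
  if [ "$cluster_ip" = "None" ]; then
    echo "$1 is headless (clusterIP: None)"
  else
    echo "$1 has clusterIP $cluster_ip"
  fi
}

# Example usage:
#   is_headless hello-world my-namespace
```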
Service Ports are Incorrect or are Not Exposed
Part of a service’s job is to identify the port and targetPort of an application. The service accepts traffic on its “port” and forwards it to the “targetPort” on the running container. Another common issue around internal traffic is that these port values are either incorrect or not exposed by the container.
Service Port is Incorrect
The hello-world service above specifies port 443 and targetPort 8443:
    apiVersion: v1
    kind: Service
    ...
      ports:
        - name: https
          protocol: TCP
          port: 443
          targetPort: 8443
    ...
This port, port 443, reroutes to port 8443 on the target container. Make sure that your requests are for port 443. Otherwise, the service will not be able to route your request to the targetPort of the container.
Service targetPort is Incorrect
Your service targetPort may be incorrect if your request is hitting the service port and is still failing. Make sure your service’s targetPort is specifying a port that is exposed by the container.
Service targetPort Name is not Specified on the Deployment
Imagine the hello-world service above exposed ports like this instead:
    apiVersion: v1
    kind: Service
    ...
      ports:
        ...
        - name: https
          protocol: TCP
          port: 443
          targetPort: https
This port, port 443, is targeting a port called “https”. This targetPort name refers to a named container port in the pod template of the Deployment or DeploymentConfig. Either resource is expected to define a port named “https” for “targetPort: https” to route to.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-world
    spec:
      ...
          containers:
            ...
              ports:
                - containerPort: 8443
                  protocol: TCP
                  name: https
Notice the “ports:” stanza at the bottom of the deployment. Since the service refers to the “https” targetPort, the deployment must also have a corresponding port named “https”. In this case the service accepts traffic at port 443 and reroutes it to port 8443, which is specified by the containerPort of the Deployment.
If your service uses a name instead of a number for its targetPort, make sure that name is also specified on the Deployment object.
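A quick way to verify this is to list the port names declared on the Deployment and look for the name the service expects. The sketch below separates the comparison into a pure function so it can be reused; named_port_defined is a hypothetical helper name, and the jsonpath query in the usage comment assumes the Deployment from this post.

```shell
# named_port_defined is a hypothetical helper: it checks whether a targetPort
# name appears in a space-separated list of container port names.
named_port_defined() {
  # $1 = targetPort name, $2 = space-separated container port names
  for name in $2; do
    [ "$name" = "$1" ] && return 0
  done
  return 1
}

# Example usage against a live cluster:
#   names="$(oc get deployment hello-world -n my-namespace \
#     -o jsonpath='{.spec.template.spec.containers[*].ports[*].name}')"
#   named_port_defined https "$names" && echo "named port found"
```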
Container does not Expose targetPort
This is one that is often overlooked. Sometimes the issue is not an OpenShift problem but instead has to do with the running container. If you know that the port and targetPort are specified and configured properly, then the running container may not be exposing the targetPort.
Take the following Spring Boot application.properties, for example:
    server.port=8444
    ...
Given the hello-world service above, you can expect this application.properties config to be the root cause of your networking issue. The targetPort is set to 8443, but the application is actually listening on port 8444.
We can fix this by modifying the application.properties to instead read as:
    server.port=8443
    ...
Although this example was in Spring Boot, a similar troubleshooting approach can be taken with any runtime or application source. Make sure that your app is configured properly to listen on the port specified by your service’s targetPort.
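To take the service out of the picture, you can send a request to the port from inside the target pod itself. The sketch below assumes that curl is installed in the target container and that the app serves HTTPS, neither of which is guaranteed; container_listens is a hypothetical helper name and the pod name in the usage comment is a placeholder.

```shell
# container_listens is a hypothetical helper. It assumes curl is available
# inside the target container and that the app serves HTTPS on the port.
container_listens() {
  # $1 = pod name, $2 = namespace, $3 = port
  if oc exec "$1" -n "$2" -- curl -sk -o /dev/null "https://localhost:$3"; then
    echo "request to port $3 inside $1 succeeded"
  else
    echo "request to port $3 inside $1 failed"
    return 1
  fi
}

# Example usage (replace the pod name with one from "oc get pods"):
#   container_listens <hello-world-pod> my-namespace 8443
```

A failure here while the service configuration looks correct suggests the problem is in the application's own port settings.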
Thanks for Reading!
Hopefully this was able to help you troubleshoot any networking issues you’re experiencing in OpenShift. Although this did not cover every networking issue you could possibly experience, I think it covers the most common (and arguably the most frustrating) errors.
For more information on OpenShift networking and the Kubernetes service API, check out the following links. Until next time!