Debugging Kubernetes
Earlier this morning I was debugging what turned out to be an issue with the node configuration in my local k8s dev cluster, and it got me thinking about how easy (or not) it is to debug these things in general ... so here we go!
Background
Last weekend I decided to increase the ephemeral storage available to the pods in my local dev cluster. At the same time I decided to add a new node to the cluster ... and this, it turns out, is where the issue actually resided ... PEBCAK!!!
Anyway, the project in question was a local Harbor install that I use for developing and debugging apps and services in my homelab.
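For reference, the install itself is just the official Harbor Helm chart, roughly along these lines (the release name and namespace are my own choices, and I've omitted my values file):
$ helm repo add harbor https://helm.goharbor.io
$ helm repo update
$ helm install harbor harbor/harbor --namespace harbor --create-namespace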
Now then, Harbor is a complicated beast when it comes to running the standard Helm chart, and looks something like this:
$ kubectl get pods --namespace=harbor
NAME READY STATUS RESTARTS AGE
harbor-core-6c57bb9c78-69w8r 1/1 Running 0 82m
harbor-database-0 1/1 Running 0 82m
harbor-jobservice-78567ccbb7-gmhtn 1/1 Running 0 82m
harbor-portal-68bcb5dd4c-4xmkw 1/1 Running 0 82m
harbor-redis-0 1/1 Running 0 82m
harbor-registry-7b8cbc8546-wwk6c 2/2 Running 0 82m
harbor-trivy-0 1/1 Running 0 82m
In essence: a lot of moving parts that all integrate to provide a rather sweet set of registry and image-scanning tools.
The issue I was facing was that the harbor-core pod could not speak to the harbor-redis pod. To cut a long story short, the issue turned out to be related to firewalld on the new node, due to certain necessary ports not being allowed. Specifically, the ports 179/tcp (for BGP), 4789/udp and 8472/udp (for VXLAN), 30000-32767/tcp (for NodePort Services), and 10250/tcp and 10255/tcp (for the kubelet APIs) needed to be open. This was a configuration oversight and not an inherent issue with Harbor itself, i.e. a PEBCAK issue.
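For anyone hitting the same thing, opening those ports up with firewall-cmd on the offending node looks roughly like this (a sketch assuming firewalld's default zone; adjust for your own setup):
$ sudo firewall-cmd --permanent --add-port=179/tcp
$ sudo firewall-cmd --permanent --add-port=4789/udp --add-port=8472/udp
$ sudo firewall-cmd --permanent --add-port=30000-32767/tcp
$ sudo firewall-cmd --permanent --add-port=10250/tcp --add-port=10255/tcp
$ sudo firewall-cmd --reload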
The problem
Okay, so I was faced with the problem:
Back-off restarting failed container core in pod harbor-core...
As we all know, the usual way to debug this is to check the pod description and logs to see if there's anything obvious going on.
$ kubectl describe pod harbor-core-6c57bb9c78-69w8r
...
...
$ kubectl logs harbor-core-6c57bb9c78-69w8r
...
2024-01-19T13:30:51Z [ERROR] [/lib/cache/cache.go:124]: failed to ping redis://harbor-redis.harbor:6379/0?idle_timeout_seconds=60, retry after 723.423621ms : dial tcp: lookup harbor-redis.harbor: i/o timeout
...
Based on the logs, it's a connection issue between harbor-core and harbor-redis. Now, the tricky part, as you probably know, is how to debug a connection issue when there are zero debug tools in 99% of the images, i.e. no nslookup, ping, curl, wget, redis-cli, etc. etc.
How on earth to debug this?!?!?!
The solution
Well, after using Docker for the past 5+ years, I've got a set of tools for debugging stuff like this...
Actually, finding the issue was reasonably straightforward and required me to deploy a new Docker image into the namespace alongside the harbor-core and harbor-redis pods, i.e. a debugger image that contains tools such as ping, curl, telnet, and, more importantly, redis-cli and nslookup / dig.
Historically, I leveraged an image called tutum/dnsutils for all things networky, which is basically an old, unmaintained Docker image that contains dnsutils (nslookup and a few others).
It was at this point I decided to build a new and improved debug image that I can use in a variety of different scenarios, one that essentially contains a toolkit consisting of:
- nslookup / dig - used for validating DNS settings
- mysql-client - used to connect to MySQL instances
- redis-cli - used to connect to Redis instances
- curl / wget - used to test http(s) connectivity
- telnet - general tool for testing interconnectivity
- plus a few other useful tools
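For the curious, the image is essentially just a small base image with those packages layered on top; on a Debian-based image the install step looks roughly like this (the package names below are the Debian ones, so treat this as a sketch rather than the exact contents of the image):
apt-get update && apt-get install -y --no-install-recommends \
    dnsutils iputils-ping curl wget telnet default-mysql-client redis-tools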
After about 10 mins I had an image built and published to https://hub.docker.com/r/gizzmoasus/debugger that I could pull into my harbor namespace to run some simple debug tests, leveraging the following simple Pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: debugger
spec:
  containers:
  - name: debugger
    image: gizzmoasus/debugger:latest
    command:
    - sleep
    - "infinity"
    imagePullPolicy: Always
  restartPolicy: Always
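With that saved as something like debugger.yaml (the filename is just my choice), deploying it into the namespace is a one-liner:
$ kubectl apply -f debugger.yaml --namespace=harbor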
$ kubectl get pods --namespace=harbor
NAME READY STATUS RESTARTS AGE
debugger 1/1 Running 0 121m
harbor-core-6c57bb9c78-69w8r 1/1 Running 0 196m
harbor-database-0 1/1 Running 0 196m
harbor-jobservice-78567ccbb7-gmhtn 1/1 Running 0 196m
harbor-portal-68bcb5dd4c-4xmkw 1/1 Running 0 196m
harbor-redis-0 1/1 Running 0 196m
harbor-registry-7b8cbc8546-wwk6c 2/2 Running 0 196m
harbor-trivy-0 1/1 Running 0 196m
Now that I had the debugger image running in the same namespace as the Harbor pods, I could begin the task of figuring out what the problem really was...
Funnily enough, the first command I ran allowed me to identify that the issue was related to DNS ...
$ kubectl exec -it debugger -- nslookup harbor-redis
;; connection timed out; no servers could be reached
command terminated with exit code 1
This is a huuuuge red flag that points to a failure in the cluster DNS, so I went to the one place you typically go for stuff like this ... the Kubernetes docs.
The downside to these docs is that they run through the happy path, i.e. look at what you should see (flashbacks to Bullseye ... look at what you could have won).
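That said, one of the first checks from those docs is still worth running from the debugger pod, namely looking at which resolver the pod is actually pointed at:
$ kubectl exec -it debugger -- cat /etc/resolv.conf
The nameserver entry should be the kube-dns ClusterIP (10.96.0.10 in my cluster, as shown below), which tells you the pod configuration is fine and the problem lies further along the chain.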
Sooooo, the issue is related to DNS, so I switched namespaces and checked that the DNS pods/services were up and running healthily:
$ kubectl get pods,svc --namespace=kube-system | grep dns
NAME READY STATUS RESTARTS AGE
coredns-76f75df574-6wr99 1/1 Running 0 74h4m
coredns-76f75df574-mjsrh 1/1 Running 0 74h4m
coredns-76f75df574-qfxp9 1/1 Running 0 74h4m
coredns-76f75df574-qsqqg 1/1 Running 0 74h4m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 17d
Looks like the coredns pods and services are running fine and there is indeed an IP allocated to the service.
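A couple of follow-up checks that are useful at this point are querying the kube-dns ClusterIP directly from the debugger pod and peeking at the CoreDNS logs (the label selector below assumes the standard k8s-app=kube-dns label on the CoreDNS pods):
$ kubectl exec -it debugger -- nslookup kubernetes.default.svc.cluster.local 10.96.0.10
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns --tail=20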
Anyway, long story short, I noticed that DNS was working fine on the other nodes and determined it was a firewall issue on the new node I added over the weekend (essentially: disable firewalld, refresh the pod, and everything worked fine).
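On the new node, that quick-and-dirty confirmation looked roughly like this (stop firewalld on the node, then delete the pod so it gets recreated), before re-enabling firewalld with the ports listed back in the Background section opened up properly:
$ sudo systemctl stop firewalld
$ kubectl delete pod harbor-core-6c57bb9c78-69w8r --namespace=harbor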
The point of the tale is that a solid approach to debugging, and a solid toolchain to go with it, are important to have when it comes to running apps and services in Kubernetes clusters.
Feel free to check out the Docker Hub repository over at https://hub.docker.com/r/gizzmoasus/debugger and suggest additional tools that would be useful to add to this image.
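If you just want a throwaway shell in a namespace rather than a manifest, something like the following should also work (assuming the image ships a shell at /bin/sh):
$ kubectl run debugger --rm -it --image=gizzmoasus/debugger:latest --namespace=harbor -- /bin/sh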