Development

Bob’s Journey into Application Observability: Uncovering Hidden Threats

Bob dives into Kubernetes application observability, learning how to use logs to monitor performance and uncover hidden threats. With Josie's guidance, he discovers the power of metrics and visualisation tools to keep his apps running smoothly.

Matt

Nov 21, 2024 • 4 min read

Bob had come a long way in his Kubernetes journey. From securing Docker images to locking down Pod traffic with Network Policies, he had built a robust foundation for his applications. But as he watched his cluster hum along, a question lingered in his mind.

"How will I know if something goes wrong?"

Josie, ever ready to guide, appeared with a thoughtful smile. "Good question, Bob. Securing workloads is only half the battle. Observability is how you keep watch over your cluster and applications."

"Observability?" Bob asked.

"It's more than just logs," Josie explained. "It's how you gain a clear understanding of what’s happening inside your system: logs, metrics, traces. Together, they're the eyes and ears of your cluster."

📂 Code Repository: Explore the complete code and configurations for this article on GitHub.


What Is Observability?

Josie described observability as the ability to measure the internal state of a system by examining its outputs.

"Think of it as the health dashboard for your application," she explained. "When done right, it helps you detect issues, optimise performance, and even uncover security threats." Grabbing her notepad, Josie sketched out the three pillars.

Observability rests on three pillars:

Logs: Capturing granular details about application behaviour
Metrics: Aggregating key statistics over time, such as request latency and CPU usage
Traces: Visualising the flow of requests across distributed systems

"Each pillar serves a purpose," Josie said. "Together, they give you a full picture of your application’s health."

Why Observability Matters

Josie highlighted why observability is critical for any Kubernetes deployment:

Proactive Problem Detection: Spot anomalies like high memory usage or slow response times before they escalate
Security Insights: Identify potential breaches or unauthorised access patterns
Optimisation: Fine-tune resource usage and improve application performance

"With observability, you're not just reacting to problems; you're preventing them."

Logs: The First Line of Defence

"Logs are the most familiar to developers," Josie said, pulling up an example on her laptop. "They record detailed information about events in your application, like errors, warnings, and informational messages."

"Logs are automatically centralised by Alex’s platforms team," Josie explained. "For you, it's as simple as checking the logs with kubectl logs -f during development or viewing them in Grafana for a consolidated view of application behaviour. No setup needed!"
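Centralised logs are easier to search when each line is structured. As a rough illustration, the sketch below writes one JSON object per line to stdout, which is the stream that kubectl logs reads. The formatLog and logEvent helpers, the field names, and the 'example-app' service name are assumptions made for this example, not part of Bob's repository or the platform team's setup.

```javascript
// Hypothetical helper: formats one log event as a single JSON line.
function formatLog(level, msg, fields = {}) {
  return JSON.stringify({
    ts: new Date().toISOString(),
    level,
    service: 'example-app', // assumed service name
    msg,
    ...fields,
  });
}

// Writes to stdout, which is the stream `kubectl logs` shows.
function logEvent(level, msg, fields) {
  console.log(formatLog(level, msg, fields));
}

logEvent('info', 'request handled', { path: '/orders', status: 200, durationMs: 42 });
logEvent('error', 'database query failed', { path: '/orders', status: 500 });
```

Keeping one event per line means a log backend can filter on fields such as level or status instead of matching free text.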

Metrics: Watching the Bigger Picture

Josie moved on to metrics, which provide aggregated views of application and cluster performance.

"Grafana is your window into application performance, Bob. You can create panels to monitor metrics like CPU, memory, network, disk usage, latency, and even Apdex scores to measure user satisfaction. These metrics give you a clear, actionable view of your app's health."

Josie opened Grafana and pulled up an example dashboard:

This dashboard combines metrics like CPU and latency with insights from logs and traces to give a full picture of app performance

"Something like this would help you observe how well your app is performing," she said. "It includes the basic CPU and memory usage, the network traffic and latency, and an Apdex rating so you can see how well your app is responding to users... not very well, based on this example 😅"
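For readers unfamiliar with Apdex, the score is the number of satisfied requests plus half the tolerating ones, divided by the total. A request is satisfied if it completes within a target time T and tolerating if it completes within 4T. The sketch below computes it from raw latencies; the apdex function name, the 500 ms threshold, and the sample values are made up for illustration, and a dashboard would normally derive this from metrics rather than application code.

```javascript
// Apdex = (satisfied + tolerating / 2) / total samples.
// thresholdMs (T) is an assumed target response time; pick one that suits your app.
function apdex(latenciesMs, thresholdMs) {
  if (latenciesMs.length === 0) return null; // no samples, no score
  let satisfied = 0;
  let tolerating = 0;
  for (const ms of latenciesMs) {
    if (ms <= thresholdMs) satisfied += 1; // fast enough
    else if (ms <= 4 * thresholdMs) tolerating += 1; // slow but tolerable
    // anything slower counts as frustrated and adds nothing
  }
  return (satisfied + tolerating / 2) / latenciesMs.length;
}

// Made-up sample latencies in milliseconds, with T = 500 ms.
const samples = [120, 300, 450, 700, 1500, 2500];
console.log(apdex(samples, 500).toFixed(2)); // prints "0.67"
```

A score of 1 means every request was satisfied, and a score near 0 means most users were left waiting.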

Traces: End-to-End Visibility

Josie saved the most exciting pillar for last: traces.

"Tracing lets you follow a request through your system," she explained. "It's invaluable for identifying bottlenecks in microservices."
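Following a request works because each service passes a shared trace ID to the next one. The sketch below shows the idea using the W3C Trace Context traceparent header format (version, trace ID, span ID, flags). The newTraceparent and parseTraceparent helpers are hypothetical names for this example; in practice a tracing library such as OpenTelemetry would typically generate and propagate this header for you.

```javascript
const { randomBytes } = require('node:crypto');

// Builds a W3C Trace Context `traceparent` value: version-traceId-spanId-flags.
function newTraceparent(traceId = randomBytes(16).toString('hex')) {
  const spanId = randomBytes(8).toString('hex'); // each hop gets its own span id
  return `00-${traceId}-${spanId}-01`; // 01 = sampled
}

// Extracts the parts so a downstream service can continue the same trace.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null; // malformed or missing header
  const [, version, traceId, spanId, flags] = match;
  return { version, traceId, spanId, flags };
}

// Service A starts a trace; service B reuses the trace id with a new span id.
const incoming = newTraceparent();
const { traceId } = parseTraceparent(incoming);
const outgoing = newTraceparent(traceId);
console.log(incoming);
console.log(outgoing);
```

Both printed values share the same 32-character trace ID, which is what lets a tracing backend stitch the two hops into one request.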

Setting Up Tracing

Bob used OpenTelemetry to instrument his app for distributed tracing:

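const { trace } = require('@opentelemetry/api');

// Spans are only recorded once an OpenTelemetry SDK is registered; otherwise these calls are no-ops.
const tracer = trace.getTracer('example-app');
const span = tracer.startSpan('http_request'); // start timing the unit of work
span.end(); // end the span so it can be exported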

"With traces, you can see exactly where your system is slowing down," Josie added.

💡

To the Reader:

If you're not already using tools like Loki, Grafana or OpenTelemetry, don't worry! We'll cover these in a separate article on setting up a robust observability stack.

Combining the Pillars

Josie helped Bob combine all three pillars into a cohesive observability setup:

Metrics showed a spike in latency
Traces pinpointed the backend service causing delays
Logs revealed a misconfigured database query

"Observability isn’t just about finding problems," Josie said. "It's about understanding your system deeply."
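The trace and log steps above could be sketched roughly as follows. All of the data here is hypothetical and hand-written for illustration (trace IDs, service names, durations, and messages); real data would come from your tracing and logging backends. The durationMs field is assumed to be the time spent in that service alone.

```javascript
// Hypothetical sample data, not real output from any tool.
const spans = [
  { traceId: 'abc123', service: 'frontend', durationMs: 40 },
  { traceId: 'abc123', service: 'orders-api', durationMs: 1850 },
  { traceId: 'abc123', service: 'auth', durationMs: 25 },
];
const logs = [
  { traceId: 'abc123', service: 'orders-api', level: 'warn', msg: 'query took 1800 ms' },
  { traceId: 'abc123', service: 'frontend', level: 'info', msg: 'request handled' },
  { traceId: 'zzz999', service: 'orders-api', level: 'info', msg: 'request handled' },
];

// Traces: find the slowest span within one trace.
function slowestSpan(allSpans, traceId) {
  return allSpans
    .filter((s) => s.traceId === traceId)
    .reduce((worst, s) => (worst === null || s.durationMs > worst.durationMs ? s : worst), null);
}

// Logs: pull the entries for that service within the same trace.
function logsFor(allLogs, traceId, service) {
  return allLogs.filter((l) => l.traceId === traceId && l.service === service);
}

const culprit = slowestSpan(spans, 'abc123');
console.log(culprit.service); // prints "orders-api"
console.log(logsFor(logs, culprit.traceId, culprit.service).map((l) => l.msg));
```

The shared trace ID is what makes the hand-off from traces to logs possible, so it helps to include it in every log line.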

Putting It All Together

To illustrate, Josie walked Bob through a real-world scenario:

Problem Detected: Metrics showed an increase in request latency
Root Cause Found: Traces revealed a bottleneck in the backend
Issue Confirmed: Logs identified an outdated library causing the slowdown

"Without observability, you'd be shooting in the dark," Josie said.

Bob reflected on what he’d learned, jotting down notes:

Logs capture granular details for debugging
Metrics provide a high-level view of trends
Traces link everything together, revealing root causes
Observability is critical for proactive problem detection, optimisation, and security

Josie smiled. "With these tools, you can uncover hidden threats and keep your cluster running smoothly. You can also deploy confidently, knowing that any issues can be quickly identified and addressed."

"What's next?" Bob asked eagerly.

"Well, now that you can observe your cluster's behaviour, the next step is to ensure your workloads are actively protected against real-time threats," Josie responded.