Topic 5: Monitoring and Observability
Monitoring and observability are essential DevOps practices that help you understand the health, performance, and reliability of your applications and infrastructure. In this topic, you'll learn how to set up monitoring with Prometheus and visualize metrics in Grafana. You'll then explore AI agents and agentic workflows with n8n, which let you automate incident response instead of relying purely on manual intervention.
Why are monitoring and observability important for cloud-native applications?
Cloud-native applications are typically distributed, dynamic, and run across many services and environments. Monitoring and observability are critical because they:
- Help detect and resolve issues quickly, minimizing downtime.
- Provide visibility into system health, performance, and user experience.
- Enable proactive alerting and troubleshooting in complex, rapidly changing environments.
- Support scalability and reliability by identifying bottlenecks and failures.
- Allow teams to understand dependencies and interactions between microservices.
Without effective monitoring and observability, it becomes difficult to maintain, debug, and optimize cloud-native systems.
How can AI agents enhance your monitoring systems?
Monitoring systems generate constant streams of alerts. Traditionally, engineers manually investigate and fix each alert. By equipping AI agents with the appropriate logic, you enable them to:
- Respond instantly to alerts without human delays
- Analyze logs and metrics to find root causes automatically
- Execute fixes (restart services, scale resources, rollback deployments) independently
- Learn from incidents to improve future responses
- Free teams from repetitive troubleshooting to focus on long-term reliability
Study
- What is Monitoring and Observability in DevOps?
- Prometheus Overview
- Grafana Overview
- Prometheus + Grafana Integration
- What are AI agents?
- What are agentic workflows?
- n8n Overview
Key Concepts
- Metrics: Quantitative data about your systems (CPU, memory, requests, etc.)
- Alerting: Automated notifications based on metric thresholds
- Dashboards: Visual representations of metrics for quick insights
- Instrumentation: Adding code or exporters to expose metrics
Hands-on Tasks
1. Set Up Prometheus
- Create a minimal `prometheus.yml` config:

  ```yaml
  global:
    scrape_interval: 15s

  scrape_configs:
    - job_name: 'prometheus'
      static_configs:
        - targets: ['localhost:9090']
  ```

- Install Prometheus using Docker:

  ```bash
  docker run \
    -p 9090:9090 \
    -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus
  ```

- Add your application's metrics endpoint to `static_configs` as needed; you can confirm targets are being scraped with the sketch below.
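To check that Prometheus is running and scraping its targets, you can query its HTTP API. A minimal Python sketch, assuming the `requests` library is installed and Prometheus is on the default `localhost:9090`:

```python
import requests

# List the targets Prometheus is scraping and report their health.
resp = requests.get("http://localhost:9090/api/v1/targets", timeout=5)
resp.raise_for_status()

for target in resp.json()["data"]["activeTargets"]:
    job = target["labels"].get("job", "<unknown>")
    print(f"{job}: {target['scrapeUrl']} -> {target['health']}")
```

You can also open http://localhost:9090/targets in a browser for the same information.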
2. Set Up Grafana
- Install Grafana using Docker:

  ```bash
  docker run -d --name=grafana -p 3000:3000 grafana/grafana
  ```

- Access Grafana at http://localhost:3000 (default login: `admin` / `admin`)
- Add Prometheus as a data source (URL: `http://host.docker.internal:9090` or `http://localhost:9090`); this can also be scripted, as shown below
- Add and connect your cloud provider's metrics if applicable (e.g., AWS CloudWatch, Azure Monitor)
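If you'd rather script the data source instead of clicking through the UI, Grafana exposes an HTTP API. A rough sketch, assuming the default `admin`/`admin` credentials and the `requests` library (point the URL at whichever Prometheus address works in your setup):

```python
import requests

# Register Prometheus as a Grafana data source via Grafana's HTTP API.
payload = {
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://host.docker.internal:9090",  # or http://localhost:9090
    "access": "proxy",
}

resp = requests.post(
    "http://localhost:3000/api/datasources",
    json=payload,
    auth=("admin", "admin"),  # default credentials; change them in real setups
    timeout=5,
)
print(resp.status_code, resp.json())
```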
3. Create Dashboards
- Create a new dashboard and add panels using PromQL queries (e.g., `up`, `http_requests_total`)
- Visualize metrics from your application or infrastructure
4. Instrument a Sample App
- For Node.js: use `prom-client` to expose metrics
- For Python: use `prometheus_client` (a minimal sketch follows this list)
- Add the metrics endpoint to the Prometheus config and visualize it in Grafana
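A minimal Python sketch using `prometheus_client`; the metric name, label, and port are illustrative assumptions:

```python
import random
import time

from prometheus_client import Counter, start_http_server

# Hypothetical counter tracking requests handled by a demo app.
REQUESTS = Counter("demo_requests_total", "Requests handled by the demo app", ["status"])

if __name__ == "__main__":
    # Expose metrics at http://localhost:8000/metrics for Prometheus to scrape.
    start_http_server(8000)
    while True:
        REQUESTS.labels(status=random.choice(["200", "500"])).inc()
        time.sleep(1)
```

Point a `static_configs` target at the app (e.g., `host.docker.internal:8000` if Prometheus runs in Docker), then chart it in Grafana with a query such as `rate(demo_requests_total[5m])`.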
5. Build an AI agent with n8n
- Install n8n using Docker:

  ```bash
  docker run -d -p 5678:5678 --name n8n n8nio/n8n:latest
  ```

- Access n8n at http://localhost:5678 and create your login.
Build your agent:
- Create a Schedule trigger (e.g. every 1-2 minutes)
- Query the Prometheus HTTP API for a specific metric (see the sketch after this list)
- Add an If node to check whether the metric exceeds a chosen threshold
- Call an LLM (e.g., via the OpenAI API) to analyze the anomaly and suggest root causes and remediation steps
- Send the analysis via email or Slack
- (Optional) Add automated remediation steps
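The n8n nodes above carry out the same logic as this rough Python sketch. It only illustrates the data flow; the metric, threshold, and model name are assumptions, and the notification step is a stand-in for the email/Slack node:

```python
import requests
from openai import OpenAI  # assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set

PROM_URL = "http://localhost:9090/api/v1/query"
QUERY = 'rate(http_requests_total{status="500"}[5m])'  # hypothetical error-rate metric
THRESHOLD = 5.0  # errors per second; tune to your app

def check_once() -> None:
    # 1. Query Prometheus (the "query Prometheus" step).
    result = requests.get(PROM_URL, params={"query": QUERY}, timeout=5).json()["data"]["result"]
    value = float(result[0]["value"][1]) if result else 0.0

    # 2. Threshold check (the "If" node).
    if value <= THRESHOLD:
        return

    # 3. Ask an LLM for likely root causes and remediation steps (the LLM node).
    client = OpenAI()
    analysis = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{
            "role": "user",
            "content": f"The 5xx error rate is {value:.2f}/s (threshold {THRESHOLD}/s). "
                       "Suggest likely root causes and remediation steps.",
        }],
    ).choices[0].message.content

    # 4. Notify (stand-in for the email/Slack node).
    print("ALERT:", analysis)

if __name__ == "__main__":
    check_once()
```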
Test:
- Intentionally trigger high traffic or errors on your monitored application (a simple load-generation sketch follows below)
- Verify the agent detects the anomaly, analyzes the root cause, and takes action
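One way to generate the spike is a small load loop against your app; the URL and request count are placeholders for your own service:

```python
import requests

APP_URL = "http://localhost:8000/"  # placeholder: point this at your monitored app

# Fire a burst of requests to push the request/error rate over the alert threshold.
for _ in range(500):
    try:
        requests.get(APP_URL, timeout=2)
    except requests.RequestException:
        pass  # failed requests are fine here; errors are what we want the agent to notice
```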
Test Your Knowledge
Use these prompts to test your understanding:
- What is the difference between monitoring and observability?
- How does Prometheus collect metrics from applications?
- What is PromQL and how is it used in Grafana dashboards?
- How would you set up alerting for high CPU usage using Prometheus?
- What are exporters in the context of Prometheus?
- How do you add a new data source in Grafana?
- What are some best practices for dashboard design?
- What are the key components of AI agent architecture?
- How does an LLM help an agent make decisions?