Topic 5: Monitoring and Observability
Monitoring and observability are essential DevOps practices that help you understand the health, performance, and reliability of your applications and infrastructure. In this topic, you'll learn how to set up monitoring using Prometheus and visualize metrics with Grafana.
Why are monitoring and observability important for cloud-native applications?
Cloud-native applications are typically distributed, dynamic, and run across many services and environments. Monitoring and observability are critical because they:
- Help detect and resolve issues quickly, minimizing downtime.
- Provide visibility into system health, performance, and user experience.
- Enable proactive alerting and troubleshooting in complex, rapidly changing environments.
- Support scalability and reliability by identifying bottlenecks and failures.
- Allow teams to understand dependencies and interactions between microservices.
Without effective monitoring and observability, it becomes difficult to maintain, debug, and optimize cloud-native systems.
Study
- What is Monitoring and Observability in DevOps?
- Prometheus Overview
- Grafana Overview
- Prometheus + Grafana Integration
Key Concepts
- Metrics: Quantitative data about your systems (CPU, memory, requests, etc.)
- Alerting: Automated notifications based on metric thresholds
- Dashboards: Visual representations of metrics for quick insights
- Instrumentation: Adding code or exporters to expose metrics
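Threshold alerting usually requires a condition to hold for a while before anyone is notified. In a Prometheus alerting rule this is the `for` duration. While the rule's `expr` is true but `for` has not yet elapsed, the alert is *pending*. Once it has elapsed, the alert is *firing*. The Python sketch below is a simplified model of that behavior, not Prometheus's implementation. `ThresholdAlert` and the CPU samples are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AlertState(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    FIRING = "firing"


@dataclass
class ThresholdAlert:
    """Hypothetical model of a threshold alert rule with a `for` duration."""

    threshold: float                      # e.g. 80.0 for "CPU usage above 80%"
    for_seconds: float                    # how long the condition must hold before firing
    active_since: Optional[float] = None  # when the condition last became true

    def evaluate(self, timestamp: float, value: float) -> AlertState:
        """Evaluate one sample and return the resulting alert state."""
        if value <= self.threshold:
            # Condition is false: the alert resolves and the timer resets.
            self.active_since = None
            return AlertState.INACTIVE
        if self.active_since is None:
            self.active_since = timestamp  # condition just became true
        if timestamp - self.active_since >= self.for_seconds:
            return AlertState.FIRING
        return AlertState.PENDING


# CPU usage (%) sampled every 15 s; the alert needs > 80% for 60 s to fire.
alert = ThresholdAlert(threshold=80.0, for_seconds=60)
samples = [(0, 50.0), (15, 85.0), (30, 90.0), (60, 88.0), (75, 92.0), (90, 40.0)]
for ts, cpu in samples:
    print(ts, cpu, alert.evaluate(ts, cpu).value)
```

Notice that the brief spike at 15 to 60 seconds only makes the alert pending. It fires only once the condition has held for the full 60 seconds, and it resets as soon as usage drops. This is why a `for` duration helps avoid noisy alerts.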
Hands-on Tasks
1. Set Up Prometheus
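- Create a minimal `prometheus.yml` config:
  ```yaml
  global:
    scrape_interval: 15s

  scrape_configs:
    - job_name: 'prometheus'
      static_configs:
        - targets: ['localhost:9090']
  ```
- Run Prometheus using Docker:
  ```bash
  docker run \
    -p 9090:9090 \
    -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus
  ```
- Add your application's metrics endpoint to `static_configs` as needed. Inside the Prometheus container, `localhost` refers to the container itself, so an app running on your host is not reachable at `localhost:<port>`. On Docker Desktop, use `host.docker.internal:<port>` as the target instead.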
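Once the container is running, you can check that Prometheus is scraping its targets by querying the built-in `up` metric. This metric is 1 for a target whose last scrape succeeded and 0 otherwise. The sketch below uses only the Python standard library to call the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`). It assumes Prometheus is reachable at `http://localhost:9090`, and the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict


def query_prometheus(base_url: str, promql: str, timeout: float = 5.0) -> dict:
    """Run an instant query via the Prometheus HTTP API (GET /api/v1/query)."""
    params = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def target_health(payload: dict) -> Dict[str, bool]:
    """Map each target's `instance` label to True if `up` is 1, else False."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    health = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "<unknown>")
        # Each instant-vector sample is [unix_timestamp, "value as a string"].
        health[instance] = float(series["value"][1]) == 1.0
    return health


# Usage, assuming Prometheus from the step above is listening on localhost:9090:
#   print(target_health(query_prometheus("http://localhost:9090", "up")))
```

The parsing is kept separate from the HTTP call so you can test it against a saved response without a running server.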
2. Set Up Grafana
- Install Grafana using Docker:
  ```bash
  docker run -d --name=grafana -p 3000:3000 grafana/grafana
  ```
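- Access Grafana at http://localhost:3000 (default login: `admin`/`admin`; Grafana asks you to change the password on first login).
- Add Prometheus as a data source. Because Grafana runs in its own container, use `http://host.docker.internal:9090` as the URL (on Linux, start the Grafana container with `--add-host=host.docker.internal:host-gateway`). Use `http://localhost:9090` only if Grafana runs directly on the host.
- Optionally, add your cloud provider's metrics as additional data sources (e.g., AWS CloudWatch, Azure Monitor).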
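You can add the data source through the Grafana UI. If you want to script it instead, Grafana's HTTP API accepts a `POST /api/datasources` request. The sketch below builds that request with the Python standard library. It assumes Grafana is on `localhost:3000`, basic authentication with the admin account, and the data source name `Prometheus`. Pass the new password if you changed it on first login.

```python
import base64
import json
import urllib.request


def build_datasource_request(
    grafana_url: str,
    prometheus_url: str,
    user: str = "admin",
    password: str = "admin",
) -> urllib.request.Request:
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana server, not the browser, sends the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and return Grafana's JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage, assuming Grafana on localhost:3000 and Prometheus reachable from the
# Grafana container via host.docker.internal:
#   req = build_datasource_request("http://localhost:3000", "http://host.docker.internal:9090")
#   print(send(req))
```

If a data source with the same name already exists, Grafana rejects the request, and `urlopen` raises `urllib.error.HTTPError`.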
3. Create Dashboards
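- Create a new dashboard and add panels using PromQL queries (e.g., `up`, `http_requests_total`). For counters such as `http_requests_total`, graph a rate like `rate(http_requests_total[5m])` rather than the ever-increasing raw value.
- Visualize metrics from your application or infrastructure.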
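A counter like `http_requests_total` only goes up, except when the process restarts and the counter resets to zero. For that reason, dashboard panels usually plot its rate of change. The sketch below is a deliberately simplified Python model of what `rate()` computes over one window. Prometheus's real implementation also extrapolates to the window boundaries, so its numbers will differ slightly. `simple_rate` and the sample data are hypothetical, and timestamps are assumed to be in seconds.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase of a counter across (timestamp, value) samples.

    Simplified compared to PromQL's rate(): it handles counter resets but does
    not extrapolate to the edges of the query window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive amount of time")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # The counter went down, so the process restarted and the counter
            # began again from zero: everything up to `cur` is new.
            increase += cur
    return increase / elapsed


# Counter scraped every 15 s.
print(simple_rate([(0, 0), (15, 30), (30, 60)]))    # steady traffic → 2.0
print(simple_rate([(0, 100), (15, 110), (30, 5)]))  # reset before last scrape → 0.5
```

In the second case, a naive "last minus first" calculation would give a negative rate. Reset handling is why `rate()` is the right function for counters.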
4. Instrument a Sample App
- For Node.js: use `prom-client` to expose metrics.
- For Python: use `prometheus_client`.
- Add the app's metrics endpoint to the Prometheus config and visualize its metrics in Grafana.
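Client libraries like `prom-client` and `prometheus_client` generate the Prometheus text exposition format for you. To show what they produce, the sketch below serves a `/metrics` endpoint using only Python's standard library. The counter name `http_requests_total`, its `method`/`path` labels, and port 8000 are assumptions for illustration.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process counter: (method, path) -> number of requests served.
REQUESTS: Counter = Counter()
LOCK = threading.Lock()


def escape_label(value: str) -> str:
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics() -> str:
    """Render REQUESTS in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    with LOCK:
        for (method, path), count in sorted(REQUESTS.items()):
            labels = f'method="{escape_label(method)}",path="{escape_label(path)}"'
            lines.append(f"http_requests_total{{{labels}}} {count}")
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/metrics":
            body = render_metrics().encode()
            content_type = "text/plain; version=0.0.4"
        else:
            with LOCK:
                REQUESTS[("GET", self.path)] += 1
            body = b"hello\n"
            content_type = "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the app and its /metrics endpoint until interrupted."""
    ThreadingHTTPServer(("", port), Handler).serve_forever()
```

To use it, call `serve(8000)`, then add the endpoint as a new job in `scrape_configs`. Use the target `host.docker.internal:8000` if Prometheus runs in Docker Desktop. Labeling by raw path is fine for a demo, but in a real app it can create an unbounded number of series. With `prometheus_client`, the equivalent is a `Counter("http_requests_total", "...", ["method", "path"])` incremented via `.labels(method=..., path=...).inc()` and exposed with `start_http_server(8000)`.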
Test Your Knowledge
Use these prompts to test your understanding:
- What is the difference between monitoring and observability?
- How does Prometheus collect metrics from applications?
- What is PromQL and how is it used in Grafana dashboards?
- How would you set up alerting for high CPU usage using Prometheus?
- What are exporters in the context of Prometheus?
- How do you add a new data source in Grafana?
- What are some best practices for dashboard design?