After learning how to deploy servers using EC2 and manage traffic using load balancers, the next important step is monitoring. In real-world systems, deployment is only the beginning. The real challenge is knowing when something goes wrong.
What is AWS CloudWatch?
CloudWatch is a monitoring service that collects metrics, logs, and events from AWS resources. It automatically tracks performance data such as CPU usage, network activity, and disk operations.
Understanding Metrics
Metrics are numerical data points collected over time. For EC2, common metrics include CPU utilization, network traffic, and disk operations.
- CPU Utilization
- Network In / Out
- Disk Read / Write
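To see which of these metrics CloudWatch actually holds for an instance, the AWS CLI can list them. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the function name list_instance_metrics is hypothetical, and the instance ID is whatever you pass in.

```shell
# Sketch: list the metric names CloudWatch holds for one EC2 instance.
# list_instance_metrics is a hypothetical helper name; $1 is the instance ID.
list_instance_metrics() {
  aws cloudwatch list-metrics \
    --namespace AWS/EC2 \
    --dimensions "Name=InstanceId,Value=$1" \
    --query 'Metrics[].MetricName' \
    --output text
}
```

Calling list_instance_metrics with a real instance ID should print names such as CPUUtilization, NetworkIn, and NetworkOut, which are the identifiers behind the friendly labels above.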
By default, EC2 sends these metrics to CloudWatch every 5 minutes (basic monitoring). If needed, detailed monitoring can be enabled to send data every minute, at additional cost.
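Detailed monitoring can be switched on from the console or from the CLI. A minimal sketch of the CLI route, assuming a configured AWS CLI; the function name is hypothetical and the instance ID is a caller-supplied placeholder:

```shell
# Sketch: switch an instance from basic (5-minute) to detailed (1-minute) monitoring.
# enable_detailed_monitoring is a hypothetical helper name; $1 is the instance ID.
enable_detailed_monitoring() {
  aws ec2 monitor-instances --instance-ids "$1"
}
```

The matching command to return to basic monitoring is aws ec2 unmonitor-instances with the same --instance-ids argument.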
Generating Load Using the stress Tool
To simulate real-world conditions, I installed a tool called stress on my EC2 instance. This tool artificially increases CPU usage so we can observe how CloudWatch reacts.
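# Install the stress utility (run as root, or prefix with sudo)
dnf install stress -y
# Start 4 CPU-bound workers and stop them after 60 seconds
stress -c 4 -t 60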
I also created a small script to continuously generate load at random intervals. This helped in creating a realistic CPU usage pattern instead of a constant spike.
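The script itself is not shown here, so the following is only a sketch of one way to produce that kind of pattern, not the exact script used. It assumes stress is installed and GNU shuf is available; the function names and all the ranges (1 to 4 workers, 20 to 60 second bursts, 10 to 90 second pauses) are illustrative assumptions.

```shell
# Sketch: CPU load bursts separated by random pauses.
# Function names and numeric ranges are illustrative assumptions.

# Print a random integer between $1 and $2 (inclusive) using GNU shuf.
rand_between() {
  shuf -i "$1-$2" -n 1
}

# Run $1 rounds; each round is one burst of load followed by one idle pause.
random_load() {
  rounds="$1"
  i=0
  while [ "$i" -lt "$rounds" ]; do
    workers="$(rand_between 1 4)"    # number of CPU workers for this burst
    burst="$(rand_between 20 60)"    # burst length in seconds
    pause="$(rand_between 10 90)"    # idle time before the next burst
    stress -c "$workers" -t "$burst"
    sleep "$pause"
    i=$((i + 1))
  done
}
```

Running random_load 20 would produce twenty bursts of varying height and length. Varying both the worker count and the pause makes the graph look closer to real traffic than a single flat spike.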
Watching Metrics in Real Time
After running the stress tool, the CPU utilization graph in CloudWatch started showing spikes. This is normal behavior. In real systems, CPU usage goes up and down depending on workload.
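The same data behind the graph can be pulled from the command line. A minimal sketch, assuming a configured AWS CLI; cpu_datapoints is a hypothetical helper name, and the period defaults to 300 seconds to match basic monitoring (pass 60 as the fourth argument if detailed monitoring is enabled).

```shell
# Sketch: print average CPU utilization datapoints for an instance, oldest first.
# Arguments: instance ID, start time, end time (ISO 8601, UTC), optional period in seconds.
cpu_datapoints() {
  instance_id="$1"
  start="$2"
  end="$3"
  period="${4:-300}"   # 300 matches basic monitoring; use 60 with detailed monitoring
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$instance_id" \
    --start-time "$start" \
    --end-time "$end" \
    --period "$period" \
    --statistics Average \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp,Average]' \
    --output text
}
```

For example, with GNU date the last 30 minutes could be requested as cpu_datapoints i-0123456789abcdef0 "$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)", where the instance ID is a placeholder.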
Creating a CloudWatch Alarm
Since no one can monitor graphs 24/7, CloudWatch allows setting alarms. I created an alarm with the condition:
CPU Utilization >= 60% for 1 minute
When this condition is met, the alarm changes state from OK → ALARM.
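The alarm was created in the console, but an equivalent can be sketched with the CLI. This is an approximation under stated assumptions: the alarm name prefix high-cpu- and the function name are hypothetical, the statistic is assumed to be Average, and the 60-second period assumes detailed monitoring is enabled (with 5-minute metrics, a period of 300 would likely be the closer fit). The optional second argument is an SNS topic ARN to notify.

```shell
# Sketch: alarm when average CPU utilization is >= 60% for one 60-second period.
# Arguments: instance ID, optional SNS topic ARN for notifications.
create_cpu_alarm() {
  instance_id="$1"
  # ${2:+...} adds --alarm-actions only when a topic ARN was passed.
  aws cloudwatch put-metric-alarm \
    --alarm-name "high-cpu-$instance_id" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$instance_id" \
    --statistic Average \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 60 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    ${2:+--alarm-actions "$2"}
}
```

Calling put-metric-alarm again with the same alarm name replaces the existing definition, so the function can be rerun later with a topic ARN to attach notifications.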
Integrating with SNS
To receive notifications, I integrated the alarm with SNS (Simple Notification Service). SNS sends email alerts whenever the alarm is triggered.
EC2 → CloudWatch → Alarm → SNS → Email
After confirming my email subscription, I started receiving alerts when CPU usage crossed the threshold.
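The topic and email subscription can also be set up from the CLI. A minimal sketch, assuming a configured AWS CLI; create_email_topic is a hypothetical helper name, and the topic name and address are supplied by the caller.

```shell
# Sketch: create an SNS topic, subscribe an email address, and print the topic ARN.
# Arguments: topic name, email address.
create_email_topic() {
  topic_name="$1"
  email="$2"
  # create-topic returns the ARN of the existing topic if the name is already in use.
  topic_arn="$(aws sns create-topic --name "$topic_name" --query TopicArn --output text)"
  # The subscription stays pending until the link in the confirmation email is clicked.
  aws sns subscribe \
    --topic-arn "$topic_arn" \
    --protocol email \
    --notification-endpoint "$email" >/dev/null
  echo "$topic_arn"
}
```

The printed ARN is the value an alarm needs as its notification action, for example through the --alarm-actions option of aws cloudwatch put-metric-alarm.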
Alarm States
- OK → everything normal
- ALARM → threshold crossed
- INSUFFICIENT_DATA → not enough data yet
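These states can be read, and temporarily forced, from the CLI, which is handy for checking the notification path without generating load. A minimal sketch, assuming a configured AWS CLI; both function names are hypothetical and the alarm name is supplied by the caller.

```shell
# Sketch: inspect and temporarily override an alarm's state.

# Print the current state of the named alarm: OK, ALARM, or INSUFFICIENT_DATA.
alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

# Force the named alarm ($1) into a state ($2). The change is temporary:
# CloudWatch sets the real state again at the next evaluation.
force_alarm_state() {
  aws cloudwatch set-alarm-state \
    --alarm-name "$1" \
    --state-value "$2" \
    --state-reason "Manual test of the notification path"
}
```

Forcing the state to ALARM should trigger the alarm's actions, so a correctly wired setup sends a test email.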
Real DevOps Insight
This lab showed me something important: monitoring is not about watching dashboards; it is about getting notified automatically when something goes wrong.
What I Learned
- CloudWatch automatically collects metrics
- Stress testing helps simulate real-world load
- Alarms detect abnormal behavior
- SNS sends real-time notifications
- Monitoring is critical for production systems
Final Thoughts
This was one of the most practical lessons in my AWS journey. It moved me from simply deploying infrastructure to actually operating it. CloudWatch made it clear that real systems must be monitored, measured, and automated.