AWS CloudWatch: Monitoring, Metrics & Alerts

A hands-on lesson from my AWS journey: I explored how to monitor EC2 instances with CloudWatch, simulate real CPU load, and trigger email alerts through SNS when thresholds are crossed.

After learning how to deploy servers using EC2 and manage traffic using load balancers, the next important step is monitoring. In real-world systems, deployment is only the beginning. The real challenge is knowing when something goes wrong.

What is AWS CloudWatch?

CloudWatch is a monitoring service that collects metrics, logs, and events from AWS resources. It automatically tracks performance data such as CPU usage, network activity, and disk operations.

CloudWatch = Monitor + Alert system for your infrastructure

Understanding Metrics

Metrics are numerical data points collected over time. For EC2, common metrics include CPU utilization, network traffic, and disk operations.

CPU Utilization
Network In / Out
Disk Read / Write

By default, EC2 publishes metrics to CloudWatch every 5 minutes (basic monitoring). If needed, detailed monitoring can be enabled to publish them every minute.
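
A minimal sketch of switching an instance to detailed monitoring from the AWS CLI, assuming the CLI is installed and configured with permission to change monitoring. The function name and the instance ID in the usage line are placeholders of mine, not values from the lab.

```shell
# Hypothetical helper: turn on detailed (1-minute) monitoring for one instance.
# Assumes the AWS CLI is installed and has credentials for the target account.
enable_detailed_monitoring() {
  local instance_id="$1"
  aws ec2 monitor-instances --instance-ids "$instance_id"
}
```

Calling `enable_detailed_monitoring i-0123456789abcdef0` (a made-up ID) would request the change. Detailed monitoring is billed separately, so it is worth turning off again after a lab.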

Generating Load Using the stress Tool

To simulate real-world conditions, I installed a tool called stress on my EC2 instance. This tool artificially increases CPU usage so we can observe how CloudWatch reacts.

# Install the stress tool (run as root, or prefix with sudo)
dnf install stress -y
# Load 4 CPU workers for 60 seconds
stress -c 4 -t 60

I also created a small script to continuously generate load at random intervals. This helped in creating a realistic CPU usage pattern instead of a constant spike.
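
The original script is not shown here, so the following is only a sketch of what such a load generator could look like. The function names and all the ranges (1 to 4 workers, 30 to 90 seconds of load, 20 to 120 seconds of idle time) are assumptions chosen for illustration.

```shell
# Print a random integer between $1 and $2 (inclusive).
# Reads two random bytes from /dev/urandom so it works in plain sh as well as bash.
rand_between() {
  local min="$1" max="$2" n
  n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
  echo $(( min + n % (max - min + 1) ))
}

# Run $1 bursts of CPU load with idle gaps in between.
# Each burst uses 1-4 stress workers for 30-90 seconds (illustrative values).
generate_load() {
  local rounds="$1" i=0
  while [ "$i" -lt "$rounds" ]; do
    stress -c "$(rand_between 1 4)" -t "$(rand_between 30 90)"
    sleep "$(rand_between 20 120)"
    i=$(( i + 1 ))
  done
}
```

Running `generate_load 10` would produce ten bursts of varying size and length, which shows up in the CPU graph as an uneven pattern rather than one flat plateau.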

Watching Metrics in Real Time

After running the stress tool, the CPU utilization graph in CloudWatch started showing spikes. This is normal behavior. In real systems, CPU usage goes up and down depending on workload.

The real problem is not spikes — it is when CPU stays high for a long time.
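
To make the difference between a spike and sustained load concrete, here is a small sketch that checks whether the last N readings are all at or above a threshold. It is not how CloudWatch evaluates alarms internally, only the same idea in miniature; the function name and the sample numbers are made up.

```shell
# Read one CPU percentage per line on stdin.
# Succeed (exit 0) only if the last $2 readings are all >= threshold $1.
sustained_high() {
  local threshold="$1" window="$2"
  tail -n "$window" | awk -v t="$threshold" -v w="$window" '
    $1 + 0 >= t + 0 { high++ }
    END { exit !(NR == w && high == w) }'
}
```

For example, `printf '10\n95\n20\n' | sustained_high 60 2` fails because the 95% reading is a single spike, while `printf '10\n70\n85\n' | sustained_high 60 2` succeeds because the last two readings both stay above 60%.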

Creating a CloudWatch Alarm

Since no one can watch graphs 24/7, CloudWatch lets you set alarms on a metric. I created an alarm with this condition:

CPU Utilization >= 60% for 1 minute

When this condition is met, the alarm's state changes from OK to ALARM.
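
The same condition can be expressed with the AWS CLI. This is a sketch rather than the exact alarm from the lab: the alarm name, the choice of the Average statistic, and the function name are my assumptions, and the SNS topic ARN is passed in as a parameter (the topic itself is covered under Integrating with SNS). A 60-second period is only meaningful if the instance publishes 1-minute datapoints, which requires detailed monitoring.

```shell
# Hypothetical helper: alarm when average CPU is >= 60% for one 60-second period.
# $1 = instance ID, $2 = ARN of the SNS topic to notify.
create_cpu_alarm() {
  local instance_id="$1" topic_arn="$2"
  aws cloudwatch put-metric-alarm \
    --alarm-name "high-cpu-$instance_id" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$instance_id" \
    --statistic Average \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 60 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "$topic_arn"
}
```

Raising `--evaluation-periods` is the usual way to ignore short spikes: with a value of 5, the CPU has to stay at or above the threshold for five consecutive periods before the alarm fires.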

Integrating with SNS

To receive notifications, I integrated the alarm with SNS (Simple Notification Service). SNS sends email alerts whenever the alarm is triggered.

EC2 → CloudWatch → Alarm → SNS → Email

After confirming my email subscription, I started receiving alerts when CPU usage crossed the threshold.
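
For reference, the topic and email subscription can also be set up from the AWS CLI. The sketch below assumes a configured CLI; the function name, topic name, and email address are placeholders, not the ones used in the lab.

```shell
# Hypothetical helper: create an SNS topic and subscribe one email address.
# $1 = topic name, $2 = email address. Prints the topic ARN on success.
create_email_topic() {
  local topic_name="$1" email="$2" topic_arn
  topic_arn=$(aws sns create-topic --name "$topic_name" \
    --query TopicArn --output text) || return 1
  aws sns subscribe \
    --topic-arn "$topic_arn" \
    --protocol email \
    --notification-endpoint "$email" >/dev/null || return 1
  echo "$topic_arn"
}
```

A call such as `create_email_topic cpu-alerts me@example.com` prints the topic ARN, which can then be attached to an alarm as its action. SNS sends a confirmation email first, and nothing is delivered until the link in it is clicked.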

Alarm States
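
A CloudWatch alarm is always in one of three states: OK, ALARM, or INSUFFICIENT_DATA. As a small sketch, the current state can be read from the AWS CLI; the function name and the alarm name in the usage line are placeholders.

```shell
# Hypothetical helper: print the current state of one alarm
# (OK, ALARM, or INSUFFICIENT_DATA).
alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}
```

Running `alarm_state high-cpu-demo` while the stress tool is active is a quick way to watch the alarm move out of OK without opening the console.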

Real DevOps Insight

This lab showed me something important: monitoring is not about watching dashboards — it is about getting notified automatically when something goes wrong.

Good systems don’t wait for humans to check them — they alert humans.

What I Learned

Final Thoughts

This was one of the most practical lessons in my AWS journey. It moved me from simply deploying infrastructure to actually operating it. CloudWatch made it clear that real systems must be monitored, measured, and automated.
