Tutorial / Cram Notes
Amazon CloudWatch is a monitoring service that provides data and actionable insights to monitor applications, respond to system-wide performance changes, and optimize resource utilization. One of the primary features of CloudWatch is the alarm system. A CloudWatch Alarm watches a metric over a specified time period and performs one or more actions when the metric crosses a threshold value for a specified number of evaluation periods.
Here’s a basic example of a CloudWatch alarm definition that goes into the ALARM state when average CPU utilization stays above 80% for two consecutive 5-minute periods:
{
"AlarmName": "High CPU Utilization",
"MetricName": "CPUUtilization",
"Namespace": "AWS/EC2",
"Statistic": "Average",
"Period": 300,
"EvaluationPeriods": 2,
"Threshold": 80,
"ComparisonOperator": "GreaterThanThreshold",
"AlarmActions": ["arn:aws:sns:us-west-2:123456789012:my-topic"],
"AlarmDescription": "Alarm when server CPU exceeds 80%"
}
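To make the role of Threshold and EvaluationPeriods more concrete, the following is a deliberately simplified model of how such an alarm could be evaluated. It is only a sketch: the function name alarm_state is made up for illustration, and real CloudWatch evaluation also handles options such as datapoints-to-alarm and missing-data treatment, which are ignored here.

```python
from typing import List


def alarm_state(datapoints: List[float], threshold: float, evaluation_periods: int) -> str:
    """Simplified model of a GreaterThanThreshold alarm (hypothetical helper).

    Returns ALARM only when each of the most recent `evaluation_periods`
    datapoints is strictly greater than the threshold.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Each value stands for one 300-second average of CPUUtilization.
print(alarm_state([45.0, 91.0, 62.0], threshold=80, evaluation_periods=2))  # OK
print(alarm_state([45.0, 85.0, 91.0], threshold=80, evaluation_periods=2))  # ALARM
```

A single spike above 80% is not enough in this model; both of the last two periods must breach the threshold.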
Integration with Amazon SNS
Amazon Simple Notification Service (SNS) is a managed service that provides message delivery from publishers to subscribers. CloudWatch alarms can be set up to publish messages to an SNS topic. Subscribers to this topic can be end-users, applications, or other AWS services. This integration allows for immediate notifications when an alarm state is reached.
For instance, administrators can be instantly notified via email or SMS if an application’s response time crosses a predefined threshold.
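Applications that subscribe to the topic receive the alarm details as a JSON document in the message body. The sketch below shows one way a subscriber might turn that document into a one-line summary. The helper name summarize_alarm is hypothetical, and the field names used (AlarmName, NewStateValue, NewStateReason) reflect the alarm notification format as commonly documented; treat them as assumptions and check a real message before relying on them.

```python
import json


def summarize_alarm(sns_message: str) -> str:
    """Build a short human-readable summary from an alarm notification body."""
    alarm = json.loads(sns_message)
    return "[{state}] {name}: {reason}".format(
        state=alarm.get("NewStateValue", "UNKNOWN"),
        name=alarm.get("AlarmName", "unnamed alarm"),
        reason=alarm.get("NewStateReason", "no reason given"),
    )


# Hand-written sample; a real notification carries more fields.
sample = json.dumps({
    "AlarmName": "High CPU Utilization",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed",
})
print(summarize_alarm(sample))  # [ALARM] High CPU Utilization: Threshold Crossed
```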
AWS Lambda for Automated Actions
AWS Lambda allows you to run code without provisioning or managing servers. CloudWatch alarms can trigger a Lambda function to automatically carry out an action in response to an event. For example, if a web application becomes unresponsive, an alarm could trigger a Lambda function that restarts the EC2 instance hosting the application.
Here’s an example of how CloudWatch can trigger a Lambda function:
- Define the Lambda function that takes action based on CloudWatch event data.
- Set up a CloudWatch alarm with a Lambda function as the target action.
When the alarm’s conditions are met, it will invoke the designated Lambda function by passing event data as a parameter.
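As a rough sketch of the first step, the function below handles an alarm that reaches Lambda through an SNS topic (one common wiring; an alarm that invokes Lambda directly delivers a differently shaped event). The helper names are hypothetical, the Trigger/Dimensions layout is an assumption about the notification format, and the reboot action is passed in as a callable so the logic can be exercised without AWS credentials.

```python
import json
from typing import Callable, List


def instance_ids_from_alarm(alarm: dict) -> List[str]:
    """Collect InstanceId dimension values from an alarm notification."""
    ids = []
    for dim in alarm.get("Trigger", {}).get("Dimensions", []):
        # Accept either key casing, since the exact casing is an assumption here.
        name = dim.get("name", dim.get("Name"))
        value = dim.get("value", dim.get("Value"))
        if name == "InstanceId" and value:
            ids.append(value)
    return ids


def handle_alarm_event(event: dict, reboot: Callable[[List[str]], None]) -> List[str]:
    """Reboot the instances named in every ALARM-state record of an SNS event."""
    rebooted = []
    for record in event.get("Records", []):
        alarm = json.loads(record["Sns"]["Message"])
        if alarm.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA transitions
        ids = instance_ids_from_alarm(alarm)
        if ids:
            reboot(ids)
            rebooted.extend(ids)
    return rebooted


# In a deployed function the callable could wrap boto3, for example:
#   ec2 = boto3.client("ec2")
#   handle_alarm_event(event, lambda ids: ec2.reboot_instances(InstanceIds=ids))
```

Keeping the AWS call behind a parameter is a design choice for testability, not a requirement of Lambda.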
EC2 Automatic Recovery
Amazon EC2 instances can be recovered automatically when they become impaired by an underlying hardware failure. This is accomplished by pairing a CloudWatch alarm on the system status check with the EC2 recover action. When the alarm fires, the recover action moves the instance to healthy hardware; the recovered instance keeps its instance ID, private IP addresses, Elastic IP addresses, and instance metadata, although data held in memory is lost.
An EC2 automatic recovery alarm can look something like the following. Because StatusCheckFailed_System only reports 0 (passed) or 1 (failed), the comparison must be greater than or equal to 1; a strict greater-than comparison against 1 would never fire:
{
"AlarmName": "ec2-recovery",
"Namespace": "AWS/EC2",
"MetricName": "StatusCheckFailed_System",
"Dimensions": [{"Name": "InstanceId", "Value": "i-1234567890abcdef0"}],
"Statistic": "Minimum",
"Period": 60,
"EvaluationPeriods": 3,
"Threshold": 1,
"ComparisonOperator": "GreaterThanOrEqualToThreshold",
"AlarmActions": ["arn:aws:automate:us-west-2:ec2:recover"],
"AlarmDescription": "Recover EC2 instance when system check fails"
}
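Because a recovery alarm is tied to one instance through its InstanceId dimension, a small helper can generate the definition per instance. The sketch below builds a parameter dictionary in the shape accepted by boto3's put_metric_alarm; the function name and alarm naming scheme are made up for illustration. It compares with GreaterThanOrEqualToThreshold because the status check metric only reports 0 or 1.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """Return alarm parameters that recover one instance (hypothetical helper)."""
    return {
        "AlarmName": f"ec2-recovery-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 3,
        # The metric is 0 (passed) or 1 (failed), so alarm on values >= 1.
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # The recover action ARN is region-specific.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


params = recovery_alarm_params("i-1234567890abcdef0", "us-west-2")
print(params["AlarmActions"][0])  # arn:aws:automate:us-west-2:ec2:recover

# With credentials configured, the dictionary could be applied with:
#   boto3.client("cloudwatch", region_name="us-west-2").put_metric_alarm(**params)
```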
Conclusion
In conclusion, the AWS Cloud ecosystem offers a powerful array of services for monitoring and automated response. CloudWatch alarms provide the basis for observing resources and triggering notifications or actions through SNS topics which can deliver messages to various endpoints. Automated actions via Lambda functions provide a way to execute code in response to alarms, and EC2 automatic recovery features help to ensure high availability and reliability of instances.
Understanding and effectively implementing these features is essential for AWS DevOps engineers, because they enable proactive handling of events and streamline operational tasks in the cloud environment. The material is also an essential topic for the AWS Certified DevOps Engineer – Professional (DOP-C02) exam.
Practice Test with Explanation
True or False: AWS CloudWatch alarms can trigger Auto Scaling actions to scale EC2 instances based on defined thresholds.
- True
- False
Answer: True
AWS CloudWatch alarms can be set to trigger Auto Scaling actions for EC2 instances when specific thresholds are met, such as CPU utilization or network input/output.
What is the maximum frequency for CloudWatch metric data points for the Basic Monitoring of EC2 instances?
- 1 minute
- 5 minutes
- 10 minutes
- 15 minutes
Answer: 5 minutes
Basic Monitoring for EC2 instances (provided for free) includes metric data at 5-minute intervals.
True or False: An Amazon SNS topic can trigger a Lambda function.
- True
- False
Answer: True
An Amazon SNS topic can be used to trigger an AWS Lambda function. The function can be invoked by messages published to the SNS topic.
Which AWS service can be used to automatically recover an EC2 instance?
- EC2 Auto Scaling
- AWS Lambda
- CloudWatch Alarms
- AWS OpsWorks
Answer: CloudWatch Alarms
CloudWatch alarms can be configured to perform EC2 instance recovery actions when certain criteria are met.
True or False: AWS CloudWatch does not support custom metrics.
- True
- False
Answer: False
AWS CloudWatch supports custom metrics, allowing users to monitor application-specific metrics by publishing their own data points to CloudWatch.
Amazon CloudWatch can be used to monitor which of the following AWS resources? (Select TWO)
- DynamoDB Tables
- S3 Bucket storage
- VPCs
- EC2 Instances
- Lambda Functions
Answer: EC2 Instances, Lambda Functions
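Amazon CloudWatch collects metrics for many AWS services, including EC2 instances and Lambda functions, which are the intended answers. Note that the question is loosely worded: DynamoDB tables and S3 buckets also publish CloudWatch metrics. Of the options listed, only the VPC itself is not monitored directly; its network traffic is observed through features such as VPC Flow Logs.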
True or False: Amazon SNS supports message delivery over protocols such as SMS, email, and HTTP/S.
- True
- False
Answer: True
Amazon SNS supports multiple messaging protocols including SMS, email, Amazon SQS, HTTP/S, and Lambda, among others.
Which AWS feature can be used to ensure high availability by distributing incoming application traffic across multiple EC2 instances?
- AWS Lambda
- Elastic Load Balancing
- AWS Elastic Beanstalk
- Amazon ECS
Answer: Elastic Load Balancing
Elastic Load Balancing (ELB) automatically distributes incoming application traffic across multiple targets, such as EC2 instances, to ensure high availability.
True or False: You cannot create a CloudWatch alarm based on a metric associated with an Amazon EBS volume.
- True
- False
Answer: False
CloudWatch alarms can be set on metrics associated with an Amazon EBS volume, such as disk read/write operations or throughput.
When using Amazon SNS, what is the format of the messages being sent to subscribers?
- Binary
- JSON
- Plain Text
- XML
Answer: JSON
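For endpoints such as HTTP/S, Amazon SQS, and Lambda, Amazon SNS delivers each message as a JSON document that wraps the message body with metadata such as the message ID, topic ARN, and timestamp (unless raw message delivery is enabled, where the endpoint type supports it). Email subscribers receive plain text unless they subscribe with the email-json protocol.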
True or False: In AWS, you can set up an alarm to monitor the estimated charges for your AWS account.
- True
- False
Answer: True
AWS CloudWatch provides functionality to monitor the estimated charges on your AWS account and can trigger an alarm if the charges exceed certain thresholds.
Which AWS service allows for automatic scaling of your Amazon EC2 capacity up or down according to conditions you define?
- AWS Lambda
- CloudWatch Alarms
- EC2 Auto Scaling
- AWS CloudFormation
Answer: EC2 Auto Scaling
EC2 Auto Scaling helps you maintain application availability and allows you to automatically add or remove EC2 instances according to conditions you define.
Interview Questions
Can you explain what Amazon CloudWatch is and how it integrates with Amazon SNS for alert notifications?
Amazon CloudWatch is a monitoring and observability service that provides data and actionable insights for AWS resources. It can collect and track metrics, collect and monitor log files, and set alarms. When integrated with Amazon SNS (Simple Notification Service), CloudWatch can send notifications or automatically make changes to the resources when certain thresholds are breached or alarms are triggered. CloudWatch alarms can publish messages to SNS topics, which can then trigger notifications to subscribers or other supported AWS services like Lambda for automated responses.
How do you set up a CloudWatch alarm to trigger an automatic EC2 instance recovery action?
In CloudWatch, you create an alarm on the StatusCheckFailed_System metric for a specific EC2 instance, then add an EC2 action that recovers the instance when the alarm enters the ‘ALARM’ state. In the console, this means selecting “Recover this instance” as the alarm's EC2 action. Through the API or CLI, the alarm action is the ARN arn:aws:automate:<region>:ec2:recover.
What is AWS Lambda, and how can it be used in response to CloudWatch alarms?
AWS Lambda is a compute service that lets you run code without provisioning or managing servers. In response to CloudWatch alarms, Lambda functions can be triggered to perform automated tasks or remediations, such as resizing an EC2 instance, clearing disk space, or restarting services. This is done by associating a CloudWatch alarm with an SNS topic that triggers the Lambda function.
Can you describe the difference between Amazon CloudWatch logs and events, and how each might trigger a different automated action?
CloudWatch Logs is used to monitor, store, and access log files from EC2 instances, AWS CloudTrail, and other sources. Logs can be analyzed and filtered, and metric filters turn matching log entries into metrics that alarms can watch. CloudWatch Events, now part of Amazon EventBridge, delivers a stream of system events that describe changes in AWS resources, and rules route matching events to targets such as Lambda functions or SNS topics. Both logs and events can trigger automated actions, but the choice depends on the context: logs suit responses that depend on the content of log data, while events suit near-real-time reactions to resource changes.
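As an illustration of the log-based path, the sketch below builds parameters in the shape accepted by boto3's put_metric_filter for the CloudWatch Logs client. The log group, filter name, metric name, and namespace are placeholders invented for this example; the resulting metric could then be watched by an ordinary alarm.

```python
def error_metric_filter(log_group: str) -> dict:
    """Return metric filter parameters that count log lines containing ERROR."""
    return {
        "logGroupName": log_group,
        "filterName": "error-count",   # placeholder name
        "filterPattern": "ERROR",      # matches log events containing this term
        "metricTransformations": [{
            "metricName": "ErrorCount",   # placeholder metric
            "metricNamespace": "MyApp",   # placeholder custom namespace
            "metricValue": "1",           # add 1 per matching log event
        }],
    }


params = error_metric_filter("/my-app/production")
print(params["metricTransformations"][0]["metricName"])  # ErrorCount

# With credentials configured:
#   boto3.client("logs").put_metric_filter(**params)
```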
What is the difference between standard and FIFO topics in Amazon SNS, and how could this impact alarm notifications?
Amazon SNS has two types of topics: standard and FIFO (First-In-First-Out). Standard topics offer maximum throughput, best-effort ordering, and at-least-once delivery. FIFO topics preserve strict message ordering and deduplicate messages, but they deliver to Amazon SQS queues rather than to endpoints such as email or SMS. This distinction can impact alarm notifications because standard topics may deliver alarm messages out of order or more than once, so downstream automation should tolerate duplicates and check the state and timestamp carried in each message. FIFO topics suit pipelines where order and deduplication are critical, provided the consumer reads from an SQS queue.
Describe an appropriate use-case for an Amazon SNS dead-letter queue when dealing with notification failure.
A dead-letter queue is an Amazon SQS queue attached to an SNS subscription; it collects messages that could not be delivered to that subscriber after the delivery retries are exhausted. An appropriate use-case is when critical notifications such as alerts or alarms must not be lost due to delivery failures. The queue lets you analyze failures afterward and reprocess the undelivered messages so that important notifications are not silently dropped.
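A dead-letter queue is attached to a subscription through its RedrivePolicy attribute. The sketch below builds arguments in the shape accepted by boto3's set_subscription_attributes for the SNS client; the helper name and the ARNs are placeholders. It assumes the SQS queue already exists and that its access policy allows Amazon SNS to send messages to it.

```python
import json


def redrive_policy_attribute(subscription_arn: str, dlq_arn: str) -> dict:
    """Return arguments that attach an SQS dead-letter queue to a subscription."""
    return {
        "SubscriptionArn": subscription_arn,
        "AttributeName": "RedrivePolicy",
        # The attribute value is a JSON string naming the dead-letter queue.
        "AttributeValue": json.dumps({"deadLetterTargetArn": dlq_arn}),
    }


args = redrive_policy_attribute(
    "arn:aws:sns:us-west-2:123456789012:my-topic:example-subscription-id",  # placeholder
    "arn:aws:sqs:us-west-2:123456789012:my-topic-dlq",  # placeholder
)
print(args["AttributeValue"])

# With credentials configured:
#   boto3.client("sns").set_subscription_attributes(**args)
```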
How do you ensure that a specific AWS Lambda function is triggered only by a particular CloudWatch alarm?
To ensure a specific Lambda function is triggered only by a particular CloudWatch alarm, you can create an SNS topic dedicated to that alarm and subscribe the Lambda function to it. In the function’s resource-based policy, allow invocation only by the SNS service principal with that topic’s ARN as the source; the execution role governs what the function may do, not who may invoke it. Finally, configure the CloudWatch alarm to publish to the dedicated SNS topic upon alarm state changes.
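The invoke restriction can be expressed as a permission statement on the function's resource-based policy. The sketch below builds arguments in the shape accepted by boto3's add_permission for the Lambda client; the helper name, statement ID, function name, and topic ARN are placeholders chosen for illustration.

```python
def sns_invoke_permission(function_name: str, topic_arn: str) -> dict:
    """Return arguments that let one SNS topic, and only that topic, invoke a function."""
    return {
        "FunctionName": function_name,
        "StatementId": "allow-alarm-topic",  # placeholder statement ID
        "Action": "lambda:InvokeFunction",
        "Principal": "sns.amazonaws.com",
        # Limits the grant to messages originating from this topic.
        "SourceArn": topic_arn,
    }


args = sns_invoke_permission(
    "restart-web-server",  # placeholder function name
    "arn:aws:sns:us-west-2:123456789012:my-topic",
)
print(args["SourceArn"])  # arn:aws:sns:us-west-2:123456789012:my-topic

# With credentials configured:
#   boto3.client("lambda").add_permission(**args)
```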
How can you temporarily disable a CloudWatch alarm without deleting it?
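You can temporarily disable a CloudWatch alarm’s actions without deleting the alarm by using the DisableAlarmActions API (aws cloudwatch disable-alarm-actions in the CLI) or the equivalent option in the console. The alarm keeps evaluating its metric and changing state, but it performs no actions until they are re-enabled with EnableAlarmActions. The setting applies to all of the alarm’s actions at once; to stop only one of several actions, update the alarm and remove that action.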
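For planned maintenance, the disable and re-enable calls can be paired so that actions are always restored. The sketch below is one possible pattern: alarm_actions_paused is a hypothetical helper, and its cloudwatch argument is assumed to be a boto3 CloudWatch client (or any object exposing disable_alarm_actions and enable_alarm_actions).

```python
from contextlib import contextmanager


@contextmanager
def alarm_actions_paused(cloudwatch, alarm_names):
    """Disable actions for the named alarms, then re-enable them on exit."""
    cloudwatch.disable_alarm_actions(AlarmNames=alarm_names)
    try:
        yield
    finally:
        # Runs even if the maintenance step raises an exception.
        cloudwatch.enable_alarm_actions(AlarmNames=alarm_names)


# Usage with credentials configured (not executed here):
#   cloudwatch = boto3.client("cloudwatch")
#   with alarm_actions_paused(cloudwatch, ["High CPU Utilization"]):
#       run_maintenance()  # hypothetical maintenance step
```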
Explain a scenario where you would configure AWS CloudWatch to take snapshots of an EBS volume on an alarm trigger.
You might configure CloudWatch to take snapshots of an EBS volume when monitoring for a specific event that suggests potential data corruption, unexpected application crash, or other critical issues. When an alarm triggers on low disk space or high-latency I/O, a snapshot can be taken as a precautionary backup before any automated scaling or recovery actions are performed. This could be done by using SNS to trigger a Lambda function which then initiates the EBS snapshot creation.
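The snapshot step inside such a Lambda function can be very small. In the sketch below, snapshot_on_alarm is a hypothetical helper and its ec2 argument is assumed to be a boto3 EC2 client; how the volume ID is derived from the alarm is left out and would depend on the alarm's dimensions.

```python
def snapshot_on_alarm(ec2, volume_id: str, alarm_name: str) -> str:
    """Start an EBS snapshot for the given volume and return the snapshot ID."""
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Precautionary snapshot triggered by alarm {alarm_name}",
    )
    return response["SnapshotId"]


# Usage with credentials configured (not executed here):
#   ec2 = boto3.client("ec2")
#   snapshot_on_alarm(ec2, "vol-0123456789abcdef0", "low-disk-space")
```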
How can you set up a CloudWatch alarm that notifies you when your estimated AWS charges exceed a certain threshold?
To set up an alarm for estimated AWS charges, first, you must enable monitoring of estimated AWS charges by activating billing alerts in the Billing and Cost Management console preferences. Then, in the US East (N. Virginia) Region, where billing metric data is stored, you create a CloudWatch alarm that watches the “EstimatedCharges” metric in the AWS/Billing namespace and specifies the threshold amount. Once the estimated charges exceed your defined limit, CloudWatch will trigger the alarm, and a notification can be issued through an SNS topic.
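A billing alarm definition might be sketched as follows, again as a parameter dictionary in the shape accepted by boto3's put_metric_alarm. The helper name, the alarm naming scheme, and the six-hour period are illustrative choices rather than requirements, and the Currency dimension is assumed to be USD.

```python
def billing_alarm_params(threshold_usd: float, topic_arn: str) -> dict:
    """Return alarm parameters for estimated charges above a USD threshold."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd:g}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours; billing data is updated only a few times a day
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


params = billing_alarm_params(100.0, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
print(params["AlarmName"])  # estimated-charges-over-100-usd

# The client must target us-east-1, where billing metrics are stored:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```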