Tutorial / Cram Notes
Storage options on AWS are crucial for AWS Certified Solutions Architect – Professional (SAP-C02) candidates to understand, as they form the backbone of cloud solutions designed on the AWS platform. AWS provides a variety of storage options to meet different use cases, performance characteristics, durability requirements, and cost constraints. These options can be broadly classified into object storage, block storage, file storage, and storage services that support specific application needs.
Object Storage: Amazon S3 and Amazon S3 Glacier
Amazon S3 (Simple Storage Service):
- It is designed for 99.999999999% (11 9’s) of durability and stores data for millions of applications around the world.
- S3 provides a simple web service interface to store and retrieve any amount of data from anywhere on the web.
- S3 is often used for data backup, disaster recovery, data lakes, hybrid cloud storage, and hosting static websites.
Amazon S3 Glacier:
- It is an archive storage service for data archiving and long-term backup.
- Glacier is designed for infrequent access and provides three retrieval options: Expedited (minutes, for urgent needs), Standard (typically a few hours), and Bulk (the most cost-effective option for less time-sensitive needs).
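For example, an object stored in a Glacier storage class must be restored before it can be read. A minimal CLI sketch (the bucket `mybucket` and key `archive.zip` are hypothetical):

```bash
# Request a Standard-tier restore of a Glacier-class object, keeping
# the restored copy available for 7 days
aws s3api restore-object \
  --bucket mybucket \
  --key archive.zip \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'

# Poll the object's metadata to see whether the restore has completed
aws s3api head-object --bucket mybucket --key archive.zip
```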
Example of S3 usage:
```bash
aws s3 cp my-file.txt s3://mybucket/my-file.txt
```
This command uploads ‘my-file.txt’ to the ‘mybucket’ bucket on Amazon S3.
Block Storage: Amazon EBS and Amazon EC2 Instance Store
Amazon Elastic Block Store (EBS):
- Amazon EBS provides persistent block storage volumes for use with Amazon EC2 instances.
- EBS volumes offer the consistent and low-latency performance needed to run your workloads.
- They are ideal for databases, transactional workloads, and any applications requiring fine granular updates and access to raw block-level storage.
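A typical workflow is to create a volume in an Availability Zone and attach it to an instance in that same AZ. A minimal sketch (the volume and instance IDs are placeholders):

```bash
# Create a 100 GiB General Purpose SSD (gp3) volume in a specific AZ
aws ec2 create-volume \
  --availability-zone us-east-1a \
  --size 100 \
  --volume-type gp3

# Attach the volume to an EC2 instance in the same AZ
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device /dev/sdf
```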
EC2 Instance Store:
- Instance store provides temporary block-level storage for an Amazon EC2 instance.
- It is designed for high throughput, low latency, and data that is transient and doesn’t need to persist beyond the life of the instance.
File Storage: Amazon Elastic File System and Amazon FSx
Amazon Elastic File System (EFS):
- EFS provides a simple, serverless, set-and-forget, elastic file storage system.
- It is designed to scale on demand to petabytes without disrupting applications, growing and shrinking automatically as you add and remove files.
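A hedged sketch of creating and mounting an EFS file system (the file system ID is a placeholder, and the mount assumes the amazon-efs-utils package is installed on the instance):

```bash
# Create a general-purpose EFS file system
aws efs create-file-system --performance-mode generalPurpose

# On an EC2 instance in the same VPC, mount the file system
# (requires the amazon-efs-utils mount helper)
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-0123456789abcdef0:/ /mnt/efs
```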
Amazon FSx:
- Offers fully managed third-party file systems with native compatibility and feature sets for workloads such as Windows-based storage, high-performance computing, machine learning, and electronic design automation.
- Amazon FSx has two main offerings:
- Amazon FSx for Windows File Server
- Amazon FSx for Lustre
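As an illustration, an FSx for Lustre file system can be created with a single CLI call. A sketch under assumptions (the subnet ID is a placeholder; 1,200 GiB is a commonly cited minimum capacity):

```bash
# Create an FSx for Lustre file system in a given subnet
aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 1200 \
  --subnet-ids subnet-0123456789abcdef0
```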
Databases and Data Transfer Services
Amazon RDS and Amazon Aurora:
- Managed relational database service options include Amazon RDS and Amazon Aurora.
- They offer various database engine options, including PostgreSQL, MySQL, MariaDB, Oracle, and SQL Server.
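As a hedged sketch (the identifier, credentials, and sizing are illustrative placeholders), a managed MySQL instance can be launched from the CLI:

```bash
# Launch a small managed MySQL instance with 20 GiB of storage
aws rds create-db-instance \
  --db-instance-identifier mydb \
  --db-instance-class db.t3.micro \
  --engine mysql \
  --master-username admin \
  --master-user-password 'ChangeMe123!' \
  --allocated-storage 20
```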
AWS Snow Family:
- Consisting of Snowcone, Snowball, and Snowmobile, these are used for transferring large amounts of data into and out of AWS.
- They address challenges such as high network costs, long transfer times, and security concerns.
AWS Storage Gateway:
- A hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.
- It can be used for backup and disaster recovery, data archiving, and cloud bursting workloads.
Comparison of Key AWS Storage Services: Storage Classes and Use Cases
Service | Use Case | Durability | Availability | Latency | Throughput |
---|---|---|---|---|---|
Amazon S3 | Big data, backups, archival, static hosting | 99.999999999% (11 nines) | High (multi-AZ) | Low | High |
Amazon S3 Glacier | Archival, compliance | 99.999999999% (11 nines) | High (multi-AZ) | High (minutes to hours) | Low to moderate |
Amazon EBS | Databases, transactional workloads | 99.8%–99.999% (varies by volume type) | Single AZ | Low | High |
EC2 Instance Store | Temp files, buffers, caches | N/A (ephemeral) | Single AZ | Very low | Very high |
Amazon EFS | Shared file storage, home directories | 99.999999999% (11 nines) | High (multi-AZ) | Low | Moderate |
Amazon FSx | Windows file shares, HPC, ML, media data | High (varies by file system and deployment type) | High | Low | High |
Understanding the differences between these storage options, such as durability, availability, performance characteristics, and cost, is essential for architects planning sophisticated, resilient, and cost-effective solutions on AWS.
Practice Test with Explanation
What type of storage is Amazon EBS?
- a) Object storage
- b) Block storage
- c) File storage
- d) Cold storage
Answer: b) Block storage
Explanation: Amazon Elastic Block Store (EBS) provides block-level storage volumes for use with EC2 instances.
True or False: Amazon S3 allows you to store unlimited data.
- a) True
- b) False
Answer: a) True
Explanation: Amazon S3 is designed for 99.999999999% (11 nines) of durability and scales past trillions of objects worldwide, so there is no limit on the total amount of data you can store.
Which AWS service is best for a managed file storage service that supports network file system (NFS) protocols?
- a) Amazon S3
- b) Amazon EFS
- c) Amazon EBS
- d) AWS Storage Gateway
Answer: b) Amazon EFS
Explanation: Amazon Elastic File System (EFS) is a managed file storage service for EC2 instances and supports NFS protocols.
Amazon S3 Glacier is designed for:
- a) Frequently accessed data
- b) Real-time access to data
- c) Infrequently accessed data with a retrieval time of minutes to hours
- d) Block level storage
Answer: c) Infrequently accessed data with a retrieval time of minutes to hours
Explanation: Amazon S3 Glacier is an archival storage service optimized for data that is infrequently accessed and for which retrieval times of several minutes to hours are acceptable.
AWS Storage Gateway provides which types of interfaces? (Select TWO)
- a) Block-based
- b) Object-based
- c) File-based
- d) Database-based
Answer: a) Block-based, c) File-based
Explanation: AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage, supporting block-based (Volume Gateway), file-based (File Gateway), and tape-based (Tape Gateway) storage interfaces.
True or False: AWS Snowball is a service that can be used for edge computing.
- a) True
- b) False
Answer: b) False
Explanation: AWS Snowball is primarily used for data transfer. AWS Snowball Edge includes a small amount of compute power to run AWS Lambda functions and EC2 instances, but it’s primarily a data transfer device rather than a full-fledged edge computing service.
Amazon S3 storage classes include: (Select THREE)
- a) S3 Standard-Infrequent Access
- b) S3 Glacier Deep Archive
- c) S3 One Zone-Infrequent Access
- d) EBS Snapshots
Answer: a) S3 Standard-Infrequent Access, b) S3 Glacier Deep Archive, c) S3 One Zone-Infrequent Access
Explanation: S3 offers multiple storage classes, including S3 Standard for general-purpose storage of frequently accessed data, S3 Intelligent-Tiering for data with unknown or changing access patterns, S3 Standard-IA and S3 One Zone-IA for long-lived, infrequently accessed data, and S3 Glacier and S3 Glacier Deep Archive for archiving.
What does Amazon S3’s storage class S3 Intelligent-Tiering do?
- a) Moves data automatically to the most cost-effective access tier
- b) Encrypts data using AI algorithms
- c) Provides the fastest access to data
- d) Archives data to AWS Snowball
Answer: a) Moves data automatically to the most cost-effective access tier
Explanation: The S3 Intelligent-Tiering storage class automatically moves data to the most cost-effective access tier without performance impact or operational overhead.
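For example, an object can be placed directly into Intelligent-Tiering at upload time (the bucket name is a placeholder):

```bash
# Upload an object directly into the Intelligent-Tiering storage class
aws s3 cp my-file.txt s3://mybucket/my-file.txt --storage-class INTELLIGENT_TIERING
```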
True or False: Amazon EFS does NOT support Windows file system protocols such as SMB.
- a) True
- b) False
Answer: a) True
Explanation: Amazon EFS supports the NFS protocol, not the SMB protocol used by Windows file shares. For fully managed SMB-based Windows file storage, Amazon FSx for Windows File Server is the appropriate service.
What is the purpose of Amazon S3 Life Cycle policies?
- a) To automate virtual machine backups
- b) To manage the life cycle of objects in your S3 buckets
- c) To schedule operations on EC2 instances
- d) To create IAM policies based on access patterns
Answer: b) To manage the life cycle of objects in your S3 buckets
Explanation: Amazon S3 Lifecycle policies are used to manage and automate the archiving and deletion of objects in S3 buckets based on defined rules.
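A minimal sketch of such a rule (the bucket name and day counts are illustrative): transition objects to Standard-IA after 30 days and delete them after 365:

```bash
# Apply a lifecycle rule: move to Standard-IA at day 30, expire at day 365
aws s3api put-bucket-lifecycle-configuration \
  --bucket mybucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
      "Expiration": {"Days": 365}
    }]
  }'
```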
AWS Snow Family services are used for:
- a) Archiving data
- b) Cryptographically secure data transfer
- c) Large-scale data transfers in and out of AWS
- d) Blockchain transactions
Answer: c) Large-scale data transfers in and out of AWS
Explanation: The AWS Snow Family, which includes AWS Snowball, Snowball Edge, and Snowmobile, is designed to help you undertake large-scale data transfers into and out of AWS with physical devices, thereby bypassing the internet for data transfer tasks.
True or False: With Amazon S3, you can host a static website with dynamic content processing capabilities.
- a) True
- b) False
Answer: b) False
Explanation: Amazon S3 can be used to host static websites but does not inherently provide dynamic content processing. Dynamic capabilities would need to be implemented through other services such as AWS Lambda@Edge or integrating with Amazon EC2 or other compute resources.
Interview Questions
What are the main differences between Amazon EBS and Amazon EFS?
Amazon EBS (Elastic Block Store) is a block storage service designed for use with EC2 instances for both throughput and transaction-intensive workloads at any scale. Each EBS volume is automatically replicated within its Availability Zone to protect from component failure. In contrast, Amazon EFS (Elastic File System) is a file storage service for use with Amazon EC2 instances and AWS cloud services. EFS provides a file system interface with file system capabilities and is designed to scale on demand without disrupting applications, growing and shrinking automatically as files are added and removed.
How does Amazon S3 differ from Amazon Glacier, and what use cases are they best suited for?
Amazon S3 (Simple Storage Service) is an object storage service with a simple web interface to store and retrieve any amount of data from anywhere on the web. It is best suited for a wide range of use cases such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 Glacier is a secure, durable, and extremely low-cost storage service for data archiving and long-term backup. It is intended for data that is infrequently accessed and for which retrieval times of several hours are suitable.
Can you describe the difference between Amazon S3 Standard, S3 Standard-IA, and S3 One Zone-IA storage classes?
Amazon S3 Standard is designed for frequently accessed data and offers high durability, availability, and performance. S3 Standard-IA (IA for Infrequent Access) is designed for data that may not be accessed as frequently but requires rapid access when needed, and offers the same durability and throughput as S3 Standard at a lower storage cost. S3 One Zone-IA offers performance similar to Standard-IA, but data is stored in a single Availability Zone, which makes it less expensive while not providing the same level of availability and resilience as Standard or Standard-IA.
What kinds of data transfer methods are available for getting large amounts of data into Amazon S3?
AWS provides several data transfer methods for moving large amounts of data to Amazon S3, including AWS Direct Connect for establishing a dedicated network connection, Amazon S3 Transfer Acceleration for fast, secure, and easy transfers over long distances, Amazon S3 Multi-Part upload for higher performance and reliability for large objects, and AWS Snowball for large-scale data transport using physical devices.
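For instance, Transfer Acceleration is enabled per bucket, after which transfers can target the accelerate endpoint. A sketch (the bucket name is a placeholder):

```bash
# Enable Transfer Acceleration on an existing bucket
aws s3api put-bucket-accelerate-configuration \
  --bucket mybucket \
  --accelerate-configuration Status=Enabled

# Subsequent CLI transfers can use the accelerate endpoint
aws s3 cp big-file.bin s3://mybucket/ \
  --endpoint-url https://s3-accelerate.amazonaws.com
```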
What measures can be taken to secure data stored in Amazon S3?
To secure data in Amazon S3, users can employ several measures including the use of Bucket Policies and IAM Policies to control access, configure S3 Block Public Access to prevent public access to data, enable encryption in transit (using SSL/TLS) and at rest (using S3 server-side encryption or client-side encryption), enable versioning to preserve, retrieve, and restore every version of every object stored in an Amazon S3 bucket, and use MFA Delete to add an additional layer of security for object deletion.
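Two of these controls as hedged CLI sketches (the bucket name is a placeholder): blocking public access and enabling default server-side encryption with S3-managed keys:

```bash
# Block all forms of public access on the bucket
aws s3api put-public-access-block \
  --bucket mybucket \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Enable default server-side encryption (SSE-S3)
aws s3api put-bucket-encryption \
  --bucket mybucket \
  --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```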
What is the significance of S3 object immutability, and how is it achieved?
S3 object immutability is important for ensuring that data cannot be modified or deleted after it has been written. This is particularly significant for compliance with regulatory requirements. It can be achieved by using S3 Object Lock, which allows users to store objects using a “write once, read many” (WORM) model. It can be applied to individual objects or a bucket.
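A sketch of how this looks in practice (the bucket name and the 7-year retention period are illustrative; Object Lock must be enabled when the bucket is created):

```bash
# Create a bucket with Object Lock enabled (it cannot be added later)
aws s3api create-bucket --bucket my-worm-bucket --object-lock-enabled-for-bucket

# Apply a default compliance-mode retention of 7 years
aws s3api put-object-lock-configuration \
  --bucket my-worm-bucket \
  --object-lock-configuration \
    '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Years":7}}}'
```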
What is Amazon S3 Lifecycle policy, and how can it help in managing storage costs?
Amazon S3 Lifecycle policies are a set of rules that automate the transition of objects between different storage classes and manage the purging of objects that are no longer needed. By implementing lifecycle policies, users can reduce storage costs by automatically moving data to more cost-effective storage classes once it becomes less frequently accessed, or by archiving and deleting objects that have reached the end of their lifecycle.
How does Amazon S3 Intelligent-Tiering work, and what are its benefits?
Amazon S3 Intelligent-Tiering is a storage class designed to optimize costs by automatically moving data to the most cost-effective tier, without performance impact or operational overhead. It is suitable for data with unknown or changing access patterns. The benefits include cost savings, as it automatically moves data to the most cost-efficient access tier without retrieval charges, and no need to analyze and predict access patterns.
Describe an instance when you should consider using Provisioned IOPS with Amazon EBS, and explain why it is necessary.
Provisioned IOPS (PIOPS) with Amazon EBS should be considered when running I/O-intensive workloads that require consistent I/O performance, such as large relational or NoSQL databases, particularly when these applications require more than 16,000 IOPS, the maximum that General Purpose SSD (gp2) volumes can deliver. It is necessary to provide predictable, high throughput for applications that are sensitive to storage performance and consistency.
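As a sketch (the AZ and sizing are illustrative), a Provisioned IOPS volume declares its IOPS explicitly at creation:

```bash
# Create a 500 GiB io2 volume provisioned for 32,000 IOPS
aws ec2 create-volume \
  --availability-zone us-east-1a \
  --volume-type io2 \
  --size 500 \
  --iops 32000
```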
What AWS service would you use for a hybrid cloud storage solution that allows on-premises applications to seamlessly use cloud storage?
AWS Storage Gateway is the service that provides a hybrid cloud storage solution. It enables on-premises applications to seamlessly use AWS cloud storage through its different modes such as File Gateway, Volume Gateway, and Tape Gateway. Each gateway type provides different features that cater to various use cases such as file shares, block storage, and virtual tape backup, respectively.
What is the difference between a cold HDD (sc1) and a throughput-optimized HDD (st1) EBS volume, and when might you use each?
Cold HDD (sc1) EBS volumes are designed for less frequently accessed workloads with large, cold data sets. They are the most cost-effective option for applications that require fewer IOPS. Throughput Optimized HDD (st1) EBS volumes, on the other hand, are designed for frequently accessed, throughput-intensive workloads and big data applications. They offer higher throughput compared to sc1 volumes and are suitable for workloads like data warehouses and log processing.