Getting to grips with AWS services can be quite a challenge. My glossary of terminology was inspired by AWS in Plain English, which makes sure I know my Athena from my Aurora and CloudTrail from my CloudFront.
When you first begin using Amazon Web Services, you may want to add a Billing Alarm. Making sure you don’t run into unexpected charges is important since it’s so easy to forget something is running.
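A billing alarm is a CloudWatch alarm on the `EstimatedCharges` metric in the `AWS/Billing` namespace. Billing metrics are only published in us-east-1, and only after billing alerts are enabled in your billing preferences. Below is a minimal sketch that builds the arguments for boto3's `put_metric_alarm`. The helper name, the alarm-name pattern and the SNS topic ARN are hypothetical placeholders.

```python
def billing_alarm_kwargs(threshold_usd: float, sns_topic_arn: str) -> dict:
    """Hypothetical helper: arguments for CloudWatch put_metric_alarm.

    The alarm fires when estimated charges for the month exceed threshold_usd.
    """
    return {
        "AlarmName": f"billing-over-{threshold_usd:g}-usd",  # naming scheme is an assumption
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours: billing data is only updated a few times a day
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic that emails you
    }


if __name__ == "__main__":
    kwargs = billing_alarm_kwargs(20, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
    print(kwargs["AlarmName"])
```

To create the alarm, pass the result to `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**kwargs)`.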
An S3 bucket is where objects are stored, similar to files and folders on your local machine. Each object consists of:
- Key – the name of the object
- Value – the data in the file itself made of bytes
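S3 offers several storage classes that trade cost against availability and retrieval time:
- S3 Standard – Most expensive and most available option, for frequently accessed data
- S3 Standard-IA (S3:IA) – For storing infrequently accessed, non-critical data that CANNOT be easily reproduced and needs to be retrieved quickly
- S3 One Zone-IA (S3:IA-One Zone) – For storing infrequently accessed, non-critical data that CAN be easily reproduced and needs to be retrieved quickly; it is kept in a single Availability Zone
- Glacier – Extremely cheap long-term storage for ‘cold’ data, with a standard retrieval time of 3–5 hours
- Deep Glacier (S3 Glacier Deep Archive) – The cheapest long-term storage for ‘cold’ data, with a standard retrieval time of up to 12 hours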
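When you upload an object, the key, the value (body) and the storage class are all just parameters of the request. This sketch maps the glossary's labels to the identifiers the S3 API expects in `StorageClass` and builds the arguments for boto3's `put_object`. The helper and the bucket and key names are hypothetical.

```python
# Glossary label -> value of the S3 API's StorageClass parameter.
STORAGE_CLASSES = {
    "S3": "STANDARD",
    "S3:IA": "STANDARD_IA",
    "S3:IA-One Zone": "ONEZONE_IA",
    "Glacier": "GLACIER",
    "Deep Glacier": "DEEP_ARCHIVE",
}


def put_object_kwargs(bucket: str, key: str, body: bytes, storage_class: str = "S3") -> dict:
    """Hypothetical helper: arguments for s3.put_object.

    The key names the object and the body is its value (raw bytes).
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": STORAGE_CLASSES[storage_class],
    }
```

For example, `boto3.client("s3").put_object(**put_object_kwargs("example-bucket", "reports/2020.csv", data, "S3:IA"))` would store the file directly in S3 Standard-IA.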
When you take a DB snapshot, Amazon RDS creates a storage volume snapshot of your entire DB instance. On a Single-AZ instance, creating this snapshot results in a brief I/O suspension that can last from a few seconds to a few minutes. For Multi-AZ deployments of MariaDB, MySQL, Oracle and PostgreSQL, the primary is not affected by this I/O suspension because the snapshot is taken from the standby.
When you create a DB snapshot, you need to identify which DB instance you are going to back up. Then give your DB snapshot a name so you can restore from it later. You can do this using the AWS Management Console, the AWS CLI, or the RDS API.
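The two steps above (pick the instance, then name the snapshot) map onto the RDS API's `create_db_snapshot` call. Restoring always creates a new DB instance. In the sketch below, the client is passed in, so it can be a boto3 RDS client or a test stub. The helper names and the timestamped naming scheme are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def snapshot_name(instance_id: str, when: datetime) -> str:
    # Snapshot identifiers allow only letters, digits and hyphens,
    # so the timestamp is formatted without colons or spaces.
    return f"{instance_id}-{when:%Y-%m-%d-%H-%M}"


def create_snapshot(rds, instance_id: str, when: Optional[datetime] = None) -> str:
    """Snapshot one DB instance and return the name to restore from later."""
    name = snapshot_name(instance_id, when or datetime.now(timezone.utc))
    rds.create_db_snapshot(DBInstanceIdentifier=instance_id, DBSnapshotIdentifier=name)
    return name


def restore_snapshot(rds, snapshot_id: str, new_instance_id: str) -> None:
    # A restore creates a brand-new DB instance from the snapshot.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id, DBSnapshotIdentifier=snapshot_id
    )
```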
Amazon CloudFront is the AWS content delivery network (CDN). It caches content at edge locations close to users, so the next user who requests the same content can download a copy faster. CloudFront can distribute dynamic, static and streaming content from origins such as S3 or your own server.
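The caching idea can be shown with a toy model of a single edge location. A request is served from the edge while the cached copy is fresh; otherwise the edge goes back to the origin. This model is only illustrative and is not how CloudFront is implemented. The class name and the injectable clock are assumptions. The 24-hour default TTL mirrors CloudFront's default.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one edge location in front of an origin such as an S3 bucket."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 86400,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, bytes]] = {}  # path -> (cached_at, body)
        self.origin_requests = 0

    def get(self, path: str) -> bytes:
        now = self.clock()
        cached = self.store.get(path)
        if cached and now - cached[0] < self.ttl:
            return cached[1]  # cache hit: no trip back to the origin
        self.origin_requests += 1  # miss, or the cached copy has expired
        body = self.fetch_from_origin(path)
        self.store[path] = (now, body)
        return body
```

The first user's request pays the round trip to the origin. Later users near the same edge get the cached copy until the TTL expires.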
Kinesis allows data to be streamed in real time from a producer to a processor or storage option. This is a big change from batch processing, which has traditionally been the way to move data from one location to another.
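On the producer side, streaming means sending each record as soon as it exists rather than waiting to accumulate a batch. The sketch below uses the Kinesis `put_record` call through an injected client, which can be a boto3 Kinesis client or a stub. The event shape, including the `user_id` field used as the partition key, is a hypothetical example. Records with the same partition key are routed to the same shard.

```python
import json
from typing import Iterable


def stream_events(kinesis, stream_name: str, events: Iterable[dict]) -> int:
    """Send each event to a Kinesis stream as it is produced; return how many were sent."""
    sent = 0
    for event in events:
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(event).encode("utf-8"),  # Kinesis records are raw bytes
            PartitionKey=str(event["user_id"]),  # assumed field; picks the shard
        )
        sent += 1
    return sent
```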
AWS Identity and Access Management (IAM) allows you to securely control individual and group access to your resources.
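New IAM users have no permissions by default until you grant them, usually by attaching policies to the user or to a group they belong to. Groups are the usual way to give many users the same permissions. A role also defines a set of permissions for making AWS service requests, but instead of belonging to one person it is assumed temporarily by whoever needs it, such as a user, an application or an AWS service like EC2.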
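Permissions are written as JSON policy documents. The sketch below shows a minimal policy, with a hypothetical bucket name, and a much-simplified evaluator. Access is denied by default, an `Allow` grants it, and an explicit `Deny` always wins. Real IAM evaluation also handles conditions, principals, permission boundaries, case-insensitive action names and more, so treat this as an illustration of "no access until granted", not as IAM's algorithm.

```python
from fnmatch import fnmatchcase

# A minimal identity-based policy: read-only access to one (hypothetical) bucket.
READ_REPORTS = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:ListBucket"],
         "Resource": ["arn:aws:s3:::example-reports", "arn:aws:s3:::example-reports/*"]},
        {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policies, action: str, resource: str) -> bool:
    """Simplified check: default deny, Allow grants, explicit Deny overrides."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            action_match = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
            resource_match = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
            if action_match and resource_match:
                if stmt["Effect"] == "Deny":
                    return False  # an explicit deny can't be overridden
                allowed = True
    return allowed
```

A user with no policies attached gets `False` for everything, which matches the "no access by default" rule.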
Amazon Route 53 is Amazon’s Domain Name System (DNS) web service. It gives developers a cost-effective way to route end users to Internet applications. It translates domain names into IP addresses that computers use to connect to each other.
AWS named the service Route 53 because DNS servers answer requests on port 53.
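Telling Route 53 which IP address a domain name should resolve to means writing a resource record set. The sketch below sends a change batch through the `change_resource_record_sets` call, using an injected client (a boto3 Route 53 client or a stub). The hosted zone ID, domain and IP address are placeholders. `UPSERT` creates the record if it is missing and updates it otherwise.

```python
def upsert_a_record(route53, hosted_zone_id: str, name: str, ip: str, ttl: int = 300) -> dict:
    """Point `name` at an IPv4 address with an A record; returns the change batch sent."""
    change_batch = {
        "Comment": f"Point {name} at {ip}",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",  # maps a domain name to an IPv4 address
                "TTL": ttl,   # seconds resolvers may cache the answer
                "ResourceRecords": [{"Value": ip}],
            },
        }],
    }
    route53.change_resource_record_sets(HostedZoneId=hosted_zone_id, ChangeBatch=change_batch)
    return change_batch
```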
Amazon EC2 provides virtual machines (instances) in the cloud. You only pay for the capacity you use, and you choose from ‘families’ of instance types that suit different use cases:
- General purpose – a balance of compute, memory and networking resources
- Compute optimised – ideal for compute-bound applications that benefit from high-performance processors
- Memory optimised – fast performance for workloads that process large data sets in memory
- Accelerated computing – hardware accelerators, or co-processors, such as GPUs
- Storage optimised – high, sequential read and write access to very large data sets on local storage
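The family usually shows up in the first letters of an instance type name, before the generation number (`m5.large`, `c6g.xlarge`, `inf1.xlarge`). The lookup below is a deliberately partial, illustrative mapping, not a complete or official list. Unknown prefixes return `None`.

```python
import re
from typing import Optional

# Partial, illustrative mapping from instance-type prefixes to families.
FAMILY_BY_PREFIX = {
    "t": "General purpose", "m": "General purpose",
    "c": "Compute optimised",
    "r": "Memory optimised", "x": "Memory optimised",
    "p": "Accelerated computing", "g": "Accelerated computing",
    "inf": "Accelerated computing", "trn": "Accelerated computing",
    "i": "Storage optimised", "d": "Storage optimised",
}


def instance_family(instance_type: str) -> Optional[str]:
    """Return the family for names like 'm5.large', or None if the prefix is unknown."""
    match = re.match(r"([a-z]+)\d", instance_type)  # letters before the generation number
    return FAMILY_BY_PREFIX.get(match.group(1)) if match else None
```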
Amazon EMR provides a scalable framework so you can run Spark and Hadoop processes over an S3 data lake.
A scheduled run launches an Amazon EMR cluster and starts running its steps according to the specified schedule. Once the job completes, the EMR cluster is terminated, so you only pay for it while work is running.
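A "launch, run steps, terminate" cluster is what EMR calls a transient cluster. Setting `KeepJobFlowAliveWhenNoSteps` to `False` in boto3's `run_job_flow` makes the cluster shut down after its last step. This sketch only builds the arguments and leaves out the scheduling itself (for example, an EventBridge rule). The release label, instance types, S3 paths and default role names are placeholders to adapt to your account.

```python
def transient_spark_cluster(name: str, script_s3_uri: str, log_uri: str,
                            release_label: str = "emr-6.15.0") -> dict:
    """Arguments for emr.run_job_flow: start a cluster, run one Spark step, then terminate."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # placeholder; choose a current EMR release
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate once the steps finish
        },
        "Steps": [{
            "Name": "run-spark-script",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {"Jar": "command-runner.jar",
                              "Args": ["spark-submit", script_s3_uri]},
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

The result would be passed as `boto3.client("emr").run_job_flow(**kwargs)`.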
AWS Key Management Service (KMS) makes it easy to create and control encryption keys. Under the hood, it uses hardware security modules (HSMs) to protect the security and integrity of your keys.
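With KMS, your application never handles the key material. It sends plaintext to KMS and gets ciphertext back. The sketch below round-trips a small secret through the `encrypt` and `decrypt` calls using an injected client (a boto3 KMS client or a stub). The key alias and the encryption context values are hypothetical. Direct `Encrypt` calls accept at most 4 KB of plaintext, so larger data is normally encrypted with a data key instead (envelope encryption).

```python
def encrypt_then_decrypt(kms, key_id: str, secret: bytes, context: dict) -> bytes:
    """Encrypt a small secret under a KMS key, then decrypt it again."""
    if len(secret) > 4096:
        raise ValueError("KMS Encrypt accepts at most 4096 bytes of plaintext")
    ciphertext = kms.encrypt(KeyId=key_id, Plaintext=secret,
                             EncryptionContext=context)["CiphertextBlob"]
    # The same encryption context must be supplied again, or decryption fails.
    return kms.decrypt(CiphertextBlob=ciphertext, EncryptionContext=context)["Plaintext"]
```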
If you want to store objects cost-effectively, configure their lifecycle.
The lifecycle configuration defines the actions that Amazon S3 applies to a group of objects. For instance, you might archive objects in Glacier one year after creating them, or transition them to S3:IA 30 days after creating them.
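Both example actions fit in a single lifecycle rule. Below is the rule in the shape boto3's `put_bucket_lifecycle_configuration` expects, together with a small helper that reports which storage class the rule would have moved an object into by a given age. The rule ID and the helper are hypothetical. In practice, S3 applies transitions asynchronously, so the move happens around the stated day rather than exactly on it.

```python
from typing import Optional

LIFECYCLE = {
    "Rules": [{
        "ID": "tier-down-old-objects",
        "Filter": {"Prefix": ""},  # empty prefix: apply to every object in the bucket
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},  # S3:IA after 30 days
            {"Days": 365, "StorageClass": "GLACIER"},     # Glacier after one year
        ],
    }]
}


def storage_class_after(days_since_creation: int, rule: Optional[dict] = None) -> str:
    """Storage class an object would be in after the rule's transitions have run."""
    rule = rule or LIFECYCLE["Rules"][0]
    current = "STANDARD"
    for t in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if days_since_creation >= t["Days"]:
            current = t["StorageClass"]
    return current
```

Applying it would look like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=LIFECYCLE)`.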
Amazon SNS allows applications to send time-critical messages to multiple subscribers through a “push” mechanism. This eliminates the need to periodically check or “poll” for updates.
Amazon SQS stores messages in a queue, where they wait until an external service polls SQS and retrieves them.
By using Amazon SNS and Amazon SQS together, messages can be pushed immediately to applications that need to be notified of an event, and also persisted in an Amazon SQS queue for other applications to process later.
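The push-versus-poll difference and the fan-out pattern can be modelled in a few lines. This is an in-memory toy, not the AWS API. Here `Topic` plays SNS and `Queue` plays SQS. One simplification: the toy removes messages when they are received, whereas a real SQS consumer deletes each message explicitly after processing it.

```python
from collections import deque
from typing import Callable, List


class Queue:
    """Stand-in for an SQS queue: messages wait here until a consumer polls."""

    def __init__(self):
        self.messages = deque()

    def receive(self, max_messages: int = 10) -> List[str]:
        batch = []
        while self.messages and len(batch) < max_messages:
            batch.append(self.messages.popleft())
        return batch


class Topic:
    """Stand-in for an SNS topic: every publish is pushed to all subscribers."""

    def __init__(self):
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self.subscribers.append(handler)

    def subscribe_queue(self, queue: Queue) -> None:
        self.subscribe(queue.messages.append)  # the queue persists messages for later

    def publish(self, message: str) -> None:
        for handler in self.subscribers:
            handler(message)  # push: subscribers don't have to poll
```

An alerting handler subscribed directly reacts straight away, while a queue subscribed with `subscribe_queue` keeps a copy until a batch job polls it.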
A virtual private cloud (VPC) is a virtual network dedicated to your AWS account. It is logically isolated from other virtual networks in the AWS Cloud. You can launch your AWS resources, such as Amazon EC2 instances, into your VPC.
You can use a NAT device to enable instances in a private subnet to connect to the internet, but prevent the internet from initiating connections. A NAT device forwards traffic from instances in the private subnet to the internet or other AWS services, then sends the response back.
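A common layout gives each Availability Zone one public subnet, routed to an internet gateway, and one private subnet, whose default route points at a NAT gateway in the public subnet. The sketch uses Python's `ipaddress` module to carve a VPC range into /24 subnets. The CIDR, the AZ names and the route targets (`internet-gateway`, `nat-gateway-…`) are hypothetical labels, not real resource IDs.

```python
import ipaddress


def plan_vpc(cidr: str = "10.0.0.0/16", azs=("eu-west-2a", "eu-west-2b")) -> dict:
    """One public and one private /24 subnet per AZ, with their route tables."""
    subnets = ipaddress.ip_network(cidr).subnets(new_prefix=24)
    plan = {}
    for az in azs:
        plan[f"public-{az}"] = {
            "cidr": str(next(subnets)),
            "routes": {cidr: "local", "0.0.0.0/0": "internet-gateway"},
        }
        plan[f"private-{az}"] = {
            "cidr": str(next(subnets)),
            # Outbound-only internet access through a NAT gateway in the public subnet.
            "routes": {cidr: "local", "0.0.0.0/0": f"nat-gateway-{az}"},
        }
    return plan


def usable_addresses(subnet_cidr: str) -> int:
    # AWS reserves five addresses in every subnet (the first four and the last).
    return ipaddress.ip_network(subnet_cidr).num_addresses - 5
```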
There are multiple ways to pay for Amazon EC2 instances:
- On-demand – pay for capacity by the hour or by the second, depending on which instances you run
- Reserved instances – provide a discount of up to 75% off the on-demand price in exchange for a one- or three-year commitment, and can include a capacity reservation, giving you confidence in your ability to launch instances when you need them
- Spot instances – request spare Amazon EC2 computing capacity for up to 90% off the on-demand price
- Dedicated hosts – provide EC2 instance capacity on physical servers dedicated for your use
- Savings plans – provide savings similar to reserved instances, but with more flexibility, for example to change instance type within the same family
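Because the discounts are quoted as "up to" percentages off the on-demand rate, a best-case monthly comparison is simple arithmetic. The sketch assumes a hypothetical $0.10/hour on-demand rate and 730 hours in a month (8,760 hours a year divided by 12). Real prices depend on instance type, Region, term and payment option.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12

# Best-case ("up to") discounts from the list above.
MAX_DISCOUNT = {"on-demand": 0.0, "reserved": 0.75, "spot": 0.90}


def best_case_monthly_cost(on_demand_hourly: float, option: str) -> float:
    """Monthly cost of one always-on instance at the maximum discount for `option`."""
    return round(on_demand_hourly * HOURS_PER_MONTH * (1 - MAX_DISCOUNT[option]), 2)
```

At $0.10/hour, this gives about $73.00 on demand, $18.25 with the full reserved discount and $7.30 with the full spot discount.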
Amazon EBS provides persistent block storage volumes that are typically attached to a single EC2 instance and used for file systems, databases and other storage.
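Attaching an EBS volume takes two calls: create the volume in the instance's Availability Zone, then attach it as a device. The sketch uses an injected client (a boto3 EC2 client or a stub). The volume size, the `gp3` type and the device name are placeholder choices.

```python
def create_and_attach_volume(ec2, instance_id: str, availability_zone: str,
                             size_gib: int = 20, device: str = "/dev/sdf") -> str:
    """Create an EBS volume and attach it to one instance; returns the volume ID.

    The volume must be created in the same Availability Zone as the instance.
    """
    volume_id = ec2.create_volume(AvailabilityZone=availability_zone,
                                  Size=size_gib, VolumeType="gp3")["VolumeId"]
    # With a real client, wait until the volume is available before attaching, e.g.
    # ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume_id)
    return volume_id
```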
Amazon EFS is a managed, scalable network file system that can be shared across multiple Amazon EC2 instances.
Amazon RDS makes it easy to provision a managed database instance in the cloud. At the time of writing the following database engines were available.
- Amazon Aurora for MySQL and PostgreSQL
- MS SQL Server
Read replicas can be part of your disaster recovery plan: if the source DB instance fails, you can promote a read replica to a standalone DB instance.
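The recovery plan described above comes down to two RDS API calls. One creates the replica ahead of time. The other promotes it when the source fails. The sketch uses an injected client (a boto3 RDS client or a stub), and the instance identifiers are placeholders. Promotion is one-way: the replica stops replicating and becomes a writable instance of its own.

```python
def create_replica(rds, source_id: str, replica_id: str) -> None:
    # Asynchronously replicates changes from the source DB instance.
    rds.create_db_instance_read_replica(DBInstanceIdentifier=replica_id,
                                        SourceDBInstanceIdentifier=source_id)


def fail_over_to_replica(rds, replica_id: str) -> None:
    # One-way: the replica becomes a standalone, writable DB instance.
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)
```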
Auto Scaling launches and terminates Amazon EC2 instances automatically according to user-defined policies. You can use Auto Scaling to maintain a fleet of Amazon EC2 instances that adjusts to the current load. You can also use Auto Scaling to launch multiple instances in a group at one time.
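One common user-defined policy is target tracking. You pick a metric and a target value, and the group adds or removes instances to stay near it, within the group's minimum and maximum size. The sketch builds the arguments for boto3's `put_scaling_policy` using the predefined average-CPU metric. The group name, the policy-naming scheme and the 50% default target are assumptions.

```python
def cpu_target_tracking_policy(group_name: str, target_cpu_percent: float = 50.0) -> dict:
    """Arguments for autoscaling.put_scaling_policy: keep average CPU near the target."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-{target_cpu_percent:g}",  # hypothetical naming
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu_percent,
        },
    }
```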
CloudWatch is based on metrics. The metrics represent a set of data points ordered by time and published to CloudWatch. Imagine the metric as a variable to be monitored over time, with the data points representing its value.
Each data point has a timestamp and unit of measurement. When you request statistics, the returned data stream contains namespace, metric name, dimension, and unit information.
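A custom data point carries a metric name, dimensions, a timestamp, a value and a unit. This matches the shape of the `MetricData` entries that boto3's `put_metric_data` takes. Statistics such as Sum or Average are then computed over the points in a period. The `summarise` function below is a local stand-in that illustrates those statistics; it is not a CloudWatch call. The metric and dimension names are hypothetical.

```python
from datetime import datetime, timezone
from statistics import mean


def metric_datum(name: str, value: float, unit: str, dims: dict, when: datetime) -> dict:
    """One data point in the shape put_metric_data expects in its MetricData list."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dims.items()],
        "Timestamp": when,
        "Value": value,
        "Unit": unit,
    }


def summarise(points: list) -> dict:
    """Local illustration of the statistics CloudWatch reports for a period."""
    values = [p["Value"] for p in points]
    return {"SampleCount": len(values), "Sum": sum(values), "Average": mean(values),
            "Minimum": min(values), "Maximum": max(values)}
```

Publishing would look like `boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=points)`.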
Virtual Private Cloud (VPC)
A Virtual Private Cloud (VPC) is a logically isolated virtual data centre within an AWS Region. It spans all the Availability Zones in that Region, while each of its subnets lives in a single Availability Zone. The components of a VPC are internet gateways/virtual private gateways, route tables, network access control lists, subnets and security groups.
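Network access control lists and security groups filter traffic differently. A network ACL is stateless and evaluates its numbered rules in ascending order. The first matching rule wins, and a final catch-all rule denies anything else. Security groups, by contrast, are stateful and only contain allow rules. The evaluator below is a simplified illustration of the network ACL behaviour, and the rules are made-up examples.

```python
import ipaddress

# Inbound network ACL rules: (rule number, action, protocol, port, source CIDR).
INBOUND_RULES = [
    (100, "allow", "tcp", 443, "0.0.0.0/0"),
    (110, "deny", "tcp", 22, "0.0.0.0/0"),
    (120, "allow", "tcp", 22, "10.0.0.0/16"),  # never reached: rule 110 matches first
]


def evaluate_nacl(rules, protocol: str, port: int, source_ip: str) -> str:
    """Return 'allow' or 'deny' for one inbound packet, lowest rule number first."""
    address = ipaddress.ip_address(source_ip)
    for _, action, proto, rule_port, cidr in sorted(rules):
        matches = proto == protocol and rule_port == port and address in ipaddress.ip_network(cidr)
        if matches:
            return action
    return "deny"  # the default catch-all rule
```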
AWS Web Application Firewall (WAF) protects web applications from attacks by filtering traffic based on rules that you create. These rules let you control access by IP address or country, and block SQL injection, malicious scripts and requests above a certain length.
Block IP addresses that exceed request limits
Block IP addresses that submit bad requests
This solution uses Lambda, CloudWatch and AWS WAF to block IP addresses once their requests pass a threshold.
AWS WAF can be deployed on Amazon CloudFront, protecting resources and content at CloudFront edge locations.
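Blocking an IP address that exceeds a request limit works like a counter over a sliding time window. The toy class below illustrates that idea and is not how AWS WAF implements it. One difference: the toy keeps an IP blocked once it has been flagged, whereas WAF's own rate-based rules stop blocking when the rate falls. The class name and numbers are assumptions. The dictionary at the end shows a WAF (v2) rate-based rule statement with an example limit; WAF enforces such statements itself, without Lambda.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Toy rate-based rule: block an IP once it sends more than `limit`
    requests within a sliding window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 300):
        self.limit = limit
        self.window = window_seconds
        self.requests = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()

    def allow(self, ip: str, now: float) -> bool:
        if ip in self.blocked:
            return False
        recent = self.requests[ip]
        recent.append(now)
        while recent and now - recent[0] > self.window:
            recent.popleft()  # forget requests that fell out of the window
        if len(recent) > self.limit:
            self.blocked.add(ip)  # toy behaviour: stays blocked
            return False
        return True


# A WAF (v2) rate-based rule statement with an example limit.
RATE_BASED_STATEMENT = {"RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}}
```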
With the private certificate authority in AWS Certificate Manager, you can issue private X.509 certificates that identify users, computers, applications and other devices internally.
OK, I cheated here, but this is a really interesting post that puts it all together: AWS Explained by Operating a Brewery
Fun fact: a yobibyte is 2^80 or 1,208,925,819,614,629,174,706,176 bytes.
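The number is easy to check. Binary (IEC) prefixes step up by a factor of 2^10 each time, so a yobibyte is the eighth step. Its decimal (SI) counterpart, the yottabyte, is 10^24 bytes, about 17% smaller.

```python
# Binary (IEC) prefixes: each step multiplies by 2**10 = 1024.
PREFIXES = ["kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi"]
BINARY_BYTES = {p + "byte": 2 ** (10 * (i + 1)) for i, p in enumerate(PREFIXES)}

yobibyte = BINARY_BYTES["yobibyte"]  # 2**80 bytes
yottabyte = 10 ** 24                 # decimal (SI) counterpart

print(f"{yobibyte:,}")                # 1,208,925,819,614,629,174,706,176
print(f"{yobibyte / yottabyte:.3f}")  # 1.209
```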
AWS hosts its infrastructure in Availability Zones (AZs), each made up of one or more discrete data centres. There are multiple AZs in a Region, so if there is a problem in one AZ, another can pick up the slack. For some services, you can also host your application across multiple Regions.
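To benefit from multiple AZs, you spread capacity across them, so that losing one AZ takes out only part of your fleet. An Auto Scaling group does this balancing for you across the AZs you configure. The round-robin helper below just illustrates the idea, and the AZ names are examples from the London Region.

```python
from itertools import cycle


def spread_across_azs(count: int, azs: list) -> list:
    """Assign `count` instances to AZs round-robin, one AZ name per instance."""
    return [az for az, _ in zip(cycle(azs), range(count))]
```

With three AZs and five instances, no AZ holds more than two instances, so an outage in any single AZ leaves at least three instances running.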