Gently Down the Stream with AWS Kinesis
Gone are the days of batch processing and simply loading new tables into the database every 12 hours. With more and more platforms offering event streams of data, the infrastructure that stores these events and draws meaning from them needs to change too.
This post is an overview of how AWS Kinesis can be built into new or existing architecture to solve this problem. There are several options that allow you to run analytics on the fly, shard the data streams for scalability or simply stream the data into an S3 bucket for later processing.
Stream v Batch
Kinesis allows data to be streamed in real time from a Producer to a Processor or Storage option. More on these concepts in a bit.
This is a huge change from Batch Processing, which has been the traditional way to move data from one location to another.
Batch Processing – Data, usually stored in a database, is landed in chunks and analysed when the transfer is complete.
Stream Processing – Streams of data pour in, in real time, and don’t have an end… unless you create one. This allows us to act on the data and make decisions faster.
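To make the contrast concrete, here is a minimal Python sketch that has nothing to do with the AWS APIs. The function names and sample values are made up for illustration: the batch version can only answer once all the data has landed, while the streaming version gives an updated answer after every event.

```python
from typing import Iterable, Iterator, List


def batch_average(values: List[float]) -> float:
    """Batch style: wait for the full data set, then compute once."""
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated average as each event arrives."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
        yield total / count


if __name__ == "__main__":
    events = [10.0, 20.0, 30.0, 40.0]  # made-up sample readings
    print(batch_average(events))            # one answer, after everything has landed
    print(list(streaming_average(events)))  # an answer after every single event
```

Because the streaming version is a generator, it would work just as well on a source that never ends.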
AWS Kinesis Data Streams
Back to the concepts, using Kinesis Data Streams as an example:
Input/Producer: The application that generates the events we want to capture. This can be log files, media, website clicks or transactional data.
Data Stream: This is a shard, or group of shards, that ingests up to 1,000 records per second per shard. Records are then retained for 24 hours by default.
Consumer/Processor: This is the service or application, which can be another Kinesis service, that retrieves the events from the shards. In most cases, this happens in real time. For example, AWS Lambda can be triggered to transform the event data into a more usable form or push it into a database like DynamoDB or Aurora.
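As a rough idea of what the Lambda consumer might look like, here is a minimal sketch. It assumes the producer wrote JSON payloads, and it relies on the fact that Lambda receives each Kinesis payload base64-encoded under `kinesis.data`. The `fake_kinesis_event` helper is hypothetical and only exists so the handler can be tried locally; a real event carries more fields than shown here.

```python
import base64
import json
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, Any]]:
    """Decode every Kinesis record in a Lambda event into a plain dict."""
    decoded = []
    for record in event.get("Records", []):
        # Lambda hands over the record payload base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payload = json.loads(raw)  # assumption: the producer sent JSON
        payload["partition_key"] = record["kinesis"]["partitionKey"]
        decoded.append(payload)
    # A real function would write `decoded` to a store such as DynamoDB here.
    return decoded


def fake_kinesis_event(payloads: List[Dict[str, Any]],
                       partition_key: str = "demo-key") -> Dict[str, Any]:
    """Hypothetical helper: build a cut-down event for local testing."""
    return {
        "Records": [
            {
                "eventSource": "aws:kinesis",
                "kinesis": {
                    "partitionKey": partition_key,
                    "data": base64.b64encode(
                        json.dumps(p).encode("utf-8")
                    ).decode("ascii"),
                },
            }
            for p in payloads
        ]
    }


if __name__ == "__main__":
    print(handler(fake_kinesis_event([{"page": "/home"}, {"page": "/cart"}])))
```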
Use cases for Kinesis Data Streams:
- Streaming data like website clicks and transactional data
- Migrating data from databases
- Applications with specialised data pipelines
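When sizing a stream for use cases like these, the per-shard ingest limit quoted earlier turns into simple arithmetic. The sketch below also shows how a record could be routed to a shard: Kinesis hashes each record's partition key with MD5 into a 128-bit space, and each shard owns a range of that space. The partition key is a detail not covered above, the function names are made up, and the even split of the hash space is an assumption that may not hold once a stream has been resharded. Only the record-count limit is considered, not throughput in bytes.

```python
import hashlib
import math

RECORDS_PER_SHARD_PER_SECOND = 1000  # per-shard ingest limit quoted in this post
HASH_SPACE = 2 ** 128                # MD5 produces a 128-bit value


def shards_needed(records_per_second: int) -> int:
    """Smallest shard count that can absorb the given record rate."""
    return max(1, math.ceil(records_per_second / RECORDS_PER_SHARD_PER_SECOND))


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)
    # Scale the 128-bit hash down to an index between 0 and shard_count - 1.
    return hash_value * shard_count // HASH_SPACE


if __name__ == "__main__":
    count = shards_needed(2500)  # 2,500 clicks per second needs 3 shards
    print(count)
    for key in ("user-1", "user-2", "user-3"):  # made-up partition keys
        print(key, "goes to shard", shard_for_key(key, count))
```

The same key always lands on the same shard, which is what keeps one user's events in order.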
AWS Kinesis Firehose
Kinesis Firehose differs from Kinesis Data Streams in that it batches, encrypts and compresses the data, then persists it somewhere such as Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service.
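The batch, compress and persist flow can be pictured with a small local simulation. This is not the Firehose API: the `BufferedDelivery` class is invented for illustration, the `delivered` list stands in for objects landing in S3, buffering is done by record count for simplicity, and the encryption step is left out.

```python
import gzip
import json
from typing import Any, Dict, List


class BufferedDelivery:
    """Local stand-in for a delivery stream: buffer, compress, persist."""

    def __init__(self, max_records: int) -> None:
        self.max_records = max_records
        self.buffer: List[Dict[str, Any]] = []
        self.delivered: List[bytes] = []  # stands in for objects written to S3

    def put(self, record: Dict[str, Any]) -> None:
        """Accept one record and deliver a batch once the buffer is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()

    def flush(self) -> None:
        """Compress the buffered records into one object and reset the buffer."""
        if not self.buffer:
            return
        lines = "\n".join(json.dumps(r) for r in self.buffer) + "\n"
        self.delivered.append(gzip.compress(lines.encode("utf-8")))
        self.buffer = []


if __name__ == "__main__":
    stream = BufferedDelivery(max_records=2)
    for reading in ({"sensor": "a", "temp": 21},
                    {"sensor": "b", "temp": 19},
                    {"sensor": "a", "temp": 22}):  # made-up IoT readings
        stream.put(reading)
    print(len(stream.delivered), "object delivered,", len(stream.buffer), "record waiting")
```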
Use cases for Kinesis Firehose:
- IoT events
- Security monitoring, as Splunk can be configured as a destination
- Auto Archiving
AWS Kinesis Analytics
Kinesis Data Analytics allows us to both process events and analyse them on the fly using SQL queries. The service recognises formats like JSON and CSV, then sends the output on to an analytics tool for visualisation or action.
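The service itself is queried with SQL, but the idea of analysing a stream on the fly can be sketched in plain Python. The example below counts page views per fixed time window, roughly what a GROUP BY over a time window would do. The function name, the (timestamp, page) event shape and the sample clicks are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_per_window(events: Iterable[Tuple[int, str]],
                     window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, page) over fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, page in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, page)] += 1
    return dict(counts)


if __name__ == "__main__":
    clicks = [(0, "/home"), (5, "/home"), (12, "/cart"), (14, "/home"), (21, "/home")]
    for (start, page), views in sorted(count_per_window(clicks, 10).items()):
        print(f"window starting at {start}s: {page} viewed {views} time(s)")
```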
Use cases for Kinesis Analytics:
- Processing of events data from applications
- Exploratory analysis
- Analysing clickstream anomalies
Is it secure?
Data is encrypted in transit over HTTPS, server-side encryption with AWS KMS keys can be enabled for data at rest, and access can be managed using IAM from the console.
How do I pay for all this?
- If using the Data Streams service, each shard is charged at an hourly rate.
- Firehose is billed based on the volume of data ingested.
- Analytics is billed at an hourly rate for the processing capacity (Kinesis Processing Units) the application uses.
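For a back-of-the-envelope estimate, the two billing models boil down to a couple of multiplications. The rates in this sketch are placeholders, not real AWS prices, and the function names are made up; check the current pricing pages for your region before relying on any figure.

```python
def data_streams_cost(shards: int, hours: float, rate_per_shard_hour: float) -> float:
    """Shard-hour model: every shard is billed for every hour it exists."""
    return shards * hours * rate_per_shard_hour


def volume_cost(gigabytes: float, rate_per_gb: float) -> float:
    """Volume model, as described for Firehose: billed per GB ingested."""
    return gigabytes * rate_per_gb


if __name__ == "__main__":
    # Placeholder rates chosen only to keep the arithmetic easy to follow.
    print(data_streams_cost(shards=2, hours=24, rate_per_shard_hour=0.5))
    print(volume_cost(gigabytes=100, rate_per_gb=0.25))
```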
This service is not included in the Free Tier but many of the other core services are.
Photo by Tobias Bjørkli from Pexels