Gone are the days of batch processing and simply loading new tables into the database every 12 hours. With more and more platforms offering event streams of data, the infrastructure that stores these events and draws meaning from them needs to change too.
AWS Kinesis can be built into new or existing architecture to solve this problem. There are several options that allow you to run analytics on the fly, shard the data streams for scalability or simply stream the data into an S3 bucket for later processing.
Stream v Batch
Data is streamed in real time from its source.
This is a huge change from batch processing, which has been the traditional way to land data from one location to another.
Batch processing – data is landed in chunks and analysed when the transfer is complete.
Stream processing – streams of data pour in, in real time, and don’t have an end… unless you create one. This allows us to act on the data and make decisions faster.
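To make the difference concrete, here is a minimal Python sketch. It is not tied to any AWS API, and the function names and click counts are made up for illustration. The batch function only answers once the whole chunk has landed, while the stream function produces an updated answer after every event.

```python
from typing import Iterable, Iterator, List


def batch_total(events: List[int]) -> int:
    """Batch: wait for the whole chunk to land, then analyse it once."""
    return sum(events)


def stream_totals(events: Iterable[int]) -> Iterator[int]:
    """Stream: update the result as each event arrives, with no defined end."""
    running = 0
    for value in events:
        running += value
        yield running  # a decision could be made here, before the stream ends


clicks = [3, 1, 4, 1, 5]  # made-up click counts, for illustration only
print(batch_total(clicks))          # one answer, after the transfer completes
print(list(stream_totals(clicks)))  # an answer after every event
```

Because the stream version is a generator, it would work just as well on an input that never finishes, which is the situation a real event stream puts you in.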
AWS Kinesis Data Streams
Here are the core concepts, using Kinesis Data Streams as an example:
Input/Producer – the application that generates the events we want to capture. This can be log files, media, website clicks or transactional data.
Data Stream – this is a shard, or group of shards, that ingests up to 1,000 records (or 1 MB of data) per second, per shard. Data is then available for 24 hours by default.
Consumer/Processor – this is the AWS service, which can be another Kinesis service, that retrieves the events from the shards. In most cases, this is happening in real time. It can also push the events into a database such as DynamoDB or Aurora.
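The sketch below illustrates two things about shards: roughly how many a given event rate needs, and how a record ends up on a particular shard. Kinesis assigns each record by taking an MD5 hash of its partition key (a string the producer attaches to each record) and matching it against each shard's range of hash values. The code assumes the hash space is split evenly across shards, which is a simplification, and the function names are hypothetical. It only uses the 1,000 records per second figure, so it ignores the size-based limit.

```python
import hashlib
import math

HASH_SPACE = 2 ** 128             # MD5 yields a 128-bit hash key
RECORDS_PER_SHARD_PER_SEC = 1000  # per-shard ingest rate quoted above


def shards_needed(records_per_second: int) -> int:
    """Estimate the shard count for a write rate (record count limit only)."""
    return max(1, math.ceil(records_per_second / RECORDS_PER_SHARD_PER_SEC))


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index.

    Assumption: the hash space is divided evenly between shards. A real
    stream can have uneven ranges, for example after shards are split.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    range_size = HASH_SPACE // shard_count
    # min() guards the top of the hash space when the division is not exact.
    return min(hash_key // range_size, shard_count - 1)


count = shards_needed(2500)
print(count)
print(shard_for_key("user-42", count))  # the same key always maps to the same shard
```

One practical consequence: records sharing a partition key always land on the same shard, so a key that is too popular can overload one shard while the others sit idle.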
Use cases for Kinesis Data Streams
- Streaming data like website clicks and transactional data
- Migrating data from databases
- Applications with specialised data pipelines
AWS Kinesis Firehose
Kinesis Firehose differs from Kinesis Data Streams in that it takes the data, then batches, encrypts and compresses it. It then persists the result somewhere such as Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service.
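The batch-then-compress idea can be sketched in a few lines of Python. This is a toy model rather than the Firehose API: the class name and the record-count threshold are hypothetical, the encryption step is left out, and an in-memory list stands in for the destination.

```python
import gzip
from typing import List


class BufferedDelivery:
    """Toy stand-in for a delivery stream: buffer records, compress each batch."""

    def __init__(self, max_records: int = 3) -> None:
        self.max_records = max_records    # hypothetical flush threshold
        self.buffer: List[bytes] = []
        self.delivered: List[bytes] = []  # stands in for objects in a bucket

    def put(self, record: bytes) -> None:
        """Accept one record and flush once the buffer is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()

    def flush(self) -> None:
        """Join buffered records, gzip them and 'deliver' the result."""
        if not self.buffer:
            return
        batch = b"\n".join(self.buffer) + b"\n"
        self.delivered.append(gzip.compress(batch))
        self.buffer.clear()


delivery = BufferedDelivery(max_records=3)
for i in range(7):
    delivery.put(f'{{"event": {i}}}'.encode("utf-8"))
delivery.flush()  # deliver the final, partial batch

print(len(delivery.delivered))
print(gzip.decompress(delivery.delivered[0]).decode("utf-8"))
```

The sketch flushes on record count only. A time-based flush would also be needed in practice, so that a quiet stream does not leave records sitting in the buffer indefinitely.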
Use cases for Kinesis Firehose:
- IoT events
- Security monitoring, with Splunk configured as a destination
- Auto Archiving
AWS Kinesis Analytics
Kinesis Data Analytics allows us to both process events and analyse them using SQL queries on-the-fly. The service recognises formats like JSON and CSV, then sends the output on to an analytics tool for visualisation or action.
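The service itself is driven by SQL, but the kind of question it answers can be shown with a small Python sketch: counting events per fixed time window as JSON records arrive. The field names (`ts`, `page`), the window size and the sample events are all hypothetical, and this is an analogy for a windowed aggregation rather than how the service is implemented.

```python
import json
from collections import Counter
from typing import Dict, Iterable

WINDOW_SECONDS = 60  # hypothetical window size


def count_per_window(lines: Iterable[str]) -> Dict[int, int]:
    """Count JSON events per fixed window, keyed by the window's start time."""
    counts: Counter = Counter()
    for line in lines:
        event = json.loads(line)
        # Round the timestamp down to the start of its window.
        window_start = event["ts"] - event["ts"] % WINDOW_SECONDS
        counts[window_start] += 1
    return dict(counts)


events = [
    '{"ts": 5, "page": "/home"}',
    '{"ts": 42, "page": "/cart"}',
    '{"ts": 61, "page": "/home"}',
]
print(count_per_window(events))  # {0: 2, 60: 1}
```

A sudden jump in the count for one window compared with its neighbours is the sort of signal an anomaly check on clickstream data would look for.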
Use cases for Kinesis Analytics
- Processing of events data from applications
- Exploratory analysis
- Analysing clickstream anomalies
Is it secure?
- Data is encrypted in transit over HTTPS, and server-side encryption at rest can be enabled using AWS KMS.
- Manage access by using IAM from the console.
How do I pay for all this?
- Shards in Data Streams are billed at an hourly rate.
- Firehose and Analytics services are billed based on the volumes of data ingested.
- The Free Tier does not include Kinesis, although many of the other core services are included. If you have a use-case for streaming data, give it a try.