AWS Kinesis Data Streams vs. Kinesis Data Firehose

by L. Peng · February 27, 2019

The AWS Kinesis platform offers four services: Kinesis Video Streams (which captures, processes, and stores live media data), Kinesis Data Streams (which captures, processes, and stores real-time data), Kinesis Data Firehose (which loads real-time data streams into data stores), and Kinesis Data Analytics (which analyzes real-time data with SQL). Kinesis Data Streams may be considered a cloud-native counterpart to Apache Kafka; in the last post, we compared the two. The three data services (Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics) can work together to perform real-time analysis.

Data Flows

These three data services can be combined in different ways, for example:

Input -> Kinesis Data Streams -> Kinesis Data Analytics, Spark on EMR, your own consumer code on EC2, or a Lambda function -> Output
Input -> Kinesis Data Firehose -> AWS S3, AWS Redshift, AWS Elasticsearch Service (ES), or Splunk -> Output
Input -> Kinesis Data Streams or Kinesis Data Firehose -> Kinesis Data Analytics -> Output
Input -> Kinesis Data Streams -> Kinesis Data Analytics -> Kinesis Data Firehose and/or Kinesis Data Streams -> Process or store data -> Output
Input -> Kinesis Data Streams -> Kinesis Data Firehose -> Store data for Analytics -> Output

For example, consider the Streaming Analytics Pipeline architecture on AWS: one can either analyze the stream data with a Kinesis Data Analytics application and then deliver the analyzed data to the configured destinations, or trigger a Lambda function through the Kinesis Data Firehose delivery stream to store the data in S3.

Another example, mentioned in the AWS documentation, is that you can configure Amazon Kinesis Data Streams to send information to a Kinesis Data Firehose delivery stream. After that, you can also use Kinesis Data Analytics or other Kinesis data streams, depending on the use case. Both Data Streams and Data Firehose can serve as a buffer for streaming data. Thus, in this post, we will compare Kinesis Data Streams (KDS) and Kinesis Data Firehose (KDF).

Kinesis Data Set of Services

Let’s take a quick look at the architectures of the three services:

Kinesis Data Streams

A model of the architecture of Kinesis Data Streams is shown below:

The producers put records (data ingestion) into KDS. AWS provides the Kinesis Producer Library (KPL) to simplify producer application development and to achieve high write throughput to a Kinesis data stream.
A Kinesis data stream is a set of shards. Each shard holds a sequence of data records. Each data record consists of a sequence number, a partition key, and a data blob (up to 1 MB), which is an immutable sequence of bytes.
The consumers get records from Kinesis Data Streams and process them. You can build your applications using Kinesis Data Analytics, the Kinesis API, or the Kinesis Client Library (KCL).
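
To make the relationship between partition keys and shards more concrete, here is a minimal Python sketch. It relies on the documented behavior that Kinesis Data Streams hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The names Record, hash_key, and shard_for are hypothetical, and the even split of the hash space across shards is an assumption for illustration (real streams can have uneven ranges after resharding).

```python
import hashlib
from dataclasses import dataclass

MAX_BLOB_BYTES = 1024 * 1024  # 1 MB limit on the data blob
HASH_SPACE = 2 ** 128         # MD5 produces a 128-bit hash key


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a record a producer would put into KDS."""
    partition_key: str
    data: bytes  # the immutable data blob

    def __post_init__(self):
        if len(self.data) > MAX_BLOB_BYTES:
            raise ValueError("data blob exceeds 1 MB")


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(record: Record, shard_count: int) -> int:
    """Return the index of the shard whose hash key range holds the record.

    Assumption: the hash space is split evenly across shard_count shards.
    """
    range_size = HASH_SPACE // shard_count
    return min(hash_key(record.partition_key) // range_size, shard_count - 1)
```

Records that share a partition key always land on the same shard, which is why a skewed key distribution can overload one shard while others sit idle.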

Kinesis Data Firehose

A model of the architecture of Kinesis Data Firehose is shown below:

Data producers send records to Kinesis Data Firehose delivery streams.
The underlying entity of Kinesis Data Firehose is the Kinesis Data Firehose delivery stream. It automatically delivers the data to the destination that you specify (e.g., S3, Redshift, Elasticsearch Service, or Splunk).
You can also configure Kinesis Data Firehose to transform your data before delivering it. Enable Kinesis Data Firehose data transformation when you create your delivery stream. Then Kinesis Data Firehose invokes your Lambda function to transform incoming source data and deliver the transformed data to destinations.
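
The transformation step can be sketched as a Lambda handler in Python. The event shape (a "records" list with base64-encoded "data") and the response fields ("recordId", "result", "data") follow the Firehose data transformation contract as I understand it. The transformation itself, adding a "processed" flag to a JSON payload, is a hypothetical example and not something the original post prescribes.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each incoming Firehose record and hand it back for delivery."""
    output = []
    for record in event["records"]:
        try:
            # Firehose passes the payload base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical transformation
            data = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Payload was not a JSON object: mark it failed, return it unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every incoming recordId must appear in the response, so the sketch reports bad records as ProcessingFailed rather than silently skipping them.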

Kinesis Data Analytics

A model of the architecture of Kinesis Data Analytics is shown below:

Kinesis Data Analytics applications continuously read and process streaming data from Data Streams or Data Firehose in real time. In the input configuration, you map the streaming source to an in-application input stream.
You write application code using SQL to process the incoming streaming data and produce output. You can write SQL statements against in-application streams and reference tables. You can also write JOIN queries to combine data from both of these sources.
Kinesis Data Analytics then writes the output to a configured destination. External destinations can be a Kinesis Data Firehose delivery stream or a Kinesis data stream. You can configure a Kinesis Data Firehose delivery stream to write results to AWS S3, Redshift, or Elasticsearch Service (ES). You can also specify a Kinesis data stream as the destination and use AWS Lambda to poll the stream and forward the results to your custom destination.

Comparison

Concept	AWS Kinesis Data Firehose	AWS Kinesis Data Streams
Provisioning	No pre-provisioning	Configure the number of shards
Scale/Throughput	No limit; scales automatically	No limit; scales with the number of shards
Data Retention	N/A (data is kept for up to 24 hours if the delivery destination is unavailable; data can be delivered to AWS S3, Redshift, Elasticsearch Service, and Splunk)	1 to 7 days (default is 24 hours)
Delivery	At-least-once semantics	Same as Firehose
Multiple Consumers	No (data can be delivered to AWS S3, Redshift, Elasticsearch Service, and Splunk)	Yes
Record/Object Size	1,000 KiB	1 MB
Availability	Three AZs	Same as Firehose
Security	Data can be secured at rest using server-side encryption with AWS KMS master keys. Data can be accessed privately via your Amazon Virtual Private Cloud (VPC)	Same as Firehose
Monitoring	Amazon CloudWatch, CloudTrail, and Kinesis Analytics	Same as Firehose
Cost	Pay as you go	Pay as you go. Can be set up in a couple of hours
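
Because Kinesis Data Streams requires you to configure the number of shards, a rough sizing calculation is often the first step. The Python sketch below assumes the commonly published per-shard limits of 1 MB/s or 1,000 records/s for writes and 2 MB/s for reads shared across consumers. These figures are not from this post and should be checked against the current AWS quotas. The function name estimate_shards and its parameters are hypothetical.

```python
import math

# Assumed per-shard limits; verify against current AWS quotas.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count needed for a given write and read load."""
    write_mb = records_per_sec * avg_record_kb / 1024
    # Every consumer reads the full stream from the shared read capacity.
    read_mb = write_mb * consumers
    return max(
        1,
        math.ceil(write_mb / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb / READ_MB_PER_SHARD),
    )
```

With Kinesis Data Firehose there is no equivalent calculation, since the service scales without pre-provisioning.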

Conclusion

AWS Kinesis Data Streams lets you build custom applications that process or analyze streaming data for specialized needs: it provides the streaming buffer, and you write the custom code containing the business logic. With Kinesis Data Firehose, by contrast, you don't need to write applications or manage resources, although you can configure it to transform data with a Lambda function. Note that you can take full advantage of the Kinesis data services by using all three of them or combining any two (e.g., configuring Amazon Kinesis Data Streams to send information to a Kinesis Data Firehose delivery stream, transforming data in Kinesis Data Firehose, or processing the incoming streaming data with SQL in Kinesis Data Analytics).

Comments (1)

alex says:

June 11, 2020 at 8:15 AM

Which flow is possible?
1) Transform the data stream with Kinesis Data Analytics? (Input -> Kinesis Data Streams -> Kinesis Data Analytics -> back to Kinesis Data Streams -> my consumer service -> Output)
2) Transform with Kinesis Firehose and send back to Kinesis Data Streams? (Input -> Kinesis Data Streams -> Kinesis Firehose -> back to Kinesis Data Streams -> my consumer service -> Output)
The main point is that I need to parse and transform the data into my format and send it on as a stream. As I understand it, Firehose has to save the result to a DB, so it does not satisfy my needs.
