What, Why and Where of Kafka
Ever wondered how we get real-time data for our orders, shipments, and payment transactions? Well, I have something interesting to share with you all!
Let’s start with the What of Kafka.
Kafka is a distributed event streaming platform that lets you write, read, store, and process events (also called records or messages).
In other words,
* It captures real-time data from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events,
* stores these event streams durably for later retrieval (the events are stored in topics),
* lets you manipulate, process, and react to the event streams in real time,
* and routes the event streams to different destinations (irrespective of the technology) as needed.
A topic is similar to a folder in a filesystem, and the events are the files in that folder.
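To make the folder analogy concrete, here is a minimal sketch of a topic as an append-only list of events. This is a toy in-memory model, not Kafka's actual API: the `Topic` class and its `append` and `read_from` methods are hypothetical names made up for illustration. The one idea borrowed from Kafka is that each stored event gets a position (an offset) and that reading an event does not remove it.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Topic:
    """Toy stand-in for a topic: a named, append-only list of events (hypothetical)."""
    name: str
    events: List[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        """Store an event at the end of the log and return its position (offset)."""
        self.events.append(event)
        return len(self.events) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every event from the given offset onward; nothing is deleted."""
        return self.events[offset:]


# Two events written to the same "folder", like two files saved one after another.
orders = Topic("orders")
first = orders.append({"order_id": 1, "status": "placed"})
second = orders.append({"order_id": 1, "status": "shipped"})
```

Because reading leaves the events in place, several readers can go through the same topic independently, each starting from whatever offset it likes.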
Event streaming simply refers to the process of moving event data from place to place efficiently so other systems can easily access and analyze it. Kafka is a perfect example of an event streaming tool.
Event streaming is the digital equivalent of the human body’s central nervous system.
There is also a distinction between event processing and event stream processing. Event processing looks at individual events one at a time, whereas event stream processing handles many related events together. Event streaming is a part of event stream processing.
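The difference between the two is easier to see in a small sketch. The example below is plain Python with no Kafka involved, and the payment events, the limit of 1000, and the function names `process_event` and `process_stream` are all assumptions chosen for illustration. The first function judges each event in isolation; the second looks at related events (the same account) together.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float]  # (account_id, amount): a hypothetical payment event


def process_event(event: Event, limit: float = 1000.0) -> bool:
    """Event processing: decide based on a single event only."""
    _, amount = event
    return amount > limit


def process_stream(events: Iterable[Event], limit: float = 1000.0) -> List[str]:
    """Event stream processing: decide based on related events taken together."""
    totals: Dict[str, float] = defaultdict(float)
    for account, amount in events:
        totals[account] += amount  # combine all events for the same account
    return sorted(account for account, total in totals.items() if total > limit)


payments = [("acct-1", 400.0), ("acct-2", 50.0), ("acct-1", 700.0)]

# No single payment exceeds the limit, so per-event processing flags nothing.
flagged_individually = [e for e in payments if process_event(e)]

# Together, the two acct-1 payments add up to 1100.0, so the stream view flags it.
flagged_by_stream = process_stream(payments)
```

A real stream processor would usually work over a time window rather than a finished list, but the idea is the same: the interesting signal only shows up when related events are considered together.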
Why choose Kafka?
Event streaming ensures a continuous flow and interpretation of data so that the right information is at the right place at the right time. Kafka provides all of this functionality in a distributed, highly scalable, elastic, fault-tolerant, and secure manner.
Examples of events in day-to-day life:
Payment transactions, cybersecurity, track and trace (geolocation updates for mobile devices), shipping orders, live betting, and much more.
Where is Kafka used? Use cases and industries
1. To capture and analyze data from IoT devices (e.g., in traffic management).
2. To collect data and react immediately to customer interactions in real time. Industries: hotels, mobile applications and websites, retail.
3. To process payments and financial transactions in real time. Industries: stock exchanges, banks, and insurance companies.
4. To track and monitor trucks, cars, shipments, etc. in real time. Industries: automotive and logistics.
5. To monitor patients in hospital care and predict changes in condition (e.g., Humana uses Kafka to store patients’ data and sync it across).
Kafka can be deployed on bare-metal hardware, virtual machines, and containers, both on-premises and in the cloud.
You can self-manage your Kafka environments or use the fully managed services offered by a variety of vendors.
Some of the companies that use Kafka:
Netflix: Real-time monitoring and event-processing pipeline.
Spotify: As part of their log delivery system.
LinkedIn: For activity stream data and operational metrics. This powers products like News Feed.
Barclays: Streaming and analytical information.
Airbnb: Used in event pipeline and exception tracking.
Uber: For monitoring, tracking, and near real-time analytics.
The New York Times: To store and distribute, in real time, published content to the various applications and systems that make it available to the readers.
Strava: Analytics pipeline and activity feeds.
Shopify: As part of their log delivery system.
trivago: Processing of application logs.