Key Concepts
Real-time data is information that is delivered and processed with minimal delay between the moment it is generated and the moment it is consumed. This type of data is crucial for systems that require immediate insights or actions, such as monitoring systems, financial markets, and IoT devices. Unlike batch processing, where data is collected, stored, and processed at set intervals, real-time data flows continuously and is often time-sensitive.
Business operations run on real-time data. The above shows a typical real-time system for a ride-sharing application. Different services exchange data in real time in order to co-ordinate and achieve business goals.
Whilst this approach is present in most organisations (more than 80% of Fortune 100 companies), it is rarely exposed for analytical purposes. Streambased provides a direct view onto this data and all of the extra insight that comes with it.
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Kafka centers around messages: packets of data that are transferred between the services that create them (producers) and the services that receive them (consumers). Messages typically (but not always) consist of a set of fields holding structured information and are grouped together into logical groupings called topics.
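The relationship between producers, consumers, messages, and topics can be illustrated with a toy in-memory model. This is only a sketch for building intuition and is not the Kafka client API: the `ToyBroker` class, its methods, and the topic and field names are all hypothetical, and reading from a numeric position is a simplification.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker: topic name -> ordered messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, message):
        # A producer appends a message to the end of a topic.
        self.topics[topic].append(message)

    def consume(self, topic, offset=0):
        # A consumer reads a topic's messages in order, starting at a position.
        return self.topics[topic][offset:]


broker = ToyBroker()

# Producers: services that create messages (assumed ride-sharing examples).
broker.produce("ride_requests", {"rider_id": "r1", "pickup": "Main St"})
broker.produce("ride_requests", {"rider_id": "r2", "pickup": "Oak Ave"})
broker.produce("driver_locations", {"driver_id": "d7", "lat": 51.5, "lon": -0.12})

# Consumer: a service that receives every message in one topic.
for message in broker.consume("ride_requests"):
    print(message)
```

Each topic keeps its own ordered list of messages, so a consumer of `ride_requests` never sees the messages written to `driver_locations`.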
In traditional databases, data is stored in tables, where each row represents an individual record and each column represents a specific attribute of that record. Kafka’s topics can be thought of in a similar way:
Concept | Kafka | Database |
---|---|---|
A single field within a record | Attribute | Column |
A group of fields that represent a single record | Message | Row |
A group of records that represent a series | Topic | Table |
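To make the analogy concrete, the sketch below lays out the messages of a topic as the rows of a table. It assumes each message has already been decoded into a Python dictionary; the `topic_to_table` helper and the sample field names are hypothetical and are not part of Kafka or Streambased. Because messages do not always share the same fields, a missing attribute is filled with `None`.

```python
def topic_to_table(messages):
    """Return (columns, rows) for a list of dict-shaped messages."""
    # Columns are the attributes seen across all messages, in first-seen order.
    columns = []
    for message in messages:
        for field in message:
            if field not in columns:
                columns.append(field)

    # Each message becomes one row; attributes a message lacks become None.
    rows = [tuple(message.get(column) for column in columns) for message in messages]
    return columns, rows


# Hypothetical contents of a "ride_requests" topic.
ride_requests = [
    {"rider_id": "r1", "pickup": "Main St"},
    {"rider_id": "r2", "pickup": "Oak Ave", "passengers": 2},
]

columns, rows = topic_to_table(ride_requests)
print(columns)
for row in rows:
    print(row)
```

Here the topic plays the role of the table, each message contributes one row, and each attribute becomes a column.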