Key Concepts
What is real-time data?
Real-time data is information that is delivered and processed with minimal delay between the moment it is generated and the moment it is consumed. This type of data is crucial for systems that require immediate insights or actions, such as monitoring systems, financial markets, and IoT devices. Unlike batch processing, where data is collected, stored, and processed at set intervals, real-time data flows continuously and is often time-sensitive.
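The contrast with batch processing can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values are hypothetical, and a real pipeline would read from a live source rather than a list.

```python
def batch_total(events):
    """Batch style: wait until all events are collected, then process once."""
    return sum(events)


def streaming_totals(events):
    """Streaming style: update and emit a result as each event arrives."""
    total = 0
    for value in events:
        total += value
        yield total  # a fresh result is available after every event


# Hypothetical fare amounts, used only for illustration.
fares = [3, 5, 2]

final_total = batch_total(fares)               # one answer, after the fact
running_totals = list(streaming_totals(fares))  # one answer per event
```

In the batch version nothing is known until the whole collection has been processed. In the streaming version an up-to-date total exists after every event, which is what makes the data usable while it is still time-sensitive.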
Do I have real-time data?
Business operations run in real time. The above shows a typical real-time system for a ride-sharing application. Different services exchange data in real time to co-ordinate and achieve business goals.
Whilst this approach is present in most organisations (>80% of Fortune 100 companies), it is rarely exposed for analytical purposes. Streambased provides a direct view onto this data and all of the extra insight that comes with it.
What is Apache Kafka?
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Kafka centers around messages: packets of data that are transferred between the services that create them (producers) and the services that receive them (consumers). Messages typically (but not always) consist of a set of fields holding structured information and are grouped together in logical groupings called topics.
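To make the relationship between producers, consumers, messages, and topics concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: the `InMemoryBroker` class, its methods, and the `rides` topic are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a Kafka cluster: each topic is an append-only list."""

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, message):
        """Append a message to a topic and return its position (offset)."""
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

    def consume(self, topic, offset=0):
        """Return every message in a topic starting from the given offset."""
        return self.topics[topic][offset:]


broker = InMemoryBroker()

# A producer writes structured messages to the hypothetical "rides" topic.
broker.produce("rides", {"ride_id": 1, "driver": "alice", "fare": 12.5})
broker.produce("rides", {"ride_id": 2, "driver": "bob", "fare": 8.0})

# A consumer reads them back in the order they were written.
for message in broker.consume("rides"):
    print(message["ride_id"], message["driver"], message["fare"])
```

Reading does not remove anything from the topic in this sketch, so several consumers can read the same messages independently, each starting from its own offset.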
Topic / Table Duality
In traditional databases, data is stored in tables, where each row represents an individual record and each column represents a specific attribute of that record. Kafka’s topics can be thought of in a similar way:
Concept | Kafka | Database |
---|---|---|
A single field within a record | Attribute | Column |
A group of fields that represent a single record | Message | Row |
A group of records that represent a series | Topic | Table |
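The mapping in the table can be illustrated by loading the messages of a topic into a relational table. The sketch below uses Python's built-in `sqlite3` module purely for demonstration; the `rides` topic, its fields, and the `topic_to_table` helper are hypothetical, and this is not a description of how Streambased itself works.

```python
import sqlite3

# Hypothetical messages from a hypothetical "rides" topic.
rides_topic = [
    {"ride_id": 1, "driver": "alice", "fare": 12.5},
    {"ride_id": 2, "driver": "bob", "fare": 8.0},
]


def topic_to_table(conn, table_name, messages):
    """Create a table whose columns are the message attributes,
    then insert one row per message."""
    columns = list(messages[0].keys())
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {table_name} ({col_list})")
    conn.executemany(
        f"INSERT INTO {table_name} ({col_list}) VALUES ({placeholders})",
        [tuple(m[c] for c in columns) for m in messages],
    )


conn = sqlite3.connect(":memory:")
topic_to_table(conn, "rides", rides_topic)

# Topic -> table, message -> row, attribute -> column.
rows = conn.execute("SELECT driver, fare FROM rides ORDER BY ride_id").fetchall()
```

Once the messages are viewed as rows, ordinary SQL questions (filters, aggregates, joins) can be asked of data that originated as a stream.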