Key Concepts

What is real-time data?

Real-time data is information that is delivered and processed with minimal delay between the moment it is generated and the moment it is consumed. This type of data is crucial for systems that require immediate insights or actions, such as monitoring systems, financial markets, and IoT devices. Unlike batch processing, where data is collected, stored, and processed at set intervals, real-time data flows continuously and is often time-sensitive.
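To make the contrast with batch processing concrete, here is a minimal sketch in plain Python. The function names and the sample readings are made up for illustration; the point is only that the batch function produces nothing until a full batch has been collected, whereas the streaming function produces an updated result for every reading as it arrives.

```python
from typing import Iterable, Iterator


def batch_totals(readings: list[float], batch_size: int) -> list[float]:
    """Batch style: readings are stored first, then summed one batch at a time."""
    return [
        sum(readings[i:i + batch_size])
        for i in range(0, len(readings), batch_size)
    ]


def streaming_totals(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: yield an updated running total as each reading arrives."""
    total = 0.0
    for reading in readings:
        total += reading
        yield total


# Hypothetical sensor readings, used only for illustration.
readings = [1.0, 2.0, 3.0, 4.0]

batch_result = batch_totals(readings, batch_size=2)
stream_result = list(streaming_totals(readings))

print(batch_result)   # one result per completed batch
print(stream_result)  # one result per reading
```

With four readings and a batch size of two, the batch version yields two results while the streaming version yields four, one for each reading.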

Do I have real-time data?

Business operations run in real time. The diagram above shows a typical real-time system for a ride-sharing application: different services exchange data in real time in order to co-ordinate and achieve business goals.

Whilst this approach is present in most organisations (more than 80% of Fortune 100 companies), it is rarely exposed for analytical purposes. Streambased provides a direct view onto this data and all of the extra insight that comes along with it.

What is Apache Kafka?

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Kafka centers around messages: packets of data that are transferred between the services that create them (producers) and the services that receive them (consumers). Messages typically (but not always) consist of a set of fields holding structured information, and are grouped together in logical groupings called topics.
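The relationship between producers, consumers, messages, and topics can be sketched with a small in-memory model. This is not the Kafka client API and it ignores partitions, replication, and networking; the `Message` and `InMemoryBroker` classes and the `ride-requests` topic name are hypothetical names chosen for illustration. It shows only the core idea that a topic is an append-only log that producers write to and consumers read from by offset.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    """A packet of data: a key plus a set of structured fields."""
    key: str
    value: dict


@dataclass
class InMemoryBroker:
    """Toy stand-in for a broker: each topic is an append-only list."""
    topics: dict = field(default_factory=lambda: defaultdict(list))

    def produce(self, topic: str, message: Message) -> int:
        """Append a message to the topic's log and return its offset."""
        log = self.topics[topic]
        log.append(message)
        return len(log) - 1

    def consume(self, topic: str, offset: int = 0) -> list:
        """Return a copy of all messages in the topic from `offset` onwards."""
        return list(self.topics[topic][offset:])


broker = InMemoryBroker()

# A producer writes two messages to the same topic.
broker.produce("ride-requests", Message("rider-1", {"pickup": "A", "dropoff": "B"}))
offset = broker.produce("ride-requests", Message("rider-2", {"pickup": "C", "dropoff": "D"}))

# A consumer reads from a chosen offset; the log itself is left unchanged.
received = broker.consume("ride-requests", offset=1)
print(offset, [m.key for m in received])
```

Because consuming does not remove anything from the log, several independent consumers can read the same topic from different offsets.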

What is Topic / Table Duality?

In traditional databases, data is stored in tables, where each row represents an individual record and each column represents a specific attribute of that record. Kafka’s topics can be thought of in a similar way:

| Concept | Kafka | Database |
| --- | --- | --- |
| A single field within a record | Attribute | Column |
| A group of fields that represent a single record | Message | Row |
| A group of records that represent a resource | Topic | Table |
| A group of related resources | Cluster | Namespace |
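The mapping above can be illustrated with a short sketch that treats the messages of a topic as the rows of a table. It uses Python's built-in `sqlite3` module purely as a stand-in database; the `rides` table, its columns, and the sample messages are assumptions made for illustration and do not describe how Streambased itself exposes Kafka data.

```python
import sqlite3

# Messages as they might appear on a hypothetical "rides" topic:
# each message is a group of fields (attributes).
messages = [
    {"ride_id": 1, "rider": "alice", "fare": 12.5},
    {"ride_id": 2, "rider": "bob", "fare": 8.0},
    {"ride_id": 3, "rider": "alice", "fare": 20.0},
]

# The topic becomes a table, and each attribute becomes a column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (ride_id INTEGER, rider TEXT, fare REAL)")

# Each message becomes one row; named placeholders map fields to columns.
conn.executemany(
    "INSERT INTO rides VALUES (:ride_id, :rider, :fare)",
    messages,
)

# Once the data is in table form, ordinary SQL analytics apply.
rows = conn.execute(
    "SELECT rider, SUM(fare) FROM rides GROUP BY rider ORDER BY rider"
).fetchall()
print(rows)
conn.close()
```

Viewing a topic this way is what allows standard analytical queries, such as the aggregation above, to be asked of data that originated as a stream of messages.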

What is Apache Iceberg?

Apache Iceberg is an open-source table format for large-scale analytics datasets. It was developed by Netflix and later donated to the Apache Software Foundation. Iceberg addresses common challenges in data lake architectures, especially when working with engines like Apache Spark, Trino, Presto, Flink, and Hive.

Streambased surfaces Apache Kafka data as Apache Iceberg tables, allowing you to instantly plug your Kafka data into any analytical engine. See Streambased I.S.K. (Iceberg Service for Kafka) for more detail.

