Streambased Documentation

Key Concepts


Last updated 7 days ago

What is real-time data?

Real-time data refers to information that is delivered and processed as soon as it is generated, with minimal delay between the moment it is produced and the moment it is consumed. This type of data is crucial for systems that require immediate insight or action, such as monitoring systems, financial markets, and IoT devices. Unlike batch processing, where data is collected, stored, and processed at set intervals, real-time data flows continuously and is often time-sensitive.
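The contrast with batch processing can be made concrete with a minimal Python sketch under simplified assumptions. The events are hypothetical (timestamp, amount) pairs. The streaming function updates a running total as each event arrives. The batch function only reports the total at the end of each fixed interval. This illustrates the idea only and is not part of Streambased.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary start point (assumed)
    amount: float     # hypothetical payload value


def stream_process(events):
    """Update a running total as each event arrives.

    A new result is available at the moment every event is processed.
    """
    total = 0.0
    results = []
    for event in events:
        total += event.amount
        results.append((event.timestamp, total))
    return results


def batch_process(events, interval):
    """Collect events and report the cumulative total only at each interval boundary."""
    results = []
    total = 0.0
    boundary = interval
    for event in sorted(events, key=lambda e: e.timestamp):
        # Close every interval that ended before this event arrived.
        while event.timestamp >= boundary:
            results.append((boundary, total))
            boundary += interval
        total += event.amount
    # Close the final interval.
    results.append((boundary, total))
    return results


events = [Event(1, 5.0), Event(2, 3.0), Event(12, 2.0)]
print(stream_process(events))
print(batch_process(events, 10))
```

This example uses events at t=1, t=2 and t=12 and a 10-second interval. The streaming version has a result at t=1 and at t=2. The batch version reports nothing for those events until t=10.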

Do I have real-time data?

Business operations run on real-time data. The diagram above shows a typical real-time system for a ride-sharing application. Different services exchange data in real time in order to co-ordinate and achieve business goals.

Whilst this approach is present in most organisations (more than 80% of Fortune 100 companies), it is rarely exposed for analytical purposes. Streambased provides a direct view onto this data and all of the additional insight that comes with it.

What is Apache Kafka?

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Kafka centers around messages. These are packets of data that are transferred between the services that create them (producers) and the services that receive them (consumers). Messages typically (but not always) consist of a set of fields holding structured information. They are grouped together into logical groupings called topics.
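The relationship between producers, topics and consumers can be sketched with a toy in-memory model. This is a hypothetical illustration and not the Kafka client API. The names `InMemoryBroker`, `produce`, `Consumer` and `poll` are defined only for this example. Partitions and replication are omitted. The `rides` topic is an assumed example taken from the ride-sharing scenario. The sketch does reflect one real Kafka property: reading a message does not remove it, because each consumer tracks its own offset into the topic.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a Kafka cluster: each topic is an append-only list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, message):
        """Append a message (a dict of fields) to a topic and return its offset."""
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

    def read_from(self, topic, offset):
        """Return every message in the topic from the given offset onwards."""
        return self.topics[topic][offset:]


class Consumer:
    """Reads a topic sequentially, remembering the offset of the next unread message."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self):
        messages = self.broker.read_from(self.topic, self.offset)
        self.offset += len(messages)
        return messages


broker = InMemoryBroker()
consumer = Consumer(broker, "rides")
broker.produce("rides", {"ride_id": 1, "status": "requested"})
broker.produce("rides", {"ride_id": 1, "status": "accepted"})
print(consumer.poll())  # both messages, in the order they were produced
```

A second `Consumer` on the same topic would start at offset 0 and see the same messages again. Multiple independent services can therefore read one topic without interfering with each other.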

What is Topic / Table Duality?

In traditional databases, data is stored in tables, where each row represents an individual record and each column represents a specific attribute of that record. Kafka’s topics can be thought of in a similar way:

| Concept | Kafka | Database |
|---|---|---|
| A single field within a record | Attribute | Column |
| A group of fields that represent a single record | Message | Row |
| A group of records that represent a resource | Topic | Table |
| A group of related resources | Cluster | Namespace |
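The mapping in the table can be illustrated with a short Python sketch. It assumes each message is a dictionary of field names to values. Every message becomes a row and every distinct field becomes a column. The `topic_to_table` helper is hypothetical and does not describe how Streambased itself performs this mapping.

```python
def topic_to_table(messages):
    """View a list of messages (dicts of fields) as a table.

    Each message becomes a row and each distinct field name becomes a column.
    Columns keep the order in which fields are first seen; fields missing
    from a message are filled with None.
    """
    columns = []
    for message in messages:
        for field in message:
            if field not in columns:
                columns.append(field)
    rows = [tuple(message.get(column) for column in columns) for message in messages]
    return columns, rows


# Hypothetical messages from a ride-sharing topic.
messages = [
    {"ride_id": 1, "driver": "ann"},
    {"ride_id": 2, "driver": "bob", "fare": 12.5},
]
columns, rows = topic_to_table(messages)
print(columns)
print(rows)
```

Messages do not always share the same fields. For that reason, the sketch fills missing attributes with `None`, much as a nullable column would.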

What is Apache Iceberg?

Apache Iceberg is an open-source table format for large-scale analytics datasets. It was developed by Netflix and later donated to the Apache Software Foundation. Iceberg addresses common challenges in data lake architectures, such as safe schema evolution, atomic commits and snapshot-based time travel. These challenges arise especially when working with engines like Apache Spark, Trino, Presto, Flink, and Hive.

Streambased surfaces Apache Kafka data as Apache Iceberg tables, allowing you to plug your Kafka data directly into any analytical engine that reads Iceberg. See Streambased I.S.K. (Iceberg Service for Kafka) for more detail.