Streambased Documentation

Key Concepts

Last updated 8 months ago

What is real-time data?

Real-time data is information that is delivered and processed with minimal delay between the moment it is generated and the moment it is consumed. This type of data is crucial for systems that require immediate insights or actions, such as monitoring systems, financial markets, and IoT devices. Unlike batch processing, where data is collected, stored, and processed at set intervals, real-time data flows continuously and is often time-sensitive.
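The difference between batch and real-time processing can be sketched in a few lines of Python. This is only an illustration: the `events` list, the `amount` field, and both function names are hypothetical and not part of any Streambased API.

```python
def batch_total(events):
    """Batch style: wait until all events are collected, then process them once."""
    return sum(event["amount"] for event in events)


def streaming_totals(events):
    """Real-time style: update the result as each event arrives."""
    total = 0
    for event in events:
        total += event["amount"]
        yield total  # an up-to-date answer is available after every event


# Hypothetical events; in a real system these would arrive over time.
events = [{"amount": 5}, {"amount": 7}, {"amount": 3}]

batch_result = batch_total(events)        # one answer, only at the end
running = list(streaming_totals(events))  # one answer per event
```

In the batch version nothing is known until the whole interval has been collected, whereas the streaming version always holds a current result.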

Do I have real-time data?

Business operations run in real time. The above shows a typical real-time system for a ride-sharing application. Different services exchange data in real time to co-ordinate and achieve business goals.

Whilst this approach is present in most organisations (more than 80% of Fortune 100 companies), it is rarely exposed for analytical purposes. Streambased provides a direct view into this data and all of the extra insight that comes with it.

What is Apache Kafka?

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Kafka centers around messages: packets of data that are transferred between the services that create them (producers) and the services that receive them (consumers). Messages typically (but not always) consist of a set of fields holding structured information, and are grouped together in logical groupings called topics.
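The relationship between producers, consumers, messages, and topics can be modelled with a toy in-memory broker. This is a minimal sketch and not the Kafka client API: the `InMemoryBroker` class, its methods, the `ride_requests` topic, and the message fields are all hypothetical, and real Kafka concerns such as partitions, replication, and serialization are ignored.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a Kafka cluster: maps topic names to append-only lists."""

    def __init__(self):
        self._topics = defaultdict(list)

    def produce(self, topic, message):
        """Append a message to the end of a topic and return its position."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def consume(self, topic, offset=0):
        """Return every message in a topic, starting from the given position."""
        return self._topics[topic][offset:]


broker = InMemoryBroker()

# A producer (here, a hypothetical ride-request service) writes messages.
broker.produce("ride_requests", {"rider_id": 17, "pickup": "Airport"})
broker.produce("ride_requests", {"rider_id": 42, "pickup": "Station"})

# A consumer (here, a hypothetical driver-matching service) reads them in order.
messages = broker.consume("ride_requests")
```

Each message is a set of named fields, and all messages of the same kind live in one topic. Producing never changes earlier messages; it only appends, which is why a consumer can resume from any position.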

Topic / Table Duality

In traditional databases, data is stored in tables, where each row represents an individual record and each column represents a specific attribute of that record. Kafka’s topics can be thought of in a similar way:

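| Concept                                          | Kafka     | Database |
|--------------------------------------------------|-----------|----------|
| A single field within a record                   | Attribute | Column   |
| A group of fields that represent a single record | Message   | Row      |
| A group of records that represent a series       | Topic     | Table    |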
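To make the mapping concrete, the sketch below loads a handful of messages into an in-memory SQLite table and queries them with SQL. It is only an illustration of the duality and makes no claim about how Streambased works internally; the topic name, the message fields, and their values are hypothetical.

```python
import sqlite3

# Hypothetical messages from a "ride_requests" topic; each dict is one message.
topic_name = "ride_requests"
messages = [
    {"rider_id": 17, "pickup": "Airport", "fare": 32.5},
    {"rider_id": 42, "pickup": "Station", "fare": 12.0},
    {"rider_id": 17, "pickup": "Station", "fare": 9.5},
]

conn = sqlite3.connect(":memory:")

# Topic -> table, attribute -> column.
columns = list(messages[0].keys())
conn.execute(f"CREATE TABLE {topic_name} ({', '.join(columns)})")

# Message -> row.
placeholders = ", ".join("?" for _ in columns)
conn.executemany(
    f"INSERT INTO {topic_name} VALUES ({placeholders})",
    [tuple(message[column] for column in columns) for message in messages],
)

# Once the topic is viewed as a table, ordinary SQL applies.
rows = conn.execute(
    f"SELECT rider_id, SUM(fare) FROM {topic_name} "
    "GROUP BY rider_id ORDER BY rider_id"
).fetchall()
```

Here `rows` holds the total fare per rider. The sketch assumes every message carries the same fields; messages without a consistent structure would need extra handling before they could be treated as rows.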