Streambased Documentation

Overview

I.S.K. (Iceberg Service for Kafka) is an Apache Iceberg projection over real-time, event-based data stored in an Apache Kafka-compatible platform.

I.S.K. serves an Iceberg REST Catalog that transforms and renders Kafka metadata as Iceberg metadata at runtime. Integration with S.S.K. (Storage Service for Kafka) enables the actual message data to be handled through the standard Iceberg data access flow.
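Because I.S.K. exposes a standard Iceberg REST Catalog, any Iceberg-capable engine should be able to point at it. As an illustrative sketch only (the catalog name, endpoint URI, and credential placeholder below are assumptions, not documented I.S.K. values), a Spark client might be configured like this:

```properties
# Hypothetical Spark configuration for an Iceberg REST catalog.
# Endpoint and credential are illustrative, not I.S.K. defaults.
spark.sql.catalog.isk=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.isk.type=rest
spark.sql.catalog.isk.uri=https://isk.example.com/catalog
spark.sql.catalog.isk.credential=<your-api-key>
```

With a configuration along these lines, Kafka topics surfaced by the catalog can be queried like ordinary Iceberg tables.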

As a logical projection, I.S.K. enables:

  • First and foremost, analytics-style access to operational data stored in Kafka, with no jobs or pipelines to maintain.

  • Flexible data access patterns - partitioning can be swapped on the fly to match the specific filters used by each query.

  • Access to always-fresh data - because there is no ETL, there is no 'lag' from loading data into a data lake.

  • Efficient data scans - query predicates are evaluated against partition specs to reduce the amount of data that has to be fetched from Kafka; combined with flexible partitioning, this enables very fast and efficient data retrieval.
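The pruning idea behind the last point can be shown with a small, self-contained sketch. This is illustrative only, not I.S.K. internals: it models partitions as ranges of a partition column and keeps only those that overlap a query predicate.

```python
# Illustrative sketch (not I.S.K. code): predicates evaluated against a
# partition spec prune the partitions that must be fetched from Kafka.
from dataclasses import dataclass


@dataclass
class Partition:
    # Each partition covers a half-open range [lower, upper) of a
    # partition column, e.g. an hourly bucket of an event-time field.
    lower: int
    upper: int


def prune(partitions, predicate_lower, predicate_upper):
    """Keep only partitions whose range overlaps the query predicate."""
    return [
        p for p in partitions
        if p.upper > predicate_lower and p.lower < predicate_upper
    ]


# Four hourly partitions; a query filtering on hours [1, 3) touches two.
parts = [Partition(h, h + 1) for h in range(4)]
kept = prune(parts, 1, 3)
print(len(kept))  # 2 of 4 partitions need to be scanned
```

The same overlap test, applied to real Iceberg partition metadata, is what lets a filtered query avoid reading the whole topic.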

At the same time, it has some restrictions and limitations:

  • It is a read-only projection, so modifying operations (e.g. inserts) are naturally not supported.

  • Data in a Kafka topic must have a schema in order to be rendered as an Iceberg table.

  • Unbounded queries (queries without filters, for example SELECT * FROM my_topic;) still have to scan and fetch the whole data set from the Kafka topic, as there is no predicate to use for pruning. Depending on data size, they may be slow.

  • The selected partitioning spec has to match the query predicates. Due to the way the Iceberg specification and Iceberg clients work, the query is resolved on the client side, so I.S.K. does not know what shape the data should be served in; it is up to the operator to choose a projection whose partitioning strategy matches the query.
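The last limitation amounts to a simple check an operator could run before issuing a query. The helper below is hypothetical (not part of I.S.K.): it confirms that at least one of a query's filter columns appears in the projection's partition spec, so the scan can actually be pruned.

```python
# Hypothetical pre-flight check (not an I.S.K. API): does the query's
# filter touch the partition spec of the chosen projection?
def predicates_match_spec(filter_columns, partition_columns):
    """True if at least one partition column appears in the query filters."""
    return bool(set(filter_columns) & set(partition_columns))


# A projection partitioned by event_time prunes a time-filtered query...
print(predicates_match_spec({"event_time"}, {"event_time"}))   # True
# ...but a filter on an unrelated column forces a full topic scan.
print(predicates_match_spec({"customer_id"}, {"event_time"}))  # False
```

When the check fails, the options are to switch to a projection partitioned on the filtered column or to accept an unbounded scan.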

