Overview

I.S.K. (Iceberg Service for Kafka) is an Iceberg projection over real-time, event-based data stored in Apache Kafka-compatible platforms (including self-hosted clusters).

I.S.K. serves as a REST Iceberg catalog that transforms and renders Kafka metadata as Iceberg metadata at runtime. Additionally, I.S.K. exposes an S3-compatible storage endpoint for serving Kafka data to Iceberg applications.
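
As an illustration of what this means for a client, an Iceberg application needs two addresses: the REST catalog and the S3-compatible storage endpoint. The sketch below assumes PyIceberg as the client. The URLs, credentials and helper names are hypothetical placeholders, not documented I.S.K. values; only the property keys and `load_catalog` come from PyIceberg itself.

```python
def isk_catalog_properties(catalog_uri: str, s3_endpoint: str,
                           access_key: str, secret_key: str) -> dict:
    """Build PyIceberg properties for a REST catalog backed by S3-compatible storage."""
    return {
        "type": "rest",                      # use PyIceberg's REST catalog client
        "uri": catalog_uri,                  # REST Iceberg catalog address
        "s3.endpoint": s3_endpoint,          # S3-compatible storage address
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


def open_catalog(name: str, properties: dict):
    """Open the catalog; needs PyIceberg installed and a reachable service."""
    # Imported lazily so the helper above can be used without PyIceberg.
    from pyiceberg.catalog import load_catalog
    return load_catalog(name, **properties)


# Hypothetical placeholder values, to be replaced with those of a real deployment.
props = isk_catalog_properties(
    "http://localhost:8181",
    "http://localhost:9000",
    "example-key",
    "example-secret",
)
```

With a running deployment, `open_catalog("isk", props)` would return a catalog object from which tables can be listed and loaded. How topics map to namespaces and table names is not covered in this overview.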

As a logical projection, I.S.K. enables:

First and foremost: analytics-style access to operational data stored in Kafka - no jobs or pipelines to maintain.
Flexible data access patterns - partitioning can be swapped on the fly to match the filters used by each query.
Access to always-fresh data - there is no ETL, and therefore no 'lag' when loading data into a data lake.
Efficient data scans - predicates in the query are evaluated against partition specs to reduce the amount of data that needs to be fetched from Kafka. Combined with flexible partitioning, this keeps data retrieval fast and efficient.
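
The interplay between partitioning and predicates can be sketched with a toy model. This is not I.S.K.'s implementation: the records, the two partitioning functions and the `scan` helper are all hypothetical, and real Iceberg pruning works on partition metadata rather than in-memory lists. The point is only that a predicate matching the partition key lets a scan skip whole partitions, and that a different partitioning suits a different filter.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical records: (offset, event time, customer id).
RECORDS = [
    (0, datetime(2024, 1, 1, 9, 15, tzinfo=timezone.utc), "a"),
    (1, datetime(2024, 1, 1, 9, 45, tzinfo=timezone.utc), "b"),
    (2, datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), "a"),
    (3, datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc), "c"),
]


def by_hour(record):
    """Partition key: event time truncated to the hour."""
    return record[1].replace(minute=0, second=0, microsecond=0)


def by_customer(record):
    """Partition key: customer id."""
    return record[2]


def partition(records, key_fn):
    """Group records into partitions using the given key function."""
    parts = defaultdict(list)
    for record in records:
        parts[key_fn(record)].append(record)
    return dict(parts)


def scan(partitions, partition_matches=None):
    """Return (rows, partitions_read), skipping partitions that fail the predicate."""
    rows, partitions_read = [], 0
    for key, records in partitions.items():
        if partition_matches is not None and not partition_matches(key):
            continue  # pruned: this partition is never fetched
        partitions_read += 1
        rows.extend(records)
    return rows, partitions_read


# A time filter is served best by hourly partitioning.
hourly = partition(RECORDS, by_hour)
ten_oclock = datetime(2024, 1, 1, 10, tzinfo=timezone.utc)
rows_hour, read_hour = scan(hourly, lambda hour: hour == ten_oclock)

# A customer filter is served best by per-customer partitioning.
per_customer = partition(RECORDS, by_customer)
rows_cust, read_cust = scan(per_customer, lambda customer: customer == "a")

# Without a predicate nothing can be pruned, so every partition is read.
rows_all, read_all = scan(hourly)
```

In this toy data set each filtered scan reads one partition out of three, while the unfiltered scan reads all three.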

At present, I.S.K. has some limitations:

It is a read-only projection, so modifying operations such as inserts are not supported.
Each Kafka topic requires a schema to be rendered as an Iceberg table.
Unbounded queries, i.e. queries without filters such as "SELECT * FROM my_topic;", still have to scan and fetch the whole data set from the Kafka topic, as there is no predicate to use for pruning. Depending on data size, this may slow down queries.

