Overview

I.S.K. (Iceberg Service for Kafka) is an Iceberg projection over real-time, event-based data stored in Apache Kafka-compatible platforms (including self-hosted clusters).

I.S.K. serves as an Iceberg REST Catalog that transforms Kafka metadata and renders it as Iceberg metadata at runtime. Integration with S.S.K. (Storage Service for Kafka) handles the actual message data through the Iceberg data access flow.

As a logical projection, I.S.K. enables:

  • First and foremost: analytics-style access to operational data stored in Kafka - no jobs or pipelines to maintain.

  • Flexible data access patterns - partitioning can be swapped on the fly to match the filters used by each query.

  • Access to always-fresh data - there is no ETL, and therefore no 'lag' when loading data into a data lake.

  • Efficient data scans - query predicates are evaluated against the partition spec to reduce the amount of data that must be fetched from Kafka. Combined with flexible partitioning, this lets a query read only the partitions it actually needs.
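
To illustrate the pruning idea in the last point, here is a minimal, self-contained sketch. It does not use any I.S.K. API: the `Partition` class, the `prune` function, and the hourly bucketing are assumptions made for the example, chosen to mirror how an hour-based partition transform lets a timestamp predicate skip most of a topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical partition: one hour-sized slice of a topic."""
    hour: int          # bucket value, in hours since the Unix epoch
    record_count: int  # number of records that fall into this bucket


MS_PER_HOUR = 3_600_000


def hour_bucket(ts_ms: int) -> int:
    """Map a millisecond timestamp to its hour bucket."""
    return ts_ms // MS_PER_HOUR


def prune(partitions, ts_from_ms, ts_to_ms):
    """Keep only partitions overlapping the half-open range [ts_from_ms, ts_to_ms)."""
    lo = hour_bucket(ts_from_ms)
    hi = hour_bucket(ts_to_ms - 1)  # the upper bound is exclusive
    return [p for p in partitions if lo <= p.hour <= hi]


# 24 hourly partitions with 1000 records each (made-up numbers).
partitions = [Partition(hour=h, record_count=1000) for h in range(100, 124)]

# Equivalent to a filter such as: ts >= hour 105 AND ts < hour 107.
selected = prune(partitions, 105 * MS_PER_HOUR, 107 * MS_PER_HOUR)

scanned = sum(p.record_count for p in selected)
total = sum(p.record_count for p in partitions)
print(f"scanning {scanned} of {total} records")
```

In this toy setup only the two hourly partitions covered by the filter are read; the remaining 22 are skipped without fetching any data.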

At present, I.S.K. has some restrictions and limitations:

  • It is a read-only projection - modifying operations (such as inserts) are not supported.

  • A Kafka topic must have a schema in order to be rendered as an Iceberg table.

  • Unbounded queries (queries without filters, e.g. "SELECT * FROM my_topic;") still have to scan and fetch the whole data set from the Kafka topic, because there is no predicate to use for pruning. Depending on the data size, this may slow queries down.

  • Because of how the Iceberg specification and Iceberg clients work, the selected partition spec must match the query predicates. Queries are resolved on the client side, so I.S.K. does not know in advance the shape of the data it will be asked to serve. The operator must therefore choose a projection and partitioning strategy that matches the expected queries.
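
The last limitation can be made concrete with a small sketch. This is an illustration only, not I.S.K. code: `partitions_to_scan`, the `region` and `customer_id` columns, and the partition values are all hypothetical. It models a projection partitioned by a single column and shows that a filter on any other column cannot prune anything.

```python
def partitions_to_scan(partition_column, partition_values, predicate):
    """Return the partition values a query has to read.

    predicate maps a column name to the set of values the query accepts.
    A filter helps only if it targets the partition column itself.
    """
    wanted = predicate.get(partition_column)
    if wanted is None:
        # No filter on the partition column: every partition must be read.
        return list(partition_values)
    return [value for value in partition_values if value in wanted]


# Hypothetical projection of a topic, partitioned by a "region" column.
regions = ["apac", "eu", "us"]

# The filter matches the partition column: one partition is read.
matching = partitions_to_scan("region", regions, {"region": {"eu"}})

# The filter targets a different column: all partitions are read.
mismatched = partitions_to_scan("region", regions, {"customer_id": {42}})

print("filter on region:     ", matching)
print("filter on customer_id:", mismatched)
```

For the second query, a projection partitioned by `customer_id` would be the better fit, which is the kind of choice the operator has to make up front.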
