I.S.K. Quick Start

Streambased Iceberg Service for Kafka (I.S.K.) provides an Apache Iceberg-compatible view over Kafka data. With I.S.K., Kafka data can be surfaced to industry-standard data lakes such as Snowflake and Databricks, and used by downstream processors such as Apache Spark and DataFusion. This demo creates a minimal Streambased/Kafka environment and uses a simple Spark client to interact with it.

Step 1: Clone the streambased-demos repository

Streambased publishes a number of public demos. We will use one of these for our quickstart. Begin by cloning the repository:

git clone git@github.com:streambased-io/streambased-demos.git
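
The clone command uses an SSH-style remote, which only works if an SSH key is registered with the Git host. If that is not set up, cloning over HTTPS is a common alternative. The sketch below is not part of the original instructions: ssh_to_https is a hypothetical helper that rewrites a remote of the form user@host:org/repo.git into its HTTPS equivalent, and it assumes the host serves the same repository path over HTTPS.

```shell
# Hypothetical helper (not part of the demo repository):
# rewrite an SSH-style Git remote (user@host:org/repo.git)
# into the matching HTTPS URL (https://host/org/repo.git).
ssh_to_https() {
  # Drop everything up to and including "@", keep the host,
  # and turn the ":" separator into "/".
  printf '%s\n' "$1" | sed -E 's#^[^@]+@([^:]+):#https://\1/#'
}
```

Assuming the repository is hosted on GitHub, this could be used as: git clone "$(ssh_to_https git@github.com:streambased-io/streambased-demos.git)"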

Step 2: Start the environment

Start the environment by running the following from inside the cloned repository:

./bin/start.sh 6_all_in_one

You will see the following services started:

kafka1, zookeeper and schema-registry - A Kafka-based operational environment
shadowtraffic - A data generator
directstream - A Streambased -> Iceberg projection instance
spark-iceberg - A Spark deployment and Python notebook for working with Iceberg tables
akhq - An operational tool for Kafka observability

The demo needs a suitably large dataset, and this can take some time to build. Before continuing, go to localhost:9090 and check that two topics (transactions and payment_terms) are available and that their message counts are growing.
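
Because the services can take a while to come up, it may be convenient to poll the UI instead of refreshing the page by hand. The following is a minimal sketch, not part of the demo repository: wait_for_url is a hypothetical helper, and its default of 30 attempts with a 5 second delay is an arbitrary assumption. It only confirms that the page responds. The topics and their message counts still need to be checked in the UI.

```shell
# Hypothetical helper (not part of the demo repository):
# poll a URL until it answers without an HTTP error, or give up.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"   # assumed default, adjust as needed
  delay="${3:-5}"       # assumed default, in seconds
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors (>= 400) as failure,
    # --max-time: cap each request so a hung service cannot block the loop
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    # Pause between attempts, but not after the final one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}
```

For example, wait_for_url http://localhost:9090 blocks until the topic UI responds. The same helper can be pointed at localhost:8888 before opening the notebook.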

Step 3: Open the notebook

In this demo our Iceberg client is Spark SQL, which you can use from within a Python notebook. To open it, go to localhost:8888/notebooks/notebooks/ISK-quick-start.ipynb

Run through the notebook to see it in action.

Step 4: Shut down the environment

To stop the environment run:

./bin/stop.sh

What's next?

This short demo is only one example of Streambased technology. Check out the other demos in the repository for more.

