Pipe Kafka & Iceberg to dashboards

To follow this page, run the demo first.

Previously, we saw a unified view of Kafka and Iceberg data and queried it using SQL. Now we'll see how easy it is to view the same data in a data visualisation dashboard, in this case Superset.

Charts have already been created, aggregating and displaying the number of fraudulent transactions historically (in Iceberg) and recently (in Kafka).

Superset

If you head to port 8088 at this address you'll see a constantly refreshing Superset chart updated with the most recent Kafka data.

How complicated is the setup? Assuming that you have the Superset Docker image downloaded, it takes just three steps:

Head to 'Database Connections' within the settings (in the top-right-hand corner):

Edit the 'Streambased-merged' database:

Enter hive://hive@spark-iceberg:10000/merged into the SQLALCHEMY URI field. In this demo you'll find it's prepopulated.
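To see what each part of that URI tells Superset, the sketch below splits it using only Python's standard library. It is illustrative only: Superset itself hands the URI to SQLAlchemy, and reading the segments as dialect, user, host, port and database follows the usual SQLAlchemy URI layout rather than anything stated on this page. The helper name `describe_uri` is made up for this example.

```python
from urllib.parse import urlsplit

# The URI from the step above, copied verbatim.
URI = "hive://hive@spark-iceberg:10000/merged"


def describe_uri(uri: str) -> dict:
    """Split a SQLAlchemy-style URI into its named parts (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "dialect": parts.scheme,             # SQLAlchemy dialect name
        "username": parts.username,          # user presented to the server
        "host": parts.hostname,              # host name of the SQL endpoint
        "port": parts.port,                  # TCP port of the SQL endpoint
        "database": parts.path.lstrip("/"),  # database/schema to query
    }


if __name__ == "__main__":
    for key, value in describe_uri(URI).items():
        print(f"{key}: {value}")
```

Reading the URI this way makes it easier to spot which segment to change if your own environment uses a different host, port or database name.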

What am I looking at?

If you head to the other dashboard, you will see two charts side by side. Both charts compare fraudulent to non-fraudulent transactions. The transactions topic represented in these charts combines Kafka and Iceberg data.

The charts do this by aggregating the CustomerFlaggedFraud value of the messages over a given period.

+--------------------+--------------------+--------------------+
|       TransactionID| .................. |CustomerFlaggedFraud|
+--------------------+--------------------+--------------------+
|c4ba5a1b-0828-277   | .................. |                true|
+--------------------+--------------------+--------------------+
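As a rough sketch of the kind of aggregation behind such a chart, the snippet below counts rows by their CustomerFlaggedFraud value. The rows are invented placeholders, not demo data, and the function name `count_by_fraud_flag` is hypothetical; the actual chart queries are not shown on this page.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_fraud_flag(rows: Iterable[Mapping[str, object]]) -> dict:
    """Count rows per CustomerFlaggedFraud value (hypothetical helper)."""
    counts = Counter(bool(row["CustomerFlaggedFraud"]) for row in rows)
    # A Counter returns 0 for keys it has never seen, so empty input is safe.
    return {"fraudulent": counts[True], "non_fraudulent": counts[False]}


# Placeholder rows for illustration only; these are not taken from the demo.
SAMPLE_ROWS = [
    {"TransactionID": "example-1", "CustomerFlaggedFraud": True},
    {"TransactionID": "example-2", "CustomerFlaggedFraud": False},
    {"TransactionID": "example-3", "CustomerFlaggedFraud": False},
]

if __name__ == "__main__":
    print(count_by_fraud_flag(SAMPLE_ROWS))
```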

The chart on the right shows historic data, specifically fraud data from one week in 2024 (see the filters on the left-hand side).

Meanwhile, "recent" data here means all messages generated since the 23rd of October 2025 at 12:34, which is the pre-configured "present" time at which the project was run.
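The "recent" rule can be written as a simple timestamp comparison. This is a minimal sketch under two assumptions the page does not confirm: that "since" includes the cutoff instant itself, and that no time zone applies (the page gives none, so a naive datetime is used). The names `DEMO_PRESENT` and `is_recent` are hypothetical.

```python
from datetime import datetime

# The demo's pre-configured "present": 23 October 2025 at 12:34.
# No time zone is given on the page, so this is a naive datetime (assumption).
DEMO_PRESENT = datetime(2025, 10, 23, 12, 34)


def is_recent(message_time: datetime, present: datetime = DEMO_PRESENT) -> bool:
    """Return True for messages generated at or after the demo's "present"."""
    return message_time >= present


if __name__ == "__main__":
    print(is_recent(datetime(2025, 10, 23, 13, 0)))  # after the cutoff
    print(is_recent(datetime(2024, 6, 1, 9, 0)))     # well before the cutoff
```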

As stated above, the data represented in the two charts sits in different data systems. The recent fraud data is taken from the hotset (Kafka) and the historic data is taken from the coldset (Iceberg). Without Streambased, representing the data from each source would require two separate setups, one for each system.

With Streambased you can not only query Kafka and Iceberg data, but also represent that data on dashboards such as Superset without copying it.

