What is Streambased?

Streambased is a unified event streaming data platform built for applications, data lakes, and AI systems. It helps teams discover, trust, and activate streaming data quickly and with confidence.

A unified platform for operational and analytical data

Streambased provides the full scope of your data to all end users, seamlessly delivering real-time data to analytical and AI applications.

It provides:

I.S.K. (Iceberg Service for Kafka) - An Apache Iceberg view of real-time data that integrates with data lakes and other downstream processors.
A.S.K. (Analytics Service for Kafka) - A fully distributed SQL engine that integrates with any analytics application that supports JDBC, ODBC or SQLAlchemy.
S.S.K. (Storage Service for Kafka) - An Amazon S3 compatible proxy that allows users to access real-time data as if it were a filesystem.
Streambased MCP server - An implementation of Anthropic's Model Context Protocol standard to allow AI agents to access real-time data.

What sets Streambased apart is:

No data movement - Streambased provides a logical view on top of the data and does not move or store any data ahead of query time.
Acceleration - To make these views perform well, Streambased uses indexing techniques that greatly reduce the amount of data that must be read to satisfy analytical queries.
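
The acceleration idea can be illustrated with a small, self-contained sketch. It is not Streambased's implementation (the actual index structures are not described here); it assumes a hypothetical log split into fixed-size blocks, with the minimum and maximum timestamp recorded per block, so that a time-range query can skip blocks that cannot contain a match. The names `BlockIndexEntry`, `build_index` and `query_range` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A record in the hypothetical log: (timestamp, payload).
Record = Tuple[int, str]


@dataclass
class BlockIndexEntry:
    """Timestamp range covered by one block of the log (illustrative only)."""
    start: int   # position of the first record in the block
    end: int     # position one past the last record in the block
    min_ts: int
    max_ts: int


def build_index(log: List[Record], block_size: int) -> List[BlockIndexEntry]:
    """Record the min/max timestamp of each fixed-size block."""
    index = []
    for start in range(0, len(log), block_size):
        block = log[start:start + block_size]
        timestamps = [ts for ts, _ in block]
        index.append(BlockIndexEntry(start, start + len(block),
                                     min(timestamps), max(timestamps)))
    return index


def query_range(log: List[Record], index: List[BlockIndexEntry],
                lo: int, hi: int) -> Tuple[List[Record], int]:
    """Return (matching records, number of records actually read)."""
    matches, records_read = [], 0
    for entry in index:
        # Skip blocks whose timestamp range cannot overlap [lo, hi].
        if entry.max_ts < lo or entry.min_ts > hi:
            continue
        for ts, payload in log[entry.start:entry.end]:
            records_read += 1
            if lo <= ts <= hi:
                matches.append((ts, payload))
    return matches, records_read


# 1,000 records with increasing timestamps, indexed in blocks of 100.
log = [(ts, f"event-{ts}") for ts in range(1000)]
index = build_index(log, block_size=100)
matches, records_read = query_range(log, index, lo=250, hi=260)
print(len(matches), records_read)  # prints: 11 100
```

In this toy case the query reads 100 of the 1,000 records to find its 11 matches; nothing is copied or pre-computed beyond the small index itself, which is the spirit of a logical view backed by indexing.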

What this means for you:

A single source of truth - Both operational and analytical applications access the same data, so there is no opportunity for drift or lag.
No ETL - No data transfer ahead of query time means no pipelines to manage and evolve.
A single point of governance - Manage permissions, lineage, schema evolution etc. in one system and have it apply to all downstream users.

Where should you use Streambased?

Streambased is especially powerful for the following cases:

Real-time investigation and debugging - Use industry-standard tools (Jupyter, Tableau, Superset etc.) and techniques (SQL) to investigate live issues with zero-latency data.
Ad-hoc reporting - Don't miss the hidden data point. Include the full breadth of your organisation's data in your reporting tasks.
Data lake freshness - Remove ETL-associated latency constraints so that the freshest data is available in your data lake.
Vibe coding for data - Use AI agents to reduce grunt work for data scientists and analysts and directly connect business users with data.

Comparing Streambased to other data systems:

SQL based stream processors

Using SQL to access real-time data is not a new concept. Processors such as ksqlDB and Flink SQL allow you to specify your stream processing requirements using SQL-like languages and may at first appear similar to Streambased. However, they differ in key areas:

ksqlDB etc. are optimised for continuous queries, whereas Streambased is optimised for ad-hoc batch queries. An ad-hoc query executed in Streambased will typically run 30x-100x faster than its ksqlDB counterpart.
The SQL dialects used by stream processors are more complex than regular SQL, with users having to understand streaming concepts such as windowing and grace periods. Streambased runs regular ANSI SQL, including the full set of operations (joins, aggregates etc.). With Streambased you execute the same statements you would in any other database.
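
As a sketch of what "the same statements you would in any other database" looks like in practice, the query below is an ordinary join and aggregate with no windowing or grace-period clauses. Python's built-in `sqlite3` module is used purely as a stand-in engine so the example runs anywhere; the `orders` and `customers` tables are hypothetical, and a real connection to Streambased would go through its JDBC, ODBC or SQLAlchemy interfaces, whose connection details are not covered here.

```python
import sqlite3

# sqlite3 is only a stand-in engine so the sketch is runnable; the tables
# and their contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
""")

# A plain join plus aggregate: no windows, grace periods or other
# streaming-specific constructs.
AD_HOC_QUERY = """
    SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""

rows = conn.execute(AD_HOC_QUERY).fetchall()
for region, order_count, total_amount in rows:
    print(region, order_count, total_amount)
conn.close()
```

The query text is the part that carries over; only the connection line would change when pointing a standard SQL client at a different engine.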

Streambased also greatly simplifies the infrastructure needed to provide real-time results. Stream processing frameworks often require extra real-time streams and intermediate stores in order to achieve their goals. Streambased does not require these.

Analytical Databases

Analytical databases are designed for large-volume scans over high-latency data. Systems such as Databricks and Snowflake are typically fed by ETL pipelines and, because of this, can lag behind the real-time view of the data.

Streambased focuses on providing the freshest view of the data available to your organisation. To accomplish this, it provides a view over the system where data is created (usually a row-based system like Apache Kafka) rather than a separate, more analytically focused, column-based store.

For this reason, some large-volume queries will perform better in analytical databases (there are a lot of variables to this), whereas point lookups and queries that require up-to-the-minute information will perform better in Streambased.

The Apache Iceberg table format employed by Streambased I.S.K. also means that the low-latency view from Streambased can easily be combined with longer-term analytical storage (such as Parquet files) to provide the best of both worlds.
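
The "best of both worlds" combination can be pictured as a union of two sources split at a cut-over point. The sketch below illustrates that idea only and does not describe how I.S.K. or Apache Iceberg implement it: it assumes hypothetical rows keyed by a Kafka-style offset, an archived set that is authoritative below `cutover_offset`, and a live set that may overlap with it.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def combined_view(archived: Iterable[Row], live: Iterable[Row],
                  cutover_offset: int) -> List[Row]:
    """Merge archived and live rows into one view ordered by offset.

    Rows below `cutover_offset` are served from the archive and rows at or
    above it from the live source, so overlapping rows are not returned twice.
    """
    from_archive = [r for r in archived if r["offset"] < cutover_offset]
    from_live = [r for r in live if r["offset"] >= cutover_offset]
    return sorted(from_archive + from_live, key=lambda r: r["offset"])


# Hypothetical data: the archive holds offsets 0-5, the live source 4-8,
# so offsets 4 and 5 exist in both.
archived_rows = [{"offset": o, "source": "archive"} for o in range(0, 6)]
live_rows = [{"offset": o, "source": "live"} for o in range(4, 9)]

view = combined_view(archived_rows, live_rows, cutover_offset=5)
print([(r["offset"], r["source"]) for r in view])
```

A reader of `view` sees one continuous, duplicate-free sequence of offsets 0 to 8, without needing to know which rows came from long-term storage and which from the live stream.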

How does Streambased simplify your architecture?

Streambased simplifies your architecture in the following ways:

Reducing the number of data systems - Streambased provides analytical capabilities that would traditionally require an analytics system separate from the operational system.
Removing data pipelines - The Streambased view approach negates the need for complex data pipeline creation, maintenance and evolution.
Removing governance complexities - Streambased applies the same governance controls employed in the operational realm to the analytical realm, so a single set of governance policies and processes applies to everyone.

