Streambased Documentation

Overview


SSK is an S3-compatible object storage interface over Kafka data and metadata. It provides easy access to data stored in Kafka as files split by partition and offset range. It has a two-fold purpose:

  • Provide access to Kafka data from S3-compatible clients: an easy way to download batches of data for consumption by batch-processing-oriented applications, without having to worry about combining records into batches or files.

  • Serve as the data layer for ISK (Iceberg Service for Kafka).

Core concepts

1. Authentication & Authorization

SSK uses AWS-compliant authentication and authorization: requests are signed by an S3-compliant client using Streambased's API Key and Secret, and SSK applies the same rules as AWS S3 to verify the signature and authorize requests to the SSK service. Further authorization at the topic level is carried out using the Kafka credentials configured for the user.
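The "same rules as AWS S3" means standard AWS Signature Version 4. As an illustration, the SigV4 signing-key derivation that both the client and SSK perform is a short HMAC-SHA256 chain over the secret, date, region, and service. The secret, date, and region values below are placeholders, not real credentials:

```python
import hashlib
import hmac


def sigv4_signing_key(secret_key: str, date: str, region: str,
                      service: str = "s3") -> bytes:
    """Derive the AWS Signature Version 4 signing key (HMAC-SHA256 chain)."""
    def _hmac(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)  # date as YYYYMMDD
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


# The client signs each request with this key; SSK repeats the same
# derivation server-side to verify the signature.
key = sigv4_signing_key("example-streambased-api-secret", "20240101", "us-east-1")
print(key.hex())
```

In practice an S3-compatible SDK (boto3, the AWS CLI, MinIO clients, etc.) performs this signing automatically once the Streambased API Key and Secret are configured as the AWS access key ID and secret access key.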

2. Kafka Topics

Topics are represented as S3 buckets.

3. Kafka Messages

Kafka messages are represented as Avro files in the S3 buckets. Each file represents a set of messages for a specific topic partition and offset range: for example, 0-100-200.avro represents messages from offset 100 (inclusive) to offset 200 (exclusive) in partition 0 of a given topic. Messages are encoded using a format that allows batch encoding of many messages in a single file. Furthermore, the offset range is parsed dynamically from the object key, making it possible to request an arbitrary range of offsets combined into a single file.
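The key convention above can be sketched as a pair of small helpers, one parsing a key into its (partition, start, end) components and one building a key for an arbitrary offset range:

```python
def parse_object_key(key: str) -> tuple[int, int, int]:
    """Split a key like '0-100-200.avro' into (partition, start, end).

    Following the convention described above, start is inclusive and
    end is exclusive.
    """
    stem = key.removesuffix(".avro")
    partition, start, end = (int(part) for part in stem.split("-"))
    return partition, start, end


def build_object_key(partition: int, start: int, end: int) -> str:
    """Build a key requesting an arbitrary offset range of a partition."""
    return f"{partition}-{start}-{end}.avro"


print(parse_object_key("0-100-200.avro"))  # (0, 100, 200)
print(build_object_key(2, 0, 5000))        # 2-0-5000.avro
```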

S3 Operation support

Summary

SSK supports a subset of S3-compatible operations sufficient to list and retrieve Kafka messages as files. The following operations are supported:

  • ListBuckets - lists Kafka topics as buckets

  • ListObjects (v1 & v2) - lists topic content (partition / offset ranges) as files

  • GetObject - retrieves the content of Kafka messages as a file for a specified partition and offset range.
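With path-style addressing, the three operations map onto plain S3 REST request URLs (bucket in the path, not the hostname). A sketch of that mapping, using a hypothetical SSK endpoint and topic name for illustration:

```python
# Hypothetical endpoint for illustration; substitute your SSK endpoint.
ENDPOINT = "https://ssk.example.streambased.cloud"


def list_buckets_url() -> str:
    """ListBuckets: GET on the service root lists Kafka topics as buckets."""
    return f"{ENDPOINT}/"


def list_objects_url(topic: str) -> str:
    """ListObjectsV2: lists a topic's partition/offset-range files."""
    return f"{ENDPOINT}/{topic}?list-type=2"


def get_object_url(topic: str, partition: int, start: int, end: int) -> str:
    """GetObject: retrieves messages [start, end) of a partition as one Avro file."""
    return f"{ENDPOINT}/{topic}/{partition}-{start}-{end}.avro"


print(get_object_url("orders", 0, 100, 200))
```

Any S3-compatible SDK issues these requests for you; the sketch only shows how topics, partitions, and offset ranges surface in the S3 namespace.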

Notable limitations of the S3 interface

  • For a cloud-deployed SSK service, path-style access must be enabled in the S3-compatible client.

  • GetObject with a suffix-based Range header ignores the requested range and returns the full object.

    An example of a suffix Range header is range=bytes=-1000, which requests the last 1000 bytes of the object.

  • ListObjects and HEAD requests report the file size as 0.
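Because listings and HEAD responses report a size of 0, a client should not pre-allocate buffers or paginate downloads based on the reported size; it should stream each object to EOF instead. A minimal sketch of that pattern, which works the same for any file-like response body:

```python
import io


def read_full_object(body, chunk_size: int = 64 * 1024) -> bytes:
    """Read an object body to EOF in chunks.

    SSK reports object size as 0 in listings and HEAD responses, so the
    reported size cannot be used to decide how many bytes to read.
    """
    chunks = []
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


# Works for any file-like stream, e.g. the body of an S3 GetObject response:
data = read_full_object(io.BytesIO(b"avro bytes..."))
print(len(data))
```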
