Overview
SSK is an S3-compatible object storage interface over Kafka data and metadata. It provides easy access to data stored in Kafka as files split by partition and offset range. It serves two purposes:
Provide access to Kafka data from S3-compatible clients: an easy way to download batches of data, without having to combine records into batches or files, for consumption by batch-processing applications.
Serve as the data layer for ISK (Iceberg Service for Kafka).
Core concepts
1. Authentication & Authorization
AWS-compliant authentication and authorization: requests are signed by an S3-compliant client using Streambased's API Key and Secret, and SSK applies the same rules as AWS S3 to verify the signature and authorize requests to the SSK service. Further authorization of topics is carried out using the Kafka credentials configured for the user.
2. Kafka Topics
Topics are represented as S3 buckets.
3. Kafka Messages
Kafka messages are represented as Avro files in the S3 buckets. Each file represents a set of messages for a specific topic partition and offset range. For example, 0-100-200.avro represents messages from offset 100 (inclusive) to offset 200 (exclusive) in partition 0 of a given topic. Messages are encoded using the Avro Object Container File format, which allows batch encoding of messages in a single file. The offset range is parsed dynamically from the object key, so an arbitrary range of offsets can be requested as a single file.
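The key layout described above can be sketched in a few lines of Python. This is a minimal illustration, not SSK's own parser: the pattern is inferred solely from the 0-100-200.avro example, the names parse_object_key, build_object_key and OffsetRange are hypothetical, and how SSK validates edge cases (such as an empty range) is not stated in this document.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the example key: <partition>-<start>-<end>.avro
_KEY_PATTERN = re.compile(r"(\d+)-(\d+)-(\d+)\.avro")


class OffsetRange(NamedTuple):
    partition: int
    start: int  # first offset, inclusive
    end: int    # last offset, exclusive


def parse_object_key(key: str) -> OffsetRange:
    """Split an object key such as '0-100-200.avro' into its parts."""
    match = _KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a partition/offset-range key: {key!r}")
    partition, start, end = (int(group) for group in match.groups())
    if start > end:
        raise ValueError(f"start offset {start} is after end offset {end}")
    return OffsetRange(partition, start, end)


def build_object_key(partition: int, start: int, end: int) -> str:
    """Build the key for messages in [start, end) of the given partition."""
    return f"{partition}-{start}-{end}.avro"
```

Because the range is encoded in the key, a client can ask for any window it needs, for example build_object_key(0, 100, 200) for offsets 100 through 199 of partition 0.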
S3 Operation support
Summary
SSK supports a subset of S3-compatible operations that is sufficient to list and retrieve Kafka messages as files. The following operations are supported:
ListBuckets - lists Kafka topics as buckets
ListObjects (v1 & v2) - lists topic content (partition / offset ranges) as files
GetObject - retrieves content of Kafka messages as a file for specified partition and offset range.
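As a rough sketch of how the three operations might be called from Python with boto3: the endpoint URL, the region value and all function names here are assumptions made for illustration, and the API Key and Secret are passed where boto3 expects AWS credentials. Only the first page of a listing is read, so pagination is left out.

```python
def make_ssk_client(endpoint_url: str, api_key: str, api_secret: str):
    """Create a boto3 S3 client for an SSK endpoint (the URL is supplied by the caller)."""
    # boto3 is third-party; it is imported lazily so the helpers below
    # can be used with any object that exposes the same methods.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=api_key,
        aws_secret_access_key=api_secret,
        region_name="us-east-1",  # placeholder; the region SSK expects is an assumption
        config=Config(s3={"addressing_style": "path"}),  # path-style access
    )


def list_topics(client) -> list[str]:
    """ListBuckets: Kafka topics appear as bucket names."""
    return [bucket["Name"] for bucket in client.list_buckets().get("Buckets", [])]


def list_ranges(client, topic: str) -> list[str]:
    """ListObjects v2: object keys for the topic (first page only)."""
    response = client.list_objects_v2(Bucket=topic)
    return [obj["Key"] for obj in response.get("Contents", [])]


def fetch_messages(client, topic: str, partition: int, start: int, end: int) -> bytes:
    """GetObject: Avro file bytes for offsets [start, end) of one partition."""
    key = f"{partition}-{start}-{end}.avro"
    return client.get_object(Bucket=topic, Key=key)["Body"].read()
```

The returned bytes are an Avro container file, so they can be handed to any Avro reader for decoding.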
Notable limitations of the S3 interface
For a cloud-deployed SSK service, path-style access has to be enabled in the S3-compatible client.
GetObject with a suffix-based Range header ignores the requested range and returns the full object.
Example of a suffix Range header:
range=bytes=-1000
which requests the last 1000 bytes of the object.
ListObjects and HEAD requests on objects return a file size of 0.
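Since a suffix range is ignored and the full object comes back, a client that only wants the tail of a file can apply the slice itself after downloading. The following is a hedged sketch of that workaround; suffix_length and apply_suffix_range are hypothetical helper names, and the input is the range value in its bytes=-N form.

```python
import re
from typing import Optional

# A suffix range has no start position: bytes=-N means "the last N bytes".
_SUFFIX_RANGE = re.compile(r"bytes=-(\d+)")


def suffix_length(range_value: str) -> Optional[int]:
    """Return N for a suffix range 'bytes=-N', or None for any other range."""
    match = _SUFFIX_RANGE.fullmatch(range_value.strip())
    return int(match.group(1)) if match else None


def apply_suffix_range(body: bytes, range_value: str) -> bytes:
    """Slice a fully downloaded object locally, as a suffix range would have."""
    length = suffix_length(range_value)
    if length is None:
        raise ValueError(f"not a suffix range: {range_value!r}")
    if length == 0:
        return b""  # body[-0:] would return everything, so handle zero explicitly
    return body[-length:]  # yields the whole body when N exceeds its size
```

The slice has to be taken from the downloaded bytes rather than computed from a reported size, because listings and HEAD requests report 0.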