Usage

ISK is a REST Iceberg catalog, with data served over the S3-compatible SSK service.

Below is a general overview of the relevant client configuration. For examples, see the Quick Start Guide in the docs or the GitHub example.

| Key | Value | Description |
| --- | --- | --- |
| catalog.type | rest | Iceberg catalog implementation type. |
| catalog.uri | [ISK endpoint] | ISK endpoint to use, according to the selected region. |
| catalog.rest.sigv4-enabled | true | Enables AWS SigV4 authentication. |
| catalog.rest.access-key | [Streambased Access Key] | Access key to use for authentication. |
| catalog.rest.secret-key | [Streambased Secret Key] | Secret key to use for authentication. |
| catalog.rest.signing-region | us-east-1 | Any valid AWS region value. |
| catalog.warehouse | s3:// | SSK is a virtual S3-compatible service, so there is no path to configure beyond the root path. |
| catalog.io-impl | org.apache.iceberg.aws.s3.S3FileIO | Data is accessed through the S3-compatible SSK service, so clients should use S3 IO. |
| s3.path-style-access | true | SSK only supports path-style access. |
| s3.endpoint | [SSK Endpoint] | SSK endpoint to use, according to the selected region. |
| s3.region | us-east-1 | Should match `catalog.rest.signing-region`. For Spark clients, it may have to be specified as the AWS_REGION environment variable or in an AWS profile. |
| s3.access-key | [Streambased Access Key] | For Spark clients, it may have to be specified as the AWS_ACCESS_KEY_ID environment variable or in an AWS profile. |
| s3.secret-key | [Streambased Secret Key] | For Spark clients, it may have to be specified as the AWS_SECRET_ACCESS_KEY environment variable or in an AWS profile. |

Note: the actual configuration keys vary depending on the client in use.
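As a minimal sketch of the generic settings listed above, the properties can be assembled in one place and checked for the one consistency rule this section states (the S3 region should match the SigV4 signing region). The function names `isk_client_properties` and `check_regions_match`, and the `example.invalid` endpoints and key values, are hypothetical placeholders for illustration. The real endpoints and credentials come from your Streambased account, and the actual key names depend on the client.

```python
def isk_client_properties(isk_endpoint, ssk_endpoint, access_key, secret_key, region):
    """Return the generic ISK / SSK client properties as a dict (hypothetical helper)."""
    return {
        "catalog.type": "rest",
        "catalog.uri": isk_endpoint,
        "catalog.rest.sigv4-enabled": "true",
        "catalog.rest.access-key": access_key,
        "catalog.rest.secret-key": secret_key,
        "catalog.rest.signing-region": region,
        "catalog.warehouse": "s3://",  # root path only, nothing further to configure
        "catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "s3.path-style-access": "true",  # SSK only supports path-style access
        "s3.endpoint": ssk_endpoint,
        "s3.region": region,  # kept equal to the signing region
        "s3.access-key": access_key,
        "s3.secret-key": secret_key,
    }


def check_regions_match(props):
    """Raise ValueError if s3.region differs from the SigV4 signing region."""
    if props["s3.region"] != props["catalog.rest.signing-region"]:
        raise ValueError("s3.region must match catalog.rest.signing-region")


# Placeholder values (assumptions): substitute your own endpoints and keys.
props = isk_client_properties(
    isk_endpoint="https://isk.example.invalid",
    ssk_endpoint="https://ssk.example.invalid",
    access_key="MY_ACCESS_KEY",
    secret_key="MY_SECRET_KEY",
    region="us-east-1",
)
check_regions_match(props)
for key in sorted(props):
    if "secret" not in key:  # avoid echoing the secret key
        print(f"{key} = {props[key]}")
```

Keeping the region in a single argument makes it hard for the two region settings to drift apart.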

Client configuration

Configuration reference for Spark and Trino clients.

Spark Configuration

The following Iceberg catalog and data IO configuration has to be specified in spark-defaults.conf. In general, the configuration follows the standard REST Iceberg catalog configuration with S3 File IO and AWS SigV4 request signing for authentication:

| Key | Value | Description |
| --- | --- | --- |
| spark.sql.extensions | org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions | Enables Iceberg in Spark. |
| spark.sql.catalog.streambased | org.apache.iceberg.spark.SparkCatalog | Specifies the catalog; in this case the catalog name is "streambased". |
| spark.sql.catalog.streambased.type | rest | Iceberg catalog implementation type. |
| spark.sql.catalog.streambased.uri | [ISK endpoint] | ISK endpoint to use, according to the selected region. |
| spark.sql.catalog.streambased.rest.sigv4-enabled | true | Enables AWS SigV4 authentication. |
| spark.sql.catalog.streambased.rest.access-key | [Streambased Access Key] | Access key to use for authentication. |
| spark.sql.catalog.streambased.rest.secret-key | [Streambased Secret Key] | Secret key to use for authentication. |
| spark.sql.catalog.streambased.rest.signing-region | us-east-1 | Any valid AWS region value. |
| spark.sql.catalog.streambased.io-impl | org.apache.iceberg.aws.s3.S3FileIO | Data is accessed through the S3-compatible SSK service, so clients should use S3 IO. |
| spark.sql.catalog.streambased.s3.path-style-access | true | SSK only supports path-style access. |
| spark.sql.catalog.streambased.warehouse | s3:// | SSK is a virtual S3-compatible service, so there is no path to configure beyond the root path. |
| spark.sql.catalog.streambased.s3.endpoint | [SSK Endpoint] | SSK endpoint to use, according to the selected region. |

Note: AWS S3 credentials and region have to be provided to Spark through an appropriate provider (environment, profile, etc.) and set to the Streambased Access Key, the Streambased Secret Key, and the same AWS region value as spark.sql.catalog.streambased.rest.signing-region.

Note: when querying the catalog, there is a single `isk` namespace that has to be used in all queries.
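The Spark settings above can be sketched as a small generator for spark-defaults.conf lines, together with the AWS environment variables mentioned in the note. This is an illustration under assumptions: `spark_defaults_lines` and `aws_environment` are hypothetical helper names, the endpoints and keys are placeholders, and the keys are written with the full `spark.sql.catalog.<catalog name>.` prefix that Spark uses for catalog properties.

```python
def spark_defaults_lines(isk_endpoint, ssk_endpoint, access_key, secret_key,
                         region, catalog="streambased"):
    """Return spark-defaults.conf lines ("key value") for an ISK catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    settings = [
        ("spark.sql.extensions",
         "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"),
        (prefix, "org.apache.iceberg.spark.SparkCatalog"),
        (f"{prefix}.type", "rest"),
        (f"{prefix}.uri", isk_endpoint),
        (f"{prefix}.rest.sigv4-enabled", "true"),
        (f"{prefix}.rest.access-key", access_key),
        (f"{prefix}.rest.secret-key", secret_key),
        (f"{prefix}.rest.signing-region", region),
        (f"{prefix}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO"),
        (f"{prefix}.s3.path-style-access", "true"),
        (f"{prefix}.warehouse", "s3://"),
        (f"{prefix}.s3.endpoint", ssk_endpoint),
    ]
    # spark-defaults.conf separates key and value with whitespace.
    return [f"{key} {value}" for key, value in settings]


def aws_environment(access_key, secret_key, region):
    """Environment variables for the S3 credentials and region (same region as signing)."""
    return {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_REGION": region,
    }


# Placeholder values (assumptions): substitute your own endpoints and keys.
lines = spark_defaults_lines(
    isk_endpoint="https://isk.example.invalid",
    ssk_endpoint="https://ssk.example.invalid",
    access_key="MY_ACCESS_KEY",
    secret_key="MY_SECRET_KEY",
    region="us-east-1",
)
env = aws_environment("MY_ACCESS_KEY", "MY_SECRET_KEY", "us-east-1")
print("\n".join(line for line in lines if "secret-key" not in line))
```

The generated lines can be appended to spark-defaults.conf, and the environment variables exported before starting Spark if the S3 client does not pick up the credentials from the catalog properties.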

Trino Configuration

The following Iceberg catalog and data IO configuration has to be specified in a new properties file, streambased.properties, that is placed in the /etc/trino/catalog/ path. In general, the configuration follows the standard REST Iceberg catalog configuration with S3 File IO and AWS SigV4 request signing for authentication:

| Key | Value | Description |
| --- | --- | --- |
| connector.name | iceberg | Trino connector name. |
| iceberg.catalog.type | rest | Iceberg catalog implementation type. |
| iceberg.file-format | AVRO | Iceberg metadata and data are served as AVRO in ISK / SSK. |
| iceberg.rest-catalog.uri | [ISK endpoint] | ISK endpoint to use, according to the selected region. |
| iceberg.rest-catalog.sigv4-enabled | true | Enables AWS SigV4 authentication. |
| iceberg.rest-catalog.warehouse | s3:// | SSK is a virtual S3-compatible service, so there is no path to configure beyond the root path. |
| fs.hadoop.enabled | false | The Hadoop file system is not used, so disable it. |
| fs.native-s3.enabled | true | SSK is an S3-compatible service, so enable the S3 file system. |
| s3.region | us-east-1 | Any valid AWS region value. |
| s3.aws-access-key | [Streambased Access Key] | Access key to use for authentication. |
| s3.aws-secret-key | [Streambased Secret Key] | Secret key to use for authentication. |
| s3.path-style-access | true | SSK only supports path-style access. |
| s3.endpoint | [SSK Endpoint] | SSK endpoint to use, according to the selected region. |

Note: when querying the catalog, there is a single `isk` namespace that has to be used in all queries.
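The Trino settings above can likewise be rendered into the contents of streambased.properties. This is a sketch under assumptions: `trino_catalog_properties` is a hypothetical helper name, and the endpoints and keys are placeholders to be replaced with the values for your region and account.

```python
def trino_catalog_properties(isk_endpoint, ssk_endpoint, access_key, secret_key, region):
    """Return the text of a Trino catalog properties file for ISK / SSK."""
    settings = {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.file-format": "AVRO",
        "iceberg.rest-catalog.uri": isk_endpoint,
        "iceberg.rest-catalog.sigv4-enabled": "true",
        "iceberg.rest-catalog.warehouse": "s3://",
        "fs.hadoop.enabled": "false",
        "fs.native-s3.enabled": "true",
        "s3.region": region,
        "s3.aws-access-key": access_key,
        "s3.aws-secret-key": secret_key,
        "s3.path-style-access": "true",
        "s3.endpoint": ssk_endpoint,
    }
    # Trino catalog files use one key=value pair per line.
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Placeholder values (assumptions): substitute your own endpoints and keys.
properties_text = trino_catalog_properties(
    isk_endpoint="https://isk.example.invalid",
    ssk_endpoint="https://ssk.example.invalid",
    access_key="MY_ACCESS_KEY",
    secret_key="MY_SECRET_KEY",
    region="us-east-1",
)
print(properties_text.replace("MY_SECRET_KEY", "********"))
```

The returned text would be written to /etc/trino/catalog/streambased.properties, which makes the catalog available in Trino under the name `streambased`.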
