Usage
ISK is a REST Iceberg catalog; table data is served through SSK, an S3-compatible service.
Below is a general overview of the relevant client configuration. For complete examples, see the Quick Start Guide in the docs or the GitHub example.
| Key | Value | Description |
| --- | --- | --- |
| catalog.type | rest | Iceberg catalog implementation type. |
| catalog.uri | [ISK endpoint] | ISK endpoint for the selected region. |
| catalog.rest.sigv4-enabled | true | Enables AWS SigV4 authentication. |
| catalog.rest.access-key | [Streambased Access Key] | Access key to use for authentication. |
| catalog.rest.secret-key | [Streambased Secret Key] | Secret key to use for authentication. |
| catalog.rest.signing-region | us-east-1 | Any valid AWS region value. |
| catalog.warehouse | s3:// | SSK is a virtual S3-compatible service, so there is no path to configure beyond the root. |
| catalog.io-impl | org.apache.iceberg.aws.s3.S3FileIO | Data is accessed through the S3-compatible SSK service, so clients should use S3 IO. |
| s3.path-style-access | true | SSK only supports path-style access. |
| s3.endpoint | [SSK endpoint] | SSK endpoint for the selected region. |
| s3.region | us-east-1 | Must match `catalog.rest.signing-region`. Note: Spark clients may need this set as the AWS_REGION environment variable or in an AWS profile. |
| s3.access-key | [Streambased Access Key] | Note: Spark clients may need this set as the AWS_ACCESS_KEY_ID environment variable or in an AWS profile. |
| s3.secret-key | [Streambased Secret Key] | Note: Spark clients may need this set as the AWS_SECRET_ACCESS_KEY environment variable or in an AWS profile. |
Note: the actual configuration keys vary depending on the client in use.
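As a rough illustration of how the generic settings above fit together, the following Python sketch collects them in a dictionary and checks the constraints stated in the descriptions (REST catalog type, SigV4 enabled, path-style access, matching regions). The function names `generic_client_config` and `check_client_config` and the `example.invalid` endpoints are hypothetical, and `us-east-1` is only an example region. Real clients use their own key names, so treat this as a checklist rather than a working client.

```python
def generic_client_config(isk_endpoint, ssk_endpoint, access_key, secret_key,
                          region="us-east-1"):
    """Collect the generic ISK/SSK client settings in one dict (hypothetical helper)."""
    return {
        "catalog.type": "rest",
        "catalog.uri": isk_endpoint,
        "catalog.rest.sigv4-enabled": "true",
        "catalog.rest.access-key": access_key,
        "catalog.rest.secret-key": secret_key,
        "catalog.rest.signing-region": region,
        "catalog.warehouse": "s3://",
        "catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "s3.path-style-access": "true",
        "s3.endpoint": ssk_endpoint,
        "s3.region": region,
        "s3.access-key": access_key,
        "s3.secret-key": secret_key,
    }


def check_client_config(config):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if config.get("catalog.type") != "rest":
        problems.append("catalog.type must be 'rest'")
    if config.get("catalog.rest.sigv4-enabled") != "true":
        problems.append("catalog.rest.sigv4-enabled must be 'true'")
    if config.get("s3.path-style-access") != "true":
        problems.append("s3.path-style-access must be 'true'")
    if config.get("s3.region") != config.get("catalog.rest.signing-region"):
        problems.append("s3.region must match catalog.rest.signing-region")
    return problems


if __name__ == "__main__":
    # Placeholder endpoints and credentials; substitute the real values.
    config = generic_client_config(
        "https://isk.example.invalid",
        "https://ssk.example.invalid",
        "my-access-key",
        "my-secret-key",
    )
    print(check_client_config(config))
```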
Client configuration
Client-specific configuration reference for Spark and Trino.
Spark Configuration
The following Iceberg catalog and data IO configuration has to be specified in spark-defaults.conf. In general, it follows the standard REST Iceberg catalog configuration with S3 File IO and AWS SigV4 request signing for authentication:
| Property | Value | Description |
| --- | --- | --- |
| spark.sql.extensions | org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions | Enables Iceberg in Spark. |
| spark.sql.catalog.streambased | org.apache.iceberg.spark.SparkCatalog | Defines the catalog; in this case the catalog name is "streambased". |
| spark.sql.catalog.streambased.type | rest | Iceberg catalog implementation type. |
| spark.sql.catalog.streambased.uri | [ISK endpoint] | ISK endpoint for the selected region. |
| spark.sql.catalog.streambased.rest.sigv4-enabled | true | Enables AWS SigV4 authentication. |
| spark.sql.catalog.streambased.rest.access-key | [Streambased Access Key] | Access key to use for authentication. |
| spark.sql.catalog.streambased.rest.secret-key | [Streambased Secret Key] | Secret key to use for authentication. |
| spark.sql.catalog.streambased.rest.signing-region | us-east-1 | Any valid AWS region value. |
| spark.sql.catalog.streambased.io-impl | org.apache.iceberg.aws.s3.S3FileIO | Data is accessed through the S3-compatible SSK service, so clients should use S3 IO. |
| spark.sql.catalog.streambased.s3.path-style-access | true | SSK only supports path-style access. |
| spark.sql.catalog.streambased.warehouse | s3:// | SSK is a virtual S3-compatible service, so there is no path to configure beyond the root. |
| spark.sql.catalog.streambased.s3.endpoint | [SSK endpoint] | SSK endpoint for the selected region. |
Note: AWS S3 credentials and region have to be provided to Spark through an appropriate provider (environment variables, AWS profile, etc.). Set them to the Streambased Access Key, the Streambased Secret Key, and the same AWS region value as the catalog's rest.signing-region property.
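The Spark settings and the credentials note above can be sketched in Python as follows. This is a minimal sketch, not an official tool: the helper names (`spark_catalog_conf`, `aws_environment`, `render_spark_defaults`) and the `example.invalid` endpoints are hypothetical, `us-east-1` is only an example region, and the sketch assumes every catalog property carries the `spark.sql.catalog.<name>.` prefix that Spark expects for catalog settings. The property suffixes are copied from the table above.

```python
def spark_catalog_conf(isk_endpoint, ssk_endpoint, access_key, secret_key,
                       region="us-east-1", catalog="streambased"):
    """Build the Spark properties for an ISK catalog (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": isk_endpoint,
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.access-key": access_key,
        f"{prefix}.rest.secret-key": secret_key,
        f"{prefix}.rest.signing-region": region,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.path-style-access": "true",
        f"{prefix}.warehouse": "s3://",
        f"{prefix}.s3.endpoint": ssk_endpoint,
    }


def aws_environment(access_key, secret_key, region):
    """Environment variables that carry the S3 credentials and region."""
    return {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_REGION": region,
    }


def render_spark_defaults(conf):
    """Render properties as spark-defaults.conf lines (key, whitespace, value)."""
    return "\n".join(f"{key} {value}" for key, value in conf.items()) + "\n"


if __name__ == "__main__":
    # Placeholder endpoints and credentials; substitute the real values.
    conf = spark_catalog_conf(
        "https://isk.example.invalid",
        "https://ssk.example.invalid",
        "my-access-key",
        "my-secret-key",
    )
    print(render_spark_defaults(conf))
```

With PySpark available, the same dictionary could instead be applied pair by pair through `SparkSession.builder.config(key, value)`, with the values from `aws_environment` exported before the session starts.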
Note: when querying the catalog, there is a single isk namespace that has to be used in all queries.
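To make the namespace rule concrete, here is a small Python sketch that builds fully qualified table names of the form catalog.namespace.table. The helper names and the table name `orders` are hypothetical. The sketch only accepts plain identifiers (letters, digits, underscores); names containing other characters would need engine-specific quoting, which is not covered here.

```python
import re

# Plain, unquoted SQL identifier: letters, digits, underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified_table(table, catalog="streambased", namespace="isk"):
    """Return catalog.namespace.table, always using the single isk namespace by default."""
    for part in (catalog, namespace, table):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"not a plain identifier: {part!r}")
    return f"{catalog}.{namespace}.{table}"


def select_all(table, limit=10):
    """Build a simple SELECT statement against a table in the isk namespace."""
    return f"SELECT * FROM {qualified_table(table)} LIMIT {int(limit)}"


if __name__ == "__main__":
    # "orders" is a made-up table name used only for illustration.
    print(select_all("orders"))
```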
Trino Configuration
The following Iceberg catalog and data IO configuration has to be specified in a new properties file, streambased.properties, placed in the /etc/trino/catalog/ directory. In general, it follows the standard REST Iceberg catalog configuration with S3 File IO and AWS SigV4 request signing for authentication:
| Property | Value | Description |
| --- | --- | --- |
| connector.name | iceberg | Trino connector name. |
| iceberg.catalog.type | rest | Iceberg catalog implementation type. |
| iceberg.file-format | AVRO | Iceberg metadata and data are served as Avro in ISK / SSK. |
| iceberg.rest-catalog.uri | [ISK endpoint] | ISK endpoint for the selected region. |
| iceberg.rest-catalog.sigv4-enabled | true | Enables AWS SigV4 authentication. |
| iceberg.rest-catalog.warehouse | s3:// | SSK is a virtual S3-compatible service, so there is no path to configure beyond the root. |
| fs.hadoop.enabled | false | The Hadoop file system is not used, so disable it. |
| fs.native-s3.enabled | true | SSK is an S3-compatible service, so enable the native S3 file system. |
| s3.region | us-east-1 | Any valid AWS region value. |
| s3.aws-access-key | [Streambased Access Key] | Access key to use for authentication. |
| s3.aws-secret-key | [Streambased Secret Key] | Secret key to use for authentication. |
| s3.path-style-access | true | SSK only supports path-style access. |
| s3.endpoint | [SSK endpoint] | SSK endpoint for the selected region. |
Note: when querying the catalog, there is a single isk namespace that has to be used in all queries.
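The Trino catalog file described above could be generated with a sketch like the following. The helper names and the `example.invalid` endpoints are hypothetical, `us-east-1` is only an example region, and the renderer writes plain `key=value` lines without the escaping that unusual values (for example, ones containing backslashes) would require in a Java properties file.

```python
def trino_catalog_properties(isk_endpoint, ssk_endpoint, access_key, secret_key,
                             region="us-east-1"):
    """Build the contents of streambased.properties as a dict (hypothetical helper)."""
    return {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.file-format": "AVRO",
        "iceberg.rest-catalog.uri": isk_endpoint,
        "iceberg.rest-catalog.sigv4-enabled": "true",
        "iceberg.rest-catalog.warehouse": "s3://",
        "fs.hadoop.enabled": "false",
        "fs.native-s3.enabled": "true",
        "s3.region": region,
        "s3.aws-access-key": access_key,
        "s3.aws-secret-key": secret_key,
        "s3.path-style-access": "true",
        "s3.endpoint": ssk_endpoint,
    }


def render_properties(props):
    """Render the dict as key=value lines, one property per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Placeholder endpoints and credentials; substitute the real values.
    props = trino_catalog_properties(
        "https://isk.example.invalid",
        "https://ssk.example.invalid",
        "my-access-key",
        "my-secret-key",
    )
    print(render_properties(props))
```

The rendered text would be saved as streambased.properties in /etc/trino/catalog/, which makes the catalog available in Trino under the name streambased.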