Connect Jupyter to Streambased

Jupyter notebooks are used for exploratory data analysis, data cleaning, data visualization, statistical modeling, machine learning, and deep learning. Let's bring real-time data into the mix.

Prerequisites:

  • A running Streambased instance (for instructions, see here)

  • A running Jupyter deployment that has the following additional packages:

    • jupysql

    • sqlalchemy-trino

Step 1: Create a database engine

In a new notebook, execute the following:

from sqlalchemy.engine import create_engine
# Replace [server host] and [server port] with your server's details.
engine = create_engine("trino://[server host]:[server port]/kafka",
                       connect_args={"http_scheme": "https", "schema": "streambased"})

By default, the server port is 8080 and the server host is the name of the host on which the Docker instance was launched.
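
As a rough illustration of filling in the placeholders, the sketch below assembles the connection URL from a host and port. The helper name `build_trino_url` and the example host `localhost` are assumptions made for illustration and are not part of Streambased or sqlalchemy-trino; the default port 8080 and the `kafka` catalog are taken from the step above.

```python
def build_trino_url(host: str, port: int = 8080, catalog: str = "kafka") -> str:
    """Return a SQLAlchemy URL string for the Trino dialect (hypothetical helper)."""
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"trino://{host}:{port}/{catalog}"


# "localhost" is an assumed host name; substitute the machine running the Docker instance.
url = build_trino_url("localhost")
print(url)
```

The resulting string can be passed as the first argument to `create_engine` in place of the placeholder URL.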

Step 2: Connect to the database

From your notebook, run the following to load the SQL extension and open the previously created engine:

%load_ext sql
%sql engine

Step 3: Run a query

With the SQL extension loaded, we can run any query we like. Happy querying!

%sql SELECT * FROM demo_transactions
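
For work outside the `%sql` magic, the engine from Step 1 can also feed a pandas DataFrame. The following is a minimal sketch, assuming pandas is installed alongside the packages listed in the prerequisites. The helper names `preview_query` and `fetch_preview` and the default row limit are illustrative choices, not part of Streambased.

```python
import re


def preview_query(table: str, limit: int = 10) -> str:
    """Build a bounded SELECT for a table name (hypothetical helper)."""
    # Accept only plain identifiers so the table name cannot carry extra SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {table} LIMIT {limit}"


def fetch_preview(engine, table: str, limit: int = 10):
    """Run the preview query on an existing SQLAlchemy engine and return a DataFrame."""
    # Imported here so the query builder above works without pandas installed.
    import pandas as pd
    from sqlalchemy import text

    with engine.connect() as conn:
        return pd.read_sql(text(preview_query(table, limit)), conn)


print(preview_query("demo_transactions"))
```

In the notebook, `fetch_preview(engine, "demo_transactions")` would then return the first rows as a DataFrame, which keeps an exploratory query from pulling an entire topic.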
