Exercise: why so much fraud?
To follow this page, we recommend running the demo first.
Having surveyed Streambased's products, let's close this demo by working through a small investigation. Take another look at the Superset fraud dashboard:

Note how high the number of recent fraudulent transactions is (tallied from the moment you started running the project). What's going on here?
To find out, let's query the data as an Iceberg table. Head to this demo's Jupyter notebook.
Recall the schema of our Iceberg table:
Where might the issue lie? Let's see if certain accounts are generating excessive fraudulent transactions. We can also investigate whether any bank branches are responsible for the anomaly.
Accounts
In order to check if fraudulent transactions are clustered in specific accounts, we can aggregate the transactions per account:
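A minimal sketch of this kind of per-account aggregation, runnable outside the demo. It uses an in-memory SQLite table as a stand-in for the Iceberg table, and the table and column names (`transactions`, `account_id`, `branch`, `is_fraud`) are assumptions made for illustration, not the demo's actual schema:

```python
import sqlite3

# Stand-in for the Iceberg table. The table and column names below are
# hypothetical; replace them with the ones from the demo's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (account_id TEXT, branch TEXT, is_fraud INTEGER)"
)
rows = [
    ("acc-1", "branch-a", 1),
    ("acc-1", "branch-b", 0),
    ("acc-2", "branch-a", 1),
    ("acc-2", "branch-b", 0),
    ("acc-3", "branch-a", 1),
    ("acc-3", "branch-b", 0),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

# Count fraudulent transactions per account, highest first.
fraud_per_account = conn.execute(
    """
    SELECT account_id, COUNT(*) AS fraud_count
    FROM transactions
    WHERE is_fraud = 1
    GROUP BY account_id
    ORDER BY fraud_count DESC, account_id
    """
).fetchall()

for account_id, fraud_count in fraud_per_account:
    print(account_id, fraud_count)
```

In this toy data every account has the same fraud count, which is the pattern to look for: if no single account dominates the ranking, the accounts are unlikely to be the source of the anomaly.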
You will see in the output that no account has an excessive number of fraudulent transactions:
So let's move to investigating the branches.
Branches
To investigate whether these transactions are localised to a specific branch, we can aggregate per branch:
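The per-branch version can be sketched the same way. Again, this uses in-memory SQLite with hypothetical names (`transactions`, `branch`, `is_fraud`) and made-up rows rather than the demo's table; it also computes a fraud rate per branch, so that a busy branch is not flagged merely for handling more transactions:

```python
import sqlite3

# Stand-in for the Iceberg table. Names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (account_id TEXT, branch TEXT, is_fraud INTEGER)"
)
rows = [
    ("acc-1", "branch-a", 1),
    ("acc-2", "branch-a", 1),
    ("acc-3", "branch-a", 1),
    ("acc-1", "branch-a", 0),
    ("acc-1", "branch-b", 0),
    ("acc-2", "branch-b", 0),
    ("acc-3", "branch-b", 1),
    ("acc-2", "branch-b", 0),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

# Per branch: number of fraudulent transactions, total transactions,
# and the share of transactions that are fraudulent.
fraud_per_branch = conn.execute(
    """
    SELECT branch,
           SUM(is_fraud) AS fraud_count,
           COUNT(*) AS total,
           ROUND(1.0 * SUM(is_fraud) / COUNT(*), 2) AS fraud_rate
    FROM transactions
    GROUP BY branch
    ORDER BY fraud_count DESC, branch
    """
).fetchall()

for branch, fraud_count, total, fraud_rate in fraud_per_branch:
    print(branch, fraud_count, total, fraud_rate)
```

A branch that sits well above the others on both count and rate is the kind of outlier this step is meant to surface.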
We will see an output similar to this:
There we find the culprit: the Zulauf, Schmidt and McCullough branch.
Now that the cause has been identified, we can escalate the matter and update our ML models and/or rules engines.
Through this exercise, we saw how to navigate our data without having to keep track of what lives in Iceberg and what lives in Kafka. This lets you spend more time on the data and less time on the data architecture.