Dagster-Iceberg project

This project investigates how to set up a modern data stack with Dagster, Apache Iceberg, Azure, and either DuckDB or Daft.

Get started

Create a .env file in fab-data/ like this:

AZURE_CONNECTION_STRING = "<connection string>"
AZURE_BRONZE_CONTAINER_NAME = "<bronze>"
AZURE_SILVER_CONTAINER_NAME = "<silver>"
AZURE_GOLD_CONTAINER_NAME = "<gold>"
AZURE_STORAGE_ACCOUNT_NAME = "<your account>"
AZURE_STORAGE_ACCOUNT_KEY = "<your key>"
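
Before starting Dagster, it can help to confirm that all six settings are present. The following is a minimal sketch using only the standard library. The function name `missing_settings` is hypothetical and not part of this repository, and the sketch assumes the variables have already been loaded into the process environment (for example by a dotenv loader).

```python
import os

# Setting names taken from the .env template above.
REQUIRED_SETTINGS = (
    "AZURE_CONNECTION_STRING",
    "AZURE_BRONZE_CONTAINER_NAME",
    "AZURE_SILVER_CONTAINER_NAME",
    "AZURE_GOLD_CONTAINER_NAME",
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
)


def missing_settings(env=None):
    """Return the required setting names that are unset or blank in env.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling `missing_settings()` with no arguments checks the current process environment; an empty list means every setting has a non-blank value. It does not verify that the values are valid Azure credentials.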

Install the Python modules. TODO.
Start Dagster and run the setup_silver and setup_gold assets. These create local SQLite Iceberg catalogs and the namespaces and tables in Azure.
Run fake_data.py to create fake semiconductor manufacturing data. If the sensor is activated in Dagster, these JSON files are loaded automatically into an Azure container as raw data.
Run write_silver_fabdata and write_gold_fabreport to load the data into Iceberg tables and execute some basic aggregations for the gold layer.
Run "streamlit run fab_report.py" to start a simple Streamlit report dashboard that uses the gold layer.
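
The fake-data and aggregation steps above can be sketched in plain Python. This is not the project's fake_data.py or its Dagster assets: the record fields (`lot_id`, `tool_id`, `wafers_in`, `wafers_out`), the function names, and the yield aggregation are all assumptions chosen for illustration, and an in-memory list stands in for the Azure container and the Iceberg tables.

```python
import json
import random
from collections import defaultdict


def make_fake_records(n, seed=0):
    """Generate n fake lot records. The schema is hypothetical."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    records = []
    for i in range(n):
        wafers_in = 25
        records.append({
            "lot_id": f"LOT{i:04d}",
            "tool_id": rng.choice(["ETCH01", "LITHO01", "CMP01"]),
            "wafers_in": wafers_in,
            "wafers_out": rng.randint(20, wafers_in),  # inclusive bounds
        })
    return records


def to_json_lines(records):
    """Serialize records as one JSON object per line (bronze-style raw payload)."""
    return "\n".join(json.dumps(record) for record in records)


def yield_by_tool(records):
    """Aggregate to one row per tool: lot count and overall wafer yield."""
    totals = defaultdict(lambda: {"lots": 0, "wafers_in": 0, "wafers_out": 0})
    for record in records:
        row = totals[record["tool_id"]]
        row["lots"] += 1
        row["wafers_in"] += record["wafers_in"]
        row["wafers_out"] += record["wafers_out"]
    return {
        tool: {"lots": row["lots"], "yield": row["wafers_out"] / row["wafers_in"]}
        for tool, row in sorted(totals.items())
    }
```

In the project itself, the equivalent aggregation would presumably run against the silver Iceberg table with DuckDB or Daft rather than over a Python list; the sketch only shows the shape of the raw-to-aggregate flow.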

Architecture

graph TD
    subgraph Data Sources
        Batch[Batch Sources]
        Stream[Streaming Sources]
    end
    subgraph Orchestrator
        direction TB
        Dagster[Dagster]
    end
    subgraph Visualization
        direction TB
        Streamlit[Streamlit]
    end
    subgraph Data Lakehouse
        direction LR
        Bronze[**Bronze Layer**<br>Raw data <br> JSON]
        Silver[**Silver Layer**<br>Cleaned, Augmented Data<br>Apache Iceberg]
        Gold[**Gold Layer**<br>Aggregates<br>Apache Iceberg]
    end
    


    Batch --> Bronze
    Stream --> Bronze
    Bronze --> Silver
    Silver --> Gold
    Streamlit --> Gold
    Dagster --> Bronze
    Dagster --> Silver
    Dagster --> Gold
    style Bronze fill:#CE8946,stroke:#333,stroke-width:2px
    style Silver fill:#C0C0C0,stroke:#333,stroke-width:2px
    style Gold fill:#FFD700,stroke:#333,stroke-width:2px
    style Dagster fill:#5eb1ef,stroke:#333,stroke-width:2px

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.
