banickn/dagster-iceberg

Dagster-Iceberg project

This project investigates how to set up a modern data stack with Dagster, Apache Iceberg, Azure, and either DuckDB or Daft.

Get started

  • Create a .env file in fab-data/ like this:
AZURE_CONNECTION_STRING = "<connection string>"
AZURE_BRONZE_CONTAINER_NAME = "<bronze>"
AZURE_SILVER_CONTAINER_NAME = "<silver>"
AZURE_GOLD_CONTAINER_NAME = "<gold>"
AZURE_STORAGE_ACCOUNT_NAME = "<your account>"
AZURE_STORAGE_ACCOUNT_KEY = "<your key>"
  • Install the required Python modules. TODO.

  • Start Dagster and run the setup_silver and setup_gold assets. These create local SQLite Iceberg catalogs and the namespaces and tables in Azure.

  • Run fake_data.py to create fake semiconductor manufacturing data. The resulting JSON files are loaded automatically into an Azure container as raw data if the sensor is activated in Dagster.

  • Run write_silver_fabdata and write_gold_fabreport to load the data into Iceberg tables and execute some basic aggregations for the gold layer.

  • Run "streamlit run fab_report.py" to start a simple Streamlit report dashboard that uses the gold layer.
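
Before starting Dagster, it can help to confirm that the .env file from the first step is complete. The following is a minimal sketch using only the standard library. The helper names parse_env and unset_keys are hypothetical and not part of this repository, and the project itself may load the file differently (for example with a dotenv package).

```python
# Hypothetical helper, not part of the repository: parse a .env file and
# report which of the variables listed above are missing or still placeholders.
from pathlib import Path

REQUIRED_KEYS = [
    "AZURE_CONNECTION_STRING",
    "AZURE_BRONZE_CONTAINER_NAME",
    "AZURE_SILVER_CONTAINER_NAME",
    "AZURE_GOLD_CONTAINER_NAME",
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY = "value" lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since connection strings contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_path = Path("fab-data/.env")  # location taken from the first step
    if env_path.exists():
        print("Unset variables:", unset_keys(parse_env(env_path.read_text())))
    else:
        print(f"{env_path} not found")
```

An empty list means every variable has a value that is not one of the angle-bracket placeholders. The check does not verify that the credentials actually work against Azure.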

Architecture

graph TD
    subgraph Data Sources
        Batch[Batch Sources]
        Stream[Streaming Sources]
    end
    subgraph Orchestrator
        direction TB
        Dagster[Dagster]
    end
    subgraph Visualization
        direction TB
        Streamlit[Streamlit]
    end
    subgraph Data Lakehouse
        direction LR
        Bronze[**Bronze Layer**<br>Raw data <br> JSON]
        Silver[**Silver Layer**<br>Cleaned, Augmented Data<br>Apache Iceberg]
        Gold[**Gold Layer**<br>Aggregates<br>Apache Iceberg]
    end
    


    Batch --> Bronze
    Stream --> Bronze
    Bronze --> Silver
    Silver --> Gold
    Streamlit --> Gold
    Dagster --> Bronze
    Dagster --> Silver
    Dagster --> Gold
    style Bronze fill:#CE8946,stroke:#333,stroke-width:2px
    style Silver fill:#C0C0C0,stroke:#333,stroke-width:2px
    style Gold fill:#FFD700,stroke:#333,stroke-width:2px
    style Dagster fill:#5eb1ef,stroke:#333,stroke-width:2px
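
The bronze, silver, and gold flow in the diagram above can be sketched in plain Python. This is an illustration only: the field names (wafer_id, tool, yield_pct) and the functions to_silver and to_gold are assumptions made for the example, and the real assets write Apache Iceberg tables in Azure rather than Python lists.

```python
# Illustrative only: field names and function names are hypothetical, and the
# project stores silver and gold data in Apache Iceberg, not in memory.
import json
from collections import defaultdict


def to_silver(raw_lines):
    """Bronze -> silver: parse raw JSON strings and drop malformed records."""
    cleaned = []
    for line in raw_lines:
        try:
            record = json.loads(line)
            row = {
                "wafer_id": str(record["wafer_id"]),
                "tool": str(record.get("tool", "unknown")),
                "yield_pct": float(record["yield_pct"]),
            }
        except (KeyError, TypeError, ValueError):
            # Covers invalid JSON, missing fields, and non-numeric values.
            continue
        cleaned.append(row)
    return cleaned


def to_gold(silver_rows):
    """Silver -> gold: average yield per tool."""
    totals = defaultdict(lambda: [0.0, 0])  # tool -> [sum, count]
    for row in silver_rows:
        totals[row["tool"]][0] += row["yield_pct"]
        totals[row["tool"]][1] += 1
    return {tool: total / count for tool, (total, count) in totals.items()}


if __name__ == "__main__":
    bronze = [
        '{"wafer_id": "W1", "tool": "etch", "yield_pct": 90}',
        '{"wafer_id": "W2", "tool": "etch", "yield_pct": 80}',
        "not json",
    ]
    print(to_gold(to_silver(bronze)))
```

Keeping the cleaning step separate from the aggregation mirrors the layering in the diagram: the silver step only validates and normalizes records, and the gold step only aggregates rows that are already clean.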


Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.
