This project builds a retail analytics system from the Online Retail II dataset and is designed to answer four business questions:
-
What happened?
Revenue trends, country concentration, product performance, and sales structure -
Why did it happen?
Returns, adjustments, merchandise vs non-merchandise logic, and customer concentration -
What should be monitored next?
Customer segmentation, anomaly detection, and time-series behavior -
How should reporting be built correctly?
Through a validated PostgreSQL data layer connected to Tableau and supported by Power BI / DAX
- Raw data profiling
- Data cleaning
- Cleaning audit logging
- Exploratory data analysis
- Non-merchandise code identification
- PostgreSQL installation and initial schema work
- Raw source workbook
- Staging sales/returns/adjustments files
- Profiling and cleaning report artifacts
- PostgreSQL loader rewrite using staging files only
- Post-load validation against benchmark numbers
- SQL views
- RFM segmentation
- Anomaly detection
- Forecasting
- Tableau dashboard
- Power BI / DAX companion
- Later extension with Rossmann and NOAA data
- Online Retail II
- Two sheets:
Year 2009-2010Year 2010-2011
- MD5:
ed54ccfc5d358481c399cc11d0a244be
- 1,067,371 raw rows
- 34,335 exact duplicates
- 243,007 missing customer IDs
- 4,382 missing descriptions
- 22,950 negative-quantity rows
- 5 negative-price rows
- 1,007,914 valid sales rows
- 20,476,634.02 total revenue
- 11,205,149 total quantity sold
- 40,078 unique invoices
- 4,917 stock codes
- 43 countries
- United Kingdom revenue: 17,410,569.69
- UK share of total revenue: 85.03%
Top revenue months:
- November β 2,968,159.92
- October β 2,313,165.95
- December β 2,281,745.01
Confirmed non-merchandise codes:
Mβ ManualDOTβ DOTCOM POSTAGEPOSTβ POSTAGE
Customer-linked staging subset:
- 779,425 rows
- 5,878 identifiable customers
Top customer:
18102.0β 580,987.04
retail-revenue-intelligence/
βββ data/
β βββ raw/
β β βββ online_retail_ii/
β β βββ online_retail_II.xlsx
β βββ staging/
β βββ sales_main.csv
β βββ returns_cancellations.csv
β βββ accounting_adjustments.csv
β βββ non_merchandise_codes.csv
β
βββ docs/
β βββ ABSTRACT.md
β βββ ABOUT_THE_ANALYST.md
β βββ data_findings.md
β βββ HOW_TO_READ_NOTEBOOKS.md
β βββ TABLEAU_GUIDE.md
β βββ validation_benchmarks.md
β
βββ notebooks/
β βββ 01_data_understanding.ipynb
β βββ 02_data_cleaning.ipynb
β βββ 03_eda.ipynb
β βββ 04_anomaly_detection.ipynb
β βββ 05_forecasting.ipynb
β βββ 06_rfm_segmentation.ipynb
β βββ 07_validation.ipynb
β
βββ outputs/
β βββ reports/
β βββ profiling_summary.json
β βββ cleaning_audit_log.csv
β
βββ powerbi/
βββ sql/
βββ src/
βββ tableau/
βββ tests/
βββ .env.example
βββ .gitignore
βββ README.md
βββ requirements.txt