Retail Orders Analytics Project

Description

This project builds an end-to-end data pipeline using Python and Pandas to process a retail orders dataset. The cleaned data is loaded into SQL Server, where SQL queries are used to analyze top-performing products, regional sales, monthly trends, and year-over-year growth.

Dataset

The dataset is sourced from Kaggle and contains retail order records including product details, pricing, discounts, order dates, regions, and categories.
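
A minimal sketch of the download and extraction step. It assumes the Kaggle CLI (installed with the `kaggle` package) is on PATH and authenticated with an API token in `~/.kaggle/kaggle.json`. This README does not give the dataset slug, so `DATASET_SLUG` is left as a placeholder. The helper names `download_dataset` and `extract_csv`, and the member name `orders.csv`, are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path

# Placeholder: the actual Kaggle dataset slug is not stated in this README.
DATASET_SLUG = "<owner>/<dataset-name>"


def download_dataset(slug: str, dest: Path) -> Path:
    """Download a Kaggle dataset archive with the Kaggle CLI (needs an API token)."""
    dest.mkdir(parents=True, exist_ok=True)
    # The CLI typically saves the archive as <dataset-name>.zip in the -p directory.
    subprocess.run(
        ["kaggle", "datasets", "download", "-d", slug, "-p", str(dest)],
        check=True,
    )
    return dest / f"{slug.split('/')[-1]}.zip"


def extract_csv(archive: Path, member: str, dest: Path) -> Path:
    """Extract a single file (e.g. a hypothetical orders.csv) from the archive."""
    with zipfile.ZipFile(archive) as zf:
        # ZipFile.extract creates missing directories and returns the output path.
        return Path(zf.extract(member, path=dest))
```

Usage would be `extract_csv(download_dataset(DATASET_SLUG, Path("data")), "orders.csv", Path("data"))` once the placeholder slug is filled in. The CLI also has an `--unzip` flag; extracting with `zipfile` keeps the step explicit and lets you pick a single file.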

Python ETL Process

The data is downloaded and extracted using the Kaggle API. It is loaded into a Pandas DataFrame for preprocessing. The following transformations are applied:

Missing values are handled.
New columns are derived: discount, sale price, and profit.
The order_date column is converted to datetime format.
Unnecessary columns such as list_price, cost_price, and discount_percent are removed.
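
The transformations listed above could look roughly like the following pandas sketch. The README gives neither the exact formulas nor which values count as missing. The following are therefore assumptions for illustration: the discount arithmetic (discount_percent as a percentage of list_price), the `NA_MARKERS` strings, the `ship_mode` column in the sample, and the helper names `load_raw` and `transform`.

```python
import io

import pandas as pd

# Strings treated as missing when reading the CSV; these values are an assumption.
NA_MARKERS = ["Not Available", "unknown"]


def load_raw(source) -> pd.DataFrame:
    """Read the raw CSV, mapping placeholder strings to NaN."""
    return pd.read_csv(source, na_values=NA_MARKERS)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Derive discount, sale_price and profit; parse dates; drop source columns."""
    df = df.copy()
    # Assumed formulas: discount_percent is a percentage of list_price.
    df["discount"] = df["list_price"] * df["discount_percent"] / 100
    df["sale_price"] = df["list_price"] - df["discount"]
    df["profit"] = df["sale_price"] - df["cost_price"]
    # Convert order_date from text to datetime64.
    df["order_date"] = pd.to_datetime(df["order_date"], format="%Y-%m-%d")
    # Drop the columns that were only needed to compute the derived values.
    return df.drop(columns=["list_price", "cost_price", "discount_percent"])


# Hypothetical two-row sample using the column names mentioned in this README.
SAMPLE_CSV = """order_id,order_date,ship_mode,cost_price,list_price,discount_percent
1,2023-03-01,Second Class,240,260,2
2,2022-08-15,Not Available,600,730,3
"""

orders = transform(load_raw(io.StringIO(SAMPLE_CSV)))
print(orders)
```

With the real file, `load_raw` would receive the extracted CSV path instead of the in-memory sample.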

After preprocessing, the data is loaded into SQL Server using SQLAlchemy with an ODBC connection.
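
A minimal sketch of the load step using `DataFrame.to_sql`. To keep it runnable without a server, the demo writes to an in-memory SQLite database as a stand-in. Against SQL Server you would pass a SQLAlchemy engine built on the `mssql+pyodbc` dialect instead. The helper name `load_to_sql` and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd


def load_to_sql(df: pd.DataFrame, con, table: str = "df_orders") -> None:
    """Append the processed DataFrame to a SQL table (created if it is missing).

    `con` may be a SQLAlchemy engine or connection (e.g. SQL Server via PyODBC)
    or, as in the demo below, a sqlite3 connection used as a local stand-in.
    """
    df.to_sql(table, con, if_exists="append", index=False)


# In-memory SQLite stand-in so the sketch runs without SQL Server.
con = sqlite3.connect(":memory:")
sample = pd.DataFrame(
    {"order_id": [1, 2], "region": ["South", "West"], "sale_price": [254.8, 708.1]}
)
load_to_sql(sample, con)
row_count = con.execute("SELECT COUNT(*) FROM df_orders").fetchone()[0]
print(row_count)  # 2
```

`if_exists="append"` adds rows to an existing table, so a df_orders table created beforehand in SQL Server keeps the column types chosen there. `if_exists="replace"` would drop it and recreate it with pandas-inferred types. Which option the project uses is not stated here.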

SQL Server Connection

The connection to SQL Server is established with SQLAlchemy and PyODBC, using ODBC Driver 17 for SQL Server. After a successful connection test, the processed data is written to a table named df_orders.

To allow remote connections to the SQLEXPRESS instance, the TCP/IP protocol was first enabled in SQL Server Configuration Manager under 'Protocols for SQLEXPRESS'.

After enabling TCP/IP, the SQL Server (SQLEXPRESS) service was restarted from the Services panel to apply the configuration changes.

An ODBC data source (DSN) named myserver was configured with the server name localhost\SQLEXPRESS.
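
With the DSN in place, PyODBC can connect by DSN name instead of spelling out the driver and server. The sketch below assumes Windows authentication (`Trusted_Connection=yes`), which this README does not state. `dsn_connection_string` and `connect_via_dsn` are hypothetical helper names.

```python
def dsn_connection_string(dsn: str = "myserver") -> str:
    """ODBC connection string for a configured DSN (Windows authentication assumed)."""
    return f"DSN={dsn};Trusted_Connection=yes;"


def connect_via_dsn(dsn: str = "myserver"):
    """Open a PyODBC connection through the DSN; needs pyodbc and the DSN to exist."""
    import pyodbc  # imported lazily so the string helper works without pyodbc

    return pyodbc.connect(dsn_connection_string(dsn))
```

For example, `connect_via_dsn().cursor().execute("SELECT @@VERSION").fetchone()` would return the server's version string if the DSN resolves to the running instance.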

The instance was also reached from Microsoft SQL Server Management Studio (SSMS) by entering localhost\SQLEXPRESS as the server name.
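
The SQLAlchemy side of the connection can be sketched by passing a raw ODBC string through the `odbc_connect` query parameter of a `mssql+pyodbc` URL. The server name and driver come from this README. The database name (`master`) and Windows authentication are assumptions, and `sqlalchemy_url` and `check_connection` are hypothetical helpers.

```python
from urllib.parse import quote_plus

# Server and driver are from this README; DATABASE is an assumption.
SERVER = r"localhost\SQLEXPRESS"
DRIVER = "ODBC Driver 17 for SQL Server"
DATABASE = "master"


def sqlalchemy_url(server: str = SERVER, database: str = DATABASE, driver: str = DRIVER) -> str:
    """Build a mssql+pyodbc URL that forwards a URL-encoded ODBC string."""
    odbc = (
        f"DRIVER={{{driver}}};SERVER={server};DATABASE={database};"
        "Trusted_Connection=yes;"
    )
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc)


def check_connection(url: str) -> bool:
    """Run SELECT 1 as a smoke test; needs SQLAlchemy, pyodbc and a reachable server."""
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar() == 1
```

`check_connection(sqlalchemy_url())` only succeeds on a machine with SQLAlchemy, pyodbc, ODBC Driver 17 and the running SQLEXPRESS instance. The engine from `create_engine(sqlalchemy_url())` is what `DataFrame.to_sql` would receive when writing df_orders.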

SQL Analysis Overview

Once the data is available in the SQL Server table df_orders, the following SQL queries are used to derive insights:

Top 10 revenue-generating products
Top 5 highest selling products per region
Month-over-month sales comparison for 2022 and 2023
Highest sales month for each product category
Sub-category with the highest sales growth from 2022 to 2023
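
The project's queries target SQL Server (T-SQL). As an illustration only, the sketch below expresses four of the listed analyses in SQLite syntax so it runs with Python's built-in `sqlite3` module. In SQL Server you would use `SELECT TOP 10` instead of `LIMIT`, and `YEAR()`/`MONTH()` instead of `strftime`. Several details are assumptions: the sample rows, the column names beyond those mentioned in this README, treating `sale_price` as order revenue, and measuring growth as an absolute difference.

```python
import sqlite3

# Hypothetical rows: (order_date, region, sub_category, product_id, sale_price).
ROWS = [
    ("2022-01-10", "East", "Chairs", "P1", 100.0),
    ("2023-01-12", "East", "Chairs", "P1", 150.0),
    ("2022-02-05", "West", "Phones", "P2", 300.0),
    ("2023-02-07", "West", "Phones", "P2", 250.0),
    ("2023-02-20", "West", "Chairs", "P3", 80.0),
]

QUERIES = {
    # Top 10 products by total sales.
    "top_products": """
        SELECT product_id, SUM(sale_price) AS revenue
        FROM df_orders
        GROUP BY product_id
        ORDER BY revenue DESC
        LIMIT 10""",
    # Top 5 products per region, ranked with ROW_NUMBER over per-region totals.
    "top_per_region": """
        WITH agg AS (
            SELECT region, product_id, SUM(sale_price) AS sales
            FROM df_orders
            GROUP BY region, product_id
        ), ranked AS (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales DESC) AS rn
            FROM agg
        )
        SELECT region, product_id, sales FROM ranked WHERE rn <= 5
        ORDER BY region, rn""",
    # Monthly sales with 2022 and 2023 side by side (pivot via CASE).
    "month_over_month": """
        SELECT CAST(strftime('%m', order_date) AS INTEGER) AS order_month,
               SUM(CASE WHEN strftime('%Y', order_date) = '2022' THEN sale_price ELSE 0.0 END) AS sales_2022,
               SUM(CASE WHEN strftime('%Y', order_date) = '2023' THEN sale_price ELSE 0.0 END) AS sales_2023
        FROM df_orders
        GROUP BY order_month
        ORDER BY order_month""",
    # Sub-category with the largest absolute sales increase from 2022 to 2023.
    "top_growth_sub_category": """
        WITH yearly AS (
            SELECT sub_category,
                   SUM(CASE WHEN strftime('%Y', order_date) = '2022' THEN sale_price ELSE 0.0 END) AS s22,
                   SUM(CASE WHEN strftime('%Y', order_date) = '2023' THEN sale_price ELSE 0.0 END) AS s23
            FROM df_orders
            GROUP BY sub_category
        )
        SELECT sub_category, s23 - s22 AS growth
        FROM yearly
        ORDER BY growth DESC
        LIMIT 1""",
}


def run_analyses(con: sqlite3.Connection) -> dict:
    """Execute each query and return its rows keyed by query name."""
    return {name: con.execute(sql).fetchall() for name, sql in QUERIES.items()}


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE df_orders (order_date TEXT, region TEXT, sub_category TEXT,"
    " product_id TEXT, sale_price REAL)"
)
con.executemany("INSERT INTO df_orders VALUES (?, ?, ?, ?, ?)", ROWS)
results = run_analyses(con)
for name, rows in results.items():
    print(name, rows)
```

The per-region ranking aggregates first and ranks in a separate CTE, which keeps the window function independent of the GROUP BY and ports directly to T-SQL.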

Tools Used

Python
Pandas
SQLAlchemy
PyODBC
SQL Server
Kaggle API

Repository Files

README.md
SQLQuery1.sql
orders data analysis.ipynb