Skip to content

Mayur97V/sql-analytics-portfolio

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL Analytics Portfolio

A collection of advanced SQL queries solving real business problems against a retail & e-commerce dataset. Each query demonstrates a specific analytical technique — from revenue cohort analysis to customer lifetime value modeling — with clear business context, approach, and interpretation.

About This Project

This portfolio demonstrates production-level SQL skills used in finance analytics, operational reporting, and cloud data environments. Every query is written to be readable, well-commented, and representative of the kind of ad-hoc and recurring analysis expected of a Data Analyst at a Fortune 500 company.

Dataset: Brazilian E-Commerce Public Dataset by Olist — 100K+ orders from 2016–2018 across multiple sellers, products, and geographies.

Why this dataset: It mirrors real enterprise data — normalized across multiple tables (orders, items, payments, customers, sellers, products, reviews), includes timestamps for time-series analysis, and has enough volume and messiness to require real analytical thinking.

Skills Demonstrated

Technique Queries
Window Functions (ROW_NUMBER, RANK, LAG, LEAD, NTILE) Q1, Q3, Q5, Q8
Common Table Expressions (CTEs) All queries
Cohort Analysis Q2
Funnel / Conversion Analysis Q6
Year-over-Year & Period Comparisons Q3
Customer Segmentation (RFM) Q5
Revenue & Margin Analysis Q1, Q3, Q7
Self-Joins Q4
Running Totals & Moving Averages Q8
Data Quality Validation Q10
Seller / Vendor Performance Scoring Q9
Customer Lifetime Value (CLV) Q7

Repository Structure

sql-analytics-portfolio/
├── README.md
├── queries/
│   ├── q01_revenue_kpi_dashboard.sql
│   ├── q02_monthly_customer_cohort.sql
│   ├── q03_yoy_revenue_growth.sql
│   ├── q04_repeat_purchase_analysis.sql
│   ├── q05_rfm_customer_segmentation.sql
│   ├── q06_order_funnel_analysis.sql
│   ├── q07_customer_lifetime_value.sql
│   ├── q08_rolling_metrics_trends.sql
│   ├── q09_seller_performance_scorecard.sql
│   └── q10_data_quality_audit.sql
├── data/
│   └── schema_diagram.md
├── results/
│   └── sample_outputs.md
└── docs/
    └── setup_guide.md

Schema Overview

┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  customers   │     │     orders       │     │   order_items    │
│──────────────│     │──────────────────│     │──────────────────│
│ customer_id  │────▶│ order_id         │◀────│ order_id         │
│ city         │     │ customer_id      │     │ product_id       │
│ state        │     │ order_status     │     │ seller_id        │
│ zip_code     │     │ purchase_date    │     │ price            │
└──────────────┘     │ approved_date    │     │ freight_value    │
                     │ delivered_date   │     └──────────────────┘
                     │ estimated_date   │              │
                     └──────────────────┘              │
                            │                          ▼
                     ┌──────────────────┐     ┌──────────────────┐
                     │  order_payments  │     │    products      │
                     │──────────────────│     │──────────────────│
                     │ order_id         │     │ product_id       │
                     │ payment_type     │     │ category_name    │
                     │ payment_value    │     │ weight_g         │
                     └──────────────────┘     │ length_cm        │
                            │                 └──────────────────┘
                     ┌──────────────────┐              │
                     │  order_reviews   │     ┌──────────────────┐
                     │──────────────────│     │    sellers       │
                     │ order_id         │     │──────────────────│
                     │ review_score     │     │ seller_id        │
                     │ review_date      │     │ city             │
                     └──────────────────┘     │ state            │
                                              └──────────────────┘

How to Run

These queries are written in standard SQL compatible with:

  • PostgreSQL (primary target)
  • Snowflake (minor syntax adjustments noted in comments)
  • BigQuery (minor syntax adjustments noted in comments)
  • DuckDB (fully compatible — great for local testing)

See docs/setup_guide.md for step-by-step instructions on loading the dataset locally with DuckDB or PostgreSQL.

Query Index

Q1 — Executive Revenue KPI Dashboard

Business Question: What are the key revenue metrics (total revenue, AOV, orders, revenue per customer) by month, and how do they trend? Techniques: CTEs, aggregation, window functions (LAG for MoM change)

Q2 — Monthly Customer Cohort Retention

Business Question: Of customers acquired in each month, what percentage return to purchase in subsequent months? Techniques: Cohort assignment, self-join, conditional aggregation, retention curve

Q3 — Year-over-Year Revenue Growth by Category

Business Question: Which product categories are growing or declining year-over-year, and by how much? Techniques: CTEs, LAG window function, YoY percentage calculation, ranking

Q4 — Repeat Purchase & Cross-Sell Analysis

Business Question: What percentage of customers make repeat purchases, and which product categories are most commonly bought together? Techniques: Self-join, DISTINCT counting, cross-tabulation

Q5 — RFM Customer Segmentation

Business Question: How can we segment customers into actionable groups based on recency, frequency, and monetary value? Techniques: NTILE window function, CASE-based segmentation, multi-CTE pipeline

Q6 — Order Funnel Conversion Analysis

Business Question: Where are orders dropping off in the fulfillment pipeline (approved → shipped → delivered)? Techniques: Funnel stage calculation, date diffs, conditional aggregation, conversion rates

Q7 — Customer Lifetime Value (CLV) Estimation

Business Question: What is the estimated lifetime value of customers by acquisition cohort and segment? Techniques: Cohort analysis, revenue per customer over time, cumulative sums

Q8 — Rolling Revenue & Moving Average Trends

Business Question: What does the 7-day and 30-day moving average revenue look like, and where are anomalies? Techniques: Window functions (AVG OVER with ROWS BETWEEN), running totals, trend detection

Q9 — Seller Performance Scorecard

Business Question: Which sellers are top performers across revenue, delivery speed, and customer satisfaction? Techniques: Multi-metric scoring, NTILE ranking, composite score calculation

Q10 — Data Quality Audit

Business Question: How clean is this dataset? What are the null rates, orphan records, and logical inconsistencies? Techniques: NULL analysis, referential integrity checks, date logic validation, UNION ALL reporting

Author

Mayur Gudala — Data Analyst | MS Business Analytics (4.0 GPA), Concordia University Wisconsin

About

Advanced SQL queries solving real business problems revenue analysis, cohort retention, RFM segmentation, and more

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors