A collection of advanced SQL queries solving real business problems against a retail & e-commerce dataset. Each query demonstrates a specific analytical technique — from revenue cohort analysis to customer lifetime value modeling — with clear business context, approach, and interpretation.
This portfolio demonstrates production-level SQL skills used in finance analytics, operational reporting, and cloud data environments. Every query is written to be readable, well-commented, and representative of the kind of ad-hoc and recurring analysis expected of a Data Analyst at a Fortune 500 company.
Dataset: Brazilian E-Commerce Public Dataset by Olist — 100K+ orders from 2016–2018 across multiple sellers, products, and geographies.
Why this dataset: It mirrors real enterprise data — normalized across multiple tables (orders, items, payments, customers, sellers, products, reviews), includes timestamps for time-series analysis, and has enough volume and messiness to require real analytical thinking.
| Technique | Queries |
|---|---|
| Window Functions (ROW_NUMBER, RANK, LAG, LEAD, NTILE) | Q1, Q3, Q5, Q8 |
| Common Table Expressions (CTEs) | All queries |
| Cohort Analysis | Q2 |
| Funnel / Conversion Analysis | Q6 |
| Year-over-Year & Period Comparisons | Q3 |
| Customer Segmentation (RFM) | Q5 |
| Revenue & Margin Analysis | Q1, Q3, Q7 |
| Self-Joins | Q4 |
| Running Totals & Moving Averages | Q8 |
| Data Quality Validation | Q10 |
| Seller / Vendor Performance Scoring | Q9 |
| Customer Lifetime Value (CLV) | Q7 |
sql-analytics-portfolio/
├── README.md
├── queries/
│ ├── q01_revenue_kpi_dashboard.sql
│ ├── q02_monthly_customer_cohort.sql
│ ├── q03_yoy_revenue_growth.sql
│ ├── q04_repeat_purchase_analysis.sql
│ ├── q05_rfm_customer_segmentation.sql
│ ├── q06_order_funnel_analysis.sql
│ ├── q07_customer_lifetime_value.sql
│ ├── q08_rolling_metrics_trends.sql
│ ├── q09_seller_performance_scorecard.sql
│ └── q10_data_quality_audit.sql
├── data/
│ └── schema_diagram.md
├── results/
│ └── sample_outputs.md
└── docs/
└── setup_guide.md
┌──────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ customers │ │ orders │ │ order_items │
│──────────────│ │──────────────────│ │──────────────────│
│ customer_id │────▶│ order_id │◀────│ order_id │
│ city │ │ customer_id │ │ product_id │
│ state │ │ order_status │ │ seller_id │
│ zip_code │ │ purchase_date │ │ price │
└──────────────┘ │ approved_date │ │ freight_value │
│ delivered_date │ └──────────────────┘
│ estimated_date │ │
└──────────────────┘ │
│ ▼
┌──────────────────┐ ┌──────────────────┐
│ order_payments │ │ products │
│──────────────────│ │──────────────────│
│ order_id │ │ product_id │
│ payment_type │ │ category_name │
│ payment_value │ │ weight_g │
└──────────────────┘ │ length_cm │
│ └──────────────────┘
┌──────────────────┐ │
│ order_reviews │ ┌──────────────────┐
│──────────────────│ │ sellers │
│ order_id │ │──────────────────│
│ review_score │ │ seller_id │
│ review_date │ │ city │
└──────────────────┘ │ state │
└──────────────────┘
These queries are written in standard SQL compatible with:
- PostgreSQL (primary target)
- Snowflake (minor syntax adjustments noted in comments)
- BigQuery (minor syntax adjustments noted in comments)
- DuckDB (fully compatible — great for local testing)
See docs/setup_guide.md for step-by-step instructions on loading the dataset locally with DuckDB or PostgreSQL.
Business Question: What are the key revenue metrics (total revenue, AOV, orders, revenue per customer) by month, and how do they trend? Techniques: CTEs, aggregation, window functions (LAG for MoM change)
Business Question: Of customers acquired in each month, what percentage return to purchase in subsequent months? Techniques: Cohort assignment, self-join, conditional aggregation, retention curve
Business Question: Which product categories are growing or declining year-over-year, and by how much? Techniques: CTEs, LAG window function, YoY percentage calculation, ranking
Business Question: What percentage of customers make repeat purchases, and which product categories are most commonly bought together? Techniques: Self-join, DISTINCT counting, cross-tabulation
Business Question: How can we segment customers into actionable groups based on recency, frequency, and monetary value? Techniques: NTILE window function, CASE-based segmentation, multi-CTE pipeline
Business Question: Where are orders dropping off in the fulfillment pipeline (approved → shipped → delivered)? Techniques: Funnel stage calculation, date diffs, conditional aggregation, conversion rates
Business Question: What is the estimated lifetime value of customers by acquisition cohort and segment? Techniques: Cohort analysis, revenue per customer over time, cumulative sums
Business Question: What does the 7-day and 30-day moving average revenue look like, and where are anomalies? Techniques: Window functions (AVG OVER with ROWS BETWEEN), running totals, trend detection
Business Question: Which sellers are top performers across revenue, delivery speed, and customer satisfaction? Techniques: Multi-metric scoring, NTILE ranking, composite score calculation
Business Question: How clean is this dataset? What are the null rates, orphan records, and logical inconsistencies? Techniques: NULL analysis, referential integrity checks, date logic validation, UNION ALL reporting
Mayur Gudala — Data Analyst | MS Business Analytics (4.0 GPA), Concordia University Wisconsin