Advanced SQL - Window Functions, Subqueries, Date Functions¶

Advanced SQL patterns beyond basic CRUD - window functions for analytics, correlated subqueries, date manipulation, and common query patterns for reporting.

Key Facts¶

Window functions operate over a "window" of rows without reducing row count (unlike GROUP BY)
ROW_NUMBER() gives unique sequential numbers; RANK() has gaps on ties; DENSE_RANK() has no gaps
HAVING filters on aggregate results after GROUP BY; WHERE filters rows before grouping
Correlated subqueries reference the outer query and run once per row (can be slow)
EXISTS is more efficient than IN for existence checks with large subquery results
LAG/LEAD access previous/next row values without self-joins

Patterns¶

GROUP BY and Aggregations¶

SELECT department, COUNT(*) as count, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000
ORDER BY avg_salary DESC
LIMIT 5;

-- Common aggregate functions
COUNT(*), COUNT(DISTINCT col)
SUM(col), AVG(col), MIN(col), MAX(col)
STRING_AGG(col, ', ' ORDER BY col)  -- PostgreSQL
GROUP_CONCAT(col ORDER BY col SEPARATOR ', ')  -- MySQL

Subqueries¶

-- Derived table (subquery in FROM)
SELECT dept, avg_sal
FROM (
    SELECT department AS dept, AVG(salary) AS avg_sal
    FROM employees GROUP BY department
) AS dept_averages
WHERE avg_sal > 60000;

-- Scalar subquery in WHERE
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated subquery (references outer query, runs once per row)
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary) FROM employees
    WHERE department = e.department
);

-- EXISTS (efficient existence check)
SELECT c.name FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

Window Functions¶

-- Syntax
function() OVER (
    PARTITION BY col    -- reset window per group
    ORDER BY col        -- ordering within window
    ROWS/RANGE BETWEEN ... AND CURRENT ROW  -- frame
)

Ranking functions:

ROW_NUMBER() OVER(PARTITION BY dept ORDER BY salary DESC)
-- unique sequential: 1, 2, 3, 4

RANK() OVER(ORDER BY score DESC)
-- gaps on ties: 1, 2, 2, 4

DENSE_RANK() OVER(ORDER BY score DESC)
-- no gaps: 1, 2, 2, 3

NTILE(4) OVER(ORDER BY salary)
-- divide into N equal buckets (quartiles)

Offset functions:

LAG(salary, 1, 0) OVER(ORDER BY hire_date)   -- previous row
LEAD(salary, 1, 0) OVER(ORDER BY hire_date)  -- next row

FIRST_VALUE(salary) OVER(PARTITION BY dept ORDER BY hire_date)

LAST_VALUE(salary) OVER(
    PARTITION BY dept ORDER BY hire_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
-- LAST_VALUE requires explicit UNBOUNDED frame!

Distribution functions:

CUME_DIST() OVER(ORDER BY score)
-- fraction of rows <= current row (0 to 1)

PERCENT_RANK() OVER(ORDER BY score)
-- relative rank: (rank - 1) / (total_rows - 1)

Running totals and moving averages:

SUM(amount) OVER(ORDER BY date)
-- running total (default: UNBOUNDED PRECEDING to CURRENT ROW)

AVG(amount) OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
-- 7-day moving average

SUM(amount) OVER(PARTITION BY dept ORDER BY date)
-- running total per department

Common Query Patterns¶

-- Top N per group
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM employees
) t WHERE rn <= 3;

-- Running total with percentage of total
SELECT date, amount,
    SUM(amount) OVER(ORDER BY date) AS running_total,
    amount / SUM(amount) OVER() * 100 AS pct_of_total
FROM sales;

-- Year-over-year comparison
SELECT year, revenue,
    LAG(revenue, 1) OVER(ORDER BY year) AS prev_year,
    (revenue - LAG(revenue, 1) OVER(ORDER BY year))
      / LAG(revenue, 1) OVER(ORDER BY year) * 100 AS yoy_pct
FROM annual_revenue;

Date and Time Functions¶

-- PostgreSQL
EXTRACT(YEAR FROM date_col)
EXTRACT(DOW FROM date_col)  -- day of week (0=Sunday)
DATE_TRUNC('month', timestamp_col)
date_col + INTERVAL '7 days'
AGE(timestamp1, timestamp2)

-- MySQL
YEAR(date_col), MONTH(date_col), DAY(date_col)
DATE_FORMAT(date_col, '%Y-%m')
DATEDIFF(date1, date2)
DATE_ADD(date_col, INTERVAL 7 DAY)

Gotchas¶

LAST_VALUE default frame ends at CURRENT ROW, not end of partition - must specify UNBOUNDED FOLLOWING
HAVING only works with GROUP BY; it filters aggregated results, not individual rows
Correlated subqueries execute once per outer row - can be O(N^2); consider JOINs or window functions instead
NULL in COUNT(col) is not counted; COUNT(*) counts all rows including nulls
SUM(amount) OVER() (empty OVER clause) = total across entire result set
Window functions cannot be used in WHERE clause - use a subquery or CTE