Skip to content

Subqueries and CTEs

Intermediate

Subqueries are queries nested inside other queries. CTEs (Common Table Expressions) provide named, reusable query blocks that improve readability for complex SQL.

Key Facts

  • Scalar subqueries return a single value; table subqueries return multiple rows
  • Correlated subqueries execute once per outer row (can be expensive)
  • EXISTS stops at first match - more efficient than IN for existence checks
  • CTEs are defined with WITH keyword and can chain multiple named blocks
  • PostgreSQL materializes CTEs by default (pre-12); use NOT MATERIALIZED hint for inlining

Patterns

Scalar Subquery

SELECT * FROM users
WHERE id = (SELECT sender_id FROM messages GROUP BY sender_id ORDER BY COUNT(*) DESC LIMIT 1);

-- Scalar subquery in SELECT
SELECT u.name,
  (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u;

IN Subquery

SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);

-- NOT IN: find users without orders
SELECT * FROM users WHERE id NOT IN (
  SELECT DISTINCT sender_id FROM messages WHERE sender_id IS NOT NULL
);

EXISTS Subquery

-- Users who have at least one order
SELECT * FROM users u WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000
);

-- Anti-pattern replacement: safer than NOT IN
SELECT * FROM users u WHERE NOT EXISTS (
  SELECT 1 FROM messages m WHERE m.sender_id = u.id
);

Derived Table (Subquery in FROM)

SELECT avg_amount FROM (
  SELECT user_id, AVG(amount) AS avg_amount
  FROM orders GROUP BY user_id
) AS user_avgs
WHERE avg_amount > 500;

CTE (Common Table Expression)

WITH active_users AS (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  WHERE created_at > '2024-01-01'
  GROUP BY user_id
),
high_value AS (
  SELECT user_id, SUM(amount) AS total
  FROM orders
  GROUP BY user_id
  HAVING total > 10000
)
SELECT u.*, au.order_count, hv.total
FROM users u
JOIN active_users au ON u.id = au.user_id
JOIN high_value hv ON u.id = hv.user_id;

Recursive CTE

-- Organizational hierarchy
WITH RECURSIVE org_tree AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, ot.depth + 1
  FROM employees e
  JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY depth, name;

Gotchas

  • NOT IN with NULLs: If subquery returns ANY NULL, NOT IN returns empty result for ALL rows - always add WHERE col IS NOT NULL or use NOT EXISTS instead
  • Correlated subqueries run once per outer row - rewrite as JOIN when possible
  • EXISTS only checks for existence, not values - SELECT 1 inside is convention
  • CTEs in PostgreSQL < 12 are optimization fences (always materialized)
  • MySQL does not support recursive CTEs before version 8.0

See Also