Aggregate Functions and GROUP BY¶
Aggregate functions collapse multiple rows into single summary values. GROUP BY partitions rows into groups before aggregation, while HAVING filters the resulting groups.
Key Facts¶
COUNT(*)counts all rows including NULLs;COUNT(column)counts non-NULL values onlySUM,AVG,MIN,MAXall ignore NULLsGROUP BYtreats all NULLs as one group- HAVING filters groups after aggregation; WHERE filters rows before grouping
- Cannot use aggregate functions in WHERE clause
Patterns¶
Basic Aggregation¶
SELECT COUNT(*) FROM users;
SELECT COUNT(DISTINCT city) FROM users;
SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM orders;
GROUP BY¶
SELECT city, COUNT(*) AS user_count
FROM users
GROUP BY city
ORDER BY user_count DESC;
-- Multiple grouping columns
SELECT country_code, district, COUNT(*) AS city_count
FROM city
GROUP BY country_code, district;
HAVING (Filter on Aggregated Results)¶
SELECT genre, COUNT(*) AS cnt
FROM album
GROUP BY genre
HAVING COUNT(*) > 5
ORDER BY cnt DESC;
-- Combine WHERE and HAVING
SELECT continent, AVG(life_expectancy) AS avg_life
FROM country
WHERE population > 1000000 -- filter rows before grouping
GROUP BY continent
HAVING avg_life > 70 -- filter groups after aggregation
ORDER BY avg_life DESC;
COUNT Performance on Large Tables¶
SELECT COUNT(*) FROM large_table requires full table/index scan - MVCC means different transactions see different row counts.
Alternatives for approximate counts:
-- PostgreSQL: approximate from statistics
SELECT reltuples::bigint FROM pg_class WHERE relname = 'large_table';
-- MySQL: approximate from information_schema
SELECT table_rows FROM information_schema.tables
WHERE table_name = 'large_table';
Other strategies: materialized count table updated by triggers, Redis counter incremented on INSERT/DELETE, HyperLogLog for distinct count approximation.
Gotchas¶
AVGignores NULLs -AVGof[10, NULL, 20]is 15, not 10- Columns in SELECT must either appear in GROUP BY or be inside aggregate functions
COUNT(DISTINCT col)is slower thanCOUNT(*)- requires deduplication- MySQL allows non-aggregated columns in SELECT without GROUP BY (sql_mode dependent) - PostgreSQL does not
See Also¶
- select fundamentals - WHERE clause and basic filtering
- window functions - aggregations without collapsing rows
- query optimization explain - optimizing aggregate queries