Data Types and NULL Handling¶

★★★★★ Advanced

Choosing correct data types affects storage, performance, and data integrity. NULL handling with three-valued logic is one of SQL's most subtle aspects.

MySQL Data Types¶

Numeric¶

Type	Size	Range
TINYINT	1 byte	-128 to 127 (UNSIGNED: 0-255)
SMALLINT	2 bytes	-32768 to 32767
MEDIUMINT	3 bytes	-8M to 8M
INT	4 bytes	-2B to 2B
BIGINT	8 bytes	-9.2E18 to 9.2E18
DECIMAL(M,D)	variable	exact precision (for money)
FLOAT	4 bytes	~7 digits precision
DOUBLE	8 bytes	~15 digits precision

String¶

Type	Max Size	Notes
CHAR(N)	255	fixed length, padded with spaces
VARCHAR(N)	65,535 bytes	variable length, 1-2 bytes overhead
TEXT	65 KB
MEDIUMTEXT	16 MB
LONGTEXT	4 GB
ENUM	1-2 bytes	one value from predefined list

Date/Time¶

Type	Format	Notes
DATE	YYYY-MM-DD
TIME	HH:MM:SS
DATETIME	YYYY-MM-DD HH:MM:SS	8 bytes, no timezone
TIMESTAMP	same format	4 bytes, UTC conversion, range 1970-2038
YEAR	YYYY	1 byte

PostgreSQL Data Types¶

Type	Notes
SERIAL / BIGSERIAL	auto-incrementing integer (4/8 bytes)
INTEGER, BIGINT, SMALLINT	exact integers
NUMERIC(p,s) / DECIMAL	exact decimal
REAL, DOUBLE PRECISION	floating point
VARCHAR(n), TEXT	no performance difference in PG
CHAR(n)	fixed-length, rarely used
BOOLEAN	true/false/null
DATE, TIME, TIMESTAMP	standard date/time
TIMESTAMPTZ	timestamp with timezone (preferred)
INTERVAL	time duration
JSON / JSONB	native JSON (JSONB is binary, indexable)
UUID	128-bit universally unique identifier
BYTEA	binary data
INET, CIDR	IP addresses

NULL Handling - Three-Valued Logic¶

NULL represents missing/unknown data. It is NOT a value itself.

-- Any comparison with NULL yields UNKNOWN
NULL = NULL    -- UNKNOWN (not TRUE)
NULL != 5      -- UNKNOWN (not TRUE)
NULL > 0       -- UNKNOWN

-- Correct NULL checks
SELECT * FROM users WHERE phone IS NULL;
SELECT * FROM users WHERE phone IS NOT NULL;

-- COALESCE: first non-NULL argument
SELECT name, COALESCE(phone, 'No phone') FROM users;

-- NULLIF: returns NULL if arguments are equal
SELECT NULLIF(score, 0) FROM results;  -- avoid division by zero

Key Facts¶

NULL storage: 1-bit per nullable column in row's NULL bitmap
COUNT(*) counts all rows; COUNT(column) excludes NULLs
SUM, AVG, MIN, MAX all ignore NULLs
GROUP BY treats all NULLs as one group
NULLs in UNIQUE indexes: most RDBMS allow multiple NULLs; SQL Server does not by default
PostgreSQL: no performance difference between VARCHAR(n) and TEXT
Always use DECIMAL/NUMERIC for money - FLOAT/DOUBLE have rounding errors

Common Functions¶

String Functions¶

CONCAT(s1, s2)              -- concatenate (both)
LENGTH(s) / CHAR_LENGTH(s)  -- byte/character length
UPPER(s) / LOWER(s)         -- case conversion
TRIM(s) / LTRIM(s) / RTRIM(s)
SUBSTRING(s, pos, len)      -- extract substring
REPLACE(s, from, to)        -- replace occurrences
LEFT(s, n) / RIGHT(s, n)   -- first/last n characters

Numeric Functions¶

ROUND(n, d)  -- round to d decimal places
CEIL(n)      -- round up
FLOOR(n)     -- round down
ABS(n)       -- absolute value
MOD(n, m)    -- modulo

Date Functions¶

-- PostgreSQL
NOW(), CURRENT_DATE, CURRENT_TIMESTAMP
EXTRACT(YEAR FROM date_col)
AGE(timestamp1, timestamp2)         -- returns interval
date_col + INTERVAL '1 month'
TO_CHAR(date_col, 'YYYY-MM-DD')

-- MySQL
NOW(), CURDATE(), CURTIME()
YEAR(d) / MONTH(d) / DAY(d)
DATE_ADD(d, INTERVAL 1 DAY)
DATEDIFF(d1, d2)                    -- difference in days
TIMESTAMPDIFF(HOUR, d1, d2)
DATE_FORMAT(d, '%Y-%m-%d')
STR_TO_DATE(s, '%d/%m/%Y')

Generated Columns (MySQL 8.0+, PostgreSQL 12+)¶

Computed columns stored or calculated on the fly:

-- MySQL: STORED (on disk) or VIRTUAL (computed at read)
CREATE TABLE users (
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    full_name VARCHAR(101) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) STORED
);
-- Can index STORED generated columns
CREATE INDEX idx_fullname ON users (full_name);

-- PostgreSQL 12+: only STORED
ALTER TABLE users ADD COLUMN full_name TEXT GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED;

JSON Column Patterns¶

-- PostgreSQL JSONB: binary storage, indexable
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    payload JSONB NOT NULL
);

-- GIN index for containment queries
CREATE INDEX idx_payload ON events USING GIN (payload);

-- Query with operators
SELECT * FROM events WHERE payload @> '{"type": "click"}';         -- contains
SELECT * FROM events WHERE payload ->> 'user_id' = '42';           -- text extraction
SELECT * FROM events WHERE (payload -> 'meta' ->> 'score')::int > 90;  -- nested + cast

-- MySQL 8.0: JSON with generated column for indexing
ALTER TABLE events ADD COLUMN event_type VARCHAR(50)
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(payload, '$.type'))) STORED;
CREATE INDEX idx_event_type ON events (event_type);

UUID as Primary Key¶

-- PostgreSQL: native UUID type
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL
);
-- Pro: no sequential guessing, safe for distributed systems
-- Con: larger than INT (16 bytes vs 4), random = fragmented B-tree inserts

-- UUIDv7 (time-sorted) available via extensions - preserves insert order
-- Better B-tree locality than UUIDv4

Unexpected Type Coercions¶

-- MySQL silently truncates
INSERT INTO t (tinyint_col) VALUES (999);  -- stores 127 (max) with warning
-- PostgreSQL raises error for out-of-range

-- MySQL string-to-number coercion
SELECT * FROM users WHERE phone = 0;
-- Matches ALL rows where phone is a non-numeric string (coerced to 0)!
-- Always use: WHERE phone = '0'

Gotchas¶

MySQL utf8 is 3-byte only (no emoji) - always use utf8mb4
TIMESTAMP has a 2038 limit; use DATETIME or TIMESTAMPTZ for future-proof dates
FLOAT arithmetic: 0.1 + 0.2 != 0.3 - use DECIMAL for exact comparisons
PostgreSQL SERIAL is deprecated in favor of GENERATED ALWAYS AS IDENTITY
NULL in NOT IN: if subquery returns any NULL, NOT IN returns empty result
MySQL implicit type coercion: WHERE varchar_col = 0 matches non-numeric strings - always quote string comparisons
JSONB in PostgreSQL is ~30% larger than JSON but dramatically faster for queries - always prefer JSONB unless you need exact formatting preservation
UUID primary keys cause B-tree page splits on random inserts - consider UUIDv7 (time-ordered) for better write performance
Generated columns cannot reference other generated columns or use subqueries