CWE-89: SQL Injection¶

★★★★★ Intermediate

CWE Top 25 Rank: 3 (2024). OWASP A03:2021. Impact range: authentication bypass, full DB dump, RCE via xp_cmdshell/COPY TO PROGRAM/UDF.

Functional Semantics¶

SQL is a data-as-code language: the parser cannot distinguish intended query structure from injected structure. The vulnerability exists at the boundary where a string containing SQL is constructed by concatenating or interpolating untrusted data. The database parser evaluates the combined string as a single query; injected tokens alter the parse tree, not just the data values.

Canonical parse tree manipulation:

-- Intended: SELECT * FROM users WHERE name = '<input>'
-- Input: ' OR '1'='1
-- Result:  SELECT * FROM users WHERE name = '' OR '1'='1'
--                                               ^^^^^^^^^^ injected predicate, always true

The critical distinction: parameterized queries pass the value after parsing, so the parse tree is fixed and injected SQL tokens are treated as literal string data.

Non-Obvious Injection Vectors¶

1. ORM "escape hatch" raw queries¶

# VULNERABLE: Django ORM raw() with f-string
def search_users(request):
    username = request.GET['q']
    users = User.objects.raw(f"SELECT * FROM auth_user WHERE username = '{username}'")
    # Equivalent to classic injection; ORM provides zero protection here

# FIXED: parameterized raw query
def search_users(request):
    username = request.GET['q']
    users = User.objects.raw(
        "SELECT * FROM auth_user WHERE username = %s",
        [username]
    )

ORM raw query functions that bypass parameterization (all require manual scrutiny): - Django: Model.objects.raw(), connection.cursor().execute() with interpolation, extra(where=), RawSQL() - SQLAlchemy: text() with .format() or f-strings, engine.execute(string % values) - ActiveRecord: where("name = '#{params[:name]}'") - Hibernate: session.createNativeQuery(hql_string) with string concat - GORM: db.Raw("... " + userInput)

2. Dynamic table and column names¶

Parameterization only applies to values, not identifiers (table names, column names, ORDER BY fields). Identifiers cannot be parameterized in standard SQL.

# VULNERABLE: dynamic ORDER BY column
def get_sorted_data(request):
    col = request.GET.get('sort', 'name')
    query = f"SELECT * FROM products ORDER BY {col}"  # injection via sort=1; DROP TABLE products--
    return db.execute(query)

# FIXED: allowlist identifiers
ALLOWED_SORT_COLUMNS = {'name', 'price', 'created_at'}

def get_sorted_data(request):
    col = request.GET.get('sort', 'name')
    if col not in ALLOWED_SORT_COLUMNS:
        col = 'name'
    query = f"SELECT * FROM products ORDER BY {col}"  # col is now validated
    return db.execute(query)

Note: backtick/double-quote quoting of identifiers does NOT prevent injection - an attacker can close the quote. Allowlist is the only safe approach for identifiers.

3. LIKE pattern injection¶

// VULNERABLE: Node.js with pg - LIKE wildcard injection
async function searchProducts(req, res) {
    const term = req.query.term;
    // Attacker sends: term = %  → returns all rows
    // Attacker sends: term = \%anything\%  → escaping manipulation
    const result = await pool.query(
        `SELECT * FROM products WHERE name LIKE '%${term}%'`
    );
}

Even with parameterization, LIKE metacharacters (%, _) in the value cause DoS (full table scan) or unexpected results. Two separate concerns:

// FIXED: parameterize the whole pattern, escape LIKE metacharacters in value
async function searchProducts(req, res) {
    const term = req.query.term
        .replace(/\\/g, '\\\\')
        .replace(/%/g, '\\%')
        .replace(/_/g, '\\_');
    const result = await pool.query(
        'SELECT * FROM products WHERE name LIKE $1 ESCAPE \'\\\'',
        [`%${term}%`]
    );
}

4. PostgreSQL JSON/JSONB path injection¶

# VULNERABLE: jsonb path in raw query
def filter_by_attr(conn, attr_name, value):
    query = f"SELECT * FROM items WHERE data->>'{attr_name}' = %s"
    # attr_name injection: ' OR 1=1 --  breaks out of jsonb operator context
    conn.execute(query, (value,))

# FIXED: allowlist attr_name
ALLOWED_ATTRS = {'color', 'size', 'brand'}

def filter_by_attr(conn, attr_name, value):
    if attr_name not in ALLOWED_ATTRS:
        raise ValueError("Invalid attribute")
    query = f"SELECT * FROM items WHERE data->>{attr_name!r} = %s"
    conn.execute(query, (value,))

5. GraphQL resolvers building raw queries¶

// VULNERABLE: GraphQL resolver constructing SQL
const resolvers = {
    Query: {
        user: async (_, { filter }) => {
            // filter = { field: "name", value: "alice' OR '1'='1" }
            const sql = `SELECT * FROM users WHERE ${filter.field} = '${filter.value}'`;
            return db.query(sql);
        }
    }
};

// FIXED: parameterize and validate field name
const ALLOWED_FIELDS = new Set(['name', 'email', 'role']);

const resolvers = {
    Query: {
        user: async (_, { filter }) => {
            if (!ALLOWED_FIELDS.has(filter.field)) throw new Error('Invalid field');
            const sql = `SELECT * FROM users WHERE ${filter.field} = $1`;
            return db.query(sql, [filter.value]);
        }
    }
};

6. Second-order injection¶

Data stored safely (correctly parameterized) but later retrieved and embedded unescaped into a new query.

-- Registration: username 'admin'--  stored safely in DB
INSERT INTO users (username) VALUES ($1)  -- parameterized, safe

-- Password change: username retrieved from session, embedded into new query
SELECT * FROM users WHERE username = 'admin'--' AND password = 'x'
-- The -- comments out the password check

Detection: trace all paths where DB-resident data is read and then used in query construction. The parameterization at write time does not protect the read-then-use path.

Trigger Conditions¶

Vector	Trigger
HTTP query param/body	Direct user input into query
HTTP header (User-Agent, Referer)	Logged to DB via raw INSERT
Cookie/session value	Retrieved and used in query
JSON body field	Extracted field used in dynamic SQL
File upload (filename)	Filename stored/queried without parameterization
External API response	Stored then used in second-order injection
GraphQL `variables`	Any field passed to dynamic SQL

Detection Heuristics¶

Trace: HTTP input → query construction

Identify all HTTP input sources: request.GET, request.POST, request.headers, cookies, URL path segments, JSON body fields.
Follow each value through the call stack.
Flag any point where the value (or a derivative) is concatenated/interpolated into a string that is passed to a DB execution function.
Specifically check: cursor.execute(string), raw(), text(), f-strings/.format() calls within DB layer, + operator applied to strings in functions named query, search, filter, find, fetch.

High-signal code patterns:

# Python - flag any of these patterns
f"SELECT ... {user_var}"
"SELECT ... " + user_var
"SELECT ... %s" % user_var   # % formatting - NOT parameterized (different from %s as placeholder)
cursor.execute("SELECT ... '" + val + "'")

// JavaScript - flag these
`SELECT ... ${userVal}`
"SELECT ... " + userVal
db.query("SELECT ... '" + val + "'")

// Java - flag these
"SELECT * FROM t WHERE x = '" + input + "'"
stmt.execute(String.format("SELECT ... %s", input))

ORM-specific escape hatches to search:

# grep patterns for common ORMs
grep -rn "\.raw(" --include="*.py"
grep -rn "RawSQL(" --include="*.py"
grep -rn "extra(where" --include="*.py"
grep -rn "createNativeQuery" --include="*.java"
grep -rn "\.where(\".*#\{" --include="*.rb"    # Ruby string interpolation in ActiveRecord where
grep -rn "db\.Raw(" --include="*.go"

Affected Ecosystems¶

Ecosystem	Parameterization API	Common escape hatch
Python/Django	`%s` placeholders in `execute()`	`raw()`, `extra()`, `RawSQL()`
Python/SQLAlchemy	`bindparam()`, `:name` in `text()`	`text()` with `.format()`
Node.js/pg	`$1`/`$2` placeholders	Template literals
Node.js/mysql2	`?` placeholders	Template literals
Java/JDBC	`PreparedStatement` `?`	`Statement.execute(string)`
PHP/PDO	`?` or `:name`	`query()` with concatenation
Ruby/ActiveRecord	`?` or hash conditions	String interpolation in `where()`
Go/database/sql	`?` or `$1`	`db.Exec(string)` with `+`
.NET/ADO.NET	`@param` in `SqlParameter`	`SqlCommand` with string concat

Database-Specific Severity¶

DB	Escalation beyond data access
MSSQL	`xp_cmdshell` → OS command execution (if sysadmin)
PostgreSQL	`COPY TO PROGRAM` → OS command execution (if superuser)
MySQL	`INTO OUTFILE` → write files to web root; UDF loading
SQLite	File read via `ATTACH DATABASE`; limited escalation
Oracle	`DBMS_SCHEDULER` → OS commands (if DBA)

Fixing Patterns¶

Pattern	Notes
Parameterized queries	Primary defense. Works for all value types.
Stored procedures	Effective if SP itself uses parameterization internally
Input allowlisting for identifiers	Mandatory for table/column/ORDER BY; parameterization is insufficient
ORMs with safe query API	Safe by default; audit all `raw()` equivalents
WAF (Web Application Firewall)	Defense-in-depth only; bypassable, not a primary control
Least privilege DB account	Limits blast radius; does not prevent injection
Error message suppression	Prevents blind-injection enumeration; not a fix

Gotchas - False Positive Indicators¶

Constant query strings with no interpolation: db.execute("SELECT 1") - no user data involved.
Integer-only params cast before query: id = int(request.GET['id']) then f"WHERE id = {id}" - integer casting prevents string injection. Still a code smell; prefer parameterization.
Escaping functions used correctly: pg_escape_string(), mysql_real_escape_string() in narrow contexts - technically safe for string values but does not protect identifiers; fragile if charset is mutable. Not recommended but not automatically a finding.
ORM query builder methods (not raw): User.objects.filter(name=value) - this is parameterized by the ORM; not a finding. Only .raw() and .extra() are escape hatches.
Test/fixture code: hardcoded queries in test files with 'test_user' literals - not a finding.