CWE-89: SQL Injection¶
CWE Top 25 Rank: 3 (2024). OWASP A03:2021. Impact range: authentication bypass, full DB dump, RCE via xp_cmdshell/COPY TO PROGRAM/UDF.
Functional Semantics¶
SQL is a data-as-code language: the parser cannot distinguish intended query structure from injected structure. The vulnerability exists at the boundary where a string containing SQL is constructed by concatenating or interpolating untrusted data. The database parser evaluates the combined string as a single query; injected tokens alter the parse tree, not just the data values.
Canonical parse tree manipulation:
-- Intended: SELECT * FROM users WHERE name = '<input>'
-- Input: ' OR '1'='1
-- Result: SELECT * FROM users WHERE name = '' OR '1'='1'
-- ^^^^^^^^^^ injected predicate, always true
The critical distinction: parameterized queries pass the value after parsing, so the parse tree is fixed and injected SQL tokens are treated as literal string data.
Non-Obvious Injection Vectors¶
1. ORM "escape hatch" raw queries¶
# VULNERABLE: Django ORM raw() with f-string
def search_users(request):
username = request.GET['q']
users = User.objects.raw(f"SELECT * FROM auth_user WHERE username = '{username}'")
# Equivalent to classic injection; ORM provides zero protection here
# FIXED: parameterized raw query
def search_users(request):
username = request.GET['q']
users = User.objects.raw(
"SELECT * FROM auth_user WHERE username = %s",
[username]
)
ORM raw query functions that bypass parameterization (all require manual scrutiny): - Django: Model.objects.raw(), connection.cursor().execute() with interpolation, extra(where=), RawSQL() - SQLAlchemy: text() with .format() or f-strings, engine.execute(string % values) - ActiveRecord: where("name = '#{params[:name]}'") - Hibernate: session.createNativeQuery(hql_string) with string concat - GORM: db.Raw("... " + userInput)
2. Dynamic table and column names¶
Parameterization only applies to values, not identifiers (table names, column names, ORDER BY fields). Identifiers cannot be parameterized in standard SQL.
# VULNERABLE: dynamic ORDER BY column
def get_sorted_data(request):
col = request.GET.get('sort', 'name')
query = f"SELECT * FROM products ORDER BY {col}" # injection via sort=1; DROP TABLE products--
return db.execute(query)
# FIXED: allowlist identifiers
ALLOWED_SORT_COLUMNS = {'name', 'price', 'created_at'}
def get_sorted_data(request):
col = request.GET.get('sort', 'name')
if col not in ALLOWED_SORT_COLUMNS:
col = 'name'
query = f"SELECT * FROM products ORDER BY {col}" # col is now validated
return db.execute(query)
Note: backtick/double-quote quoting of identifiers does NOT prevent injection - an attacker can close the quote. Allowlist is the only safe approach for identifiers.
3. LIKE pattern injection¶
// VULNERABLE: Node.js with pg - LIKE wildcard injection
async function searchProducts(req, res) {
const term = req.query.term;
// Attacker sends: term = % → returns all rows
// Attacker sends: term = \%anything\% → escaping manipulation
const result = await pool.query(
`SELECT * FROM products WHERE name LIKE '%${term}%'`
);
}
Even with parameterization, LIKE metacharacters (%, _) in the value cause DoS (full table scan) or unexpected results. Two separate concerns:
// FIXED: parameterize the whole pattern, escape LIKE metacharacters in value
async function searchProducts(req, res) {
const term = req.query.term
.replace(/\\/g, '\\\\')
.replace(/%/g, '\\%')
.replace(/_/g, '\\_');
const result = await pool.query(
'SELECT * FROM products WHERE name LIKE $1 ESCAPE \'\\\'',
[`%${term}%`]
);
}
4. PostgreSQL JSON/JSONB path injection¶
# VULNERABLE: jsonb path in raw query
def filter_by_attr(conn, attr_name, value):
query = f"SELECT * FROM items WHERE data->>'{attr_name}' = %s"
# attr_name injection: ' OR 1=1 -- breaks out of jsonb operator context
conn.execute(query, (value,))
# FIXED: allowlist attr_name
ALLOWED_ATTRS = {'color', 'size', 'brand'}
def filter_by_attr(conn, attr_name, value):
if attr_name not in ALLOWED_ATTRS:
raise ValueError("Invalid attribute")
query = f"SELECT * FROM items WHERE data->>{attr_name!r} = %s"
conn.execute(query, (value,))
5. GraphQL resolvers building raw queries¶
// VULNERABLE: GraphQL resolver constructing SQL
const resolvers = {
Query: {
user: async (_, { filter }) => {
// filter = { field: "name", value: "alice' OR '1'='1" }
const sql = `SELECT * FROM users WHERE ${filter.field} = '${filter.value}'`;
return db.query(sql);
}
}
};
// FIXED: parameterize and validate field name
const ALLOWED_FIELDS = new Set(['name', 'email', 'role']);
const resolvers = {
Query: {
user: async (_, { filter }) => {
if (!ALLOWED_FIELDS.has(filter.field)) throw new Error('Invalid field');
const sql = `SELECT * FROM users WHERE ${filter.field} = $1`;
return db.query(sql, [filter.value]);
}
}
};
6. Second-order injection¶
Data stored safely (correctly parameterized) but later retrieved and embedded unescaped into a new query.
-- Registration: username 'admin'-- stored safely in DB
INSERT INTO users (username) VALUES ($1) -- parameterized, safe
-- Password change: username retrieved from session, embedded into new query
SELECT * FROM users WHERE username = 'admin'--' AND password = 'x'
-- The -- comments out the password check
Detection: trace all paths where DB-resident data is read and then used in query construction. The parameterization at write time does not protect the read-then-use path.
Trigger Conditions¶
| Vector | Trigger |
|---|---|
| HTTP query param/body | Direct user input into query |
| HTTP header (User-Agent, Referer) | Logged to DB via raw INSERT |
| Cookie/session value | Retrieved and used in query |
| JSON body field | Extracted field used in dynamic SQL |
| File upload (filename) | Filename stored/queried without parameterization |
| External API response | Stored then used in second-order injection |
GraphQL variables | Any field passed to dynamic SQL |
Detection Heuristics¶
Trace: HTTP input → query construction
- Identify all HTTP input sources:
request.GET,request.POST,request.headers, cookies, URL path segments, JSON body fields. - Follow each value through the call stack.
- Flag any point where the value (or a derivative) is concatenated/interpolated into a string that is passed to a DB execution function.
- Specifically check:
cursor.execute(string),raw(),text(), f-strings/.format()calls within DB layer,+operator applied to strings in functions namedquery,search,filter,find,fetch.
High-signal code patterns:
# Python - flag any of these patterns
f"SELECT ... {user_var}"
"SELECT ... " + user_var
"SELECT ... %s" % user_var # % formatting - NOT parameterized (different from %s as placeholder)
cursor.execute("SELECT ... '" + val + "'")
// JavaScript - flag these
`SELECT ... ${userVal}`
"SELECT ... " + userVal
db.query("SELECT ... '" + val + "'")
// Java - flag these
"SELECT * FROM t WHERE x = '" + input + "'"
stmt.execute(String.format("SELECT ... %s", input))
ORM-specific escape hatches to search:
# grep patterns for common ORMs
grep -rn "\.raw(" --include="*.py"
grep -rn "RawSQL(" --include="*.py"
grep -rn "extra(where" --include="*.py"
grep -rn "createNativeQuery" --include="*.java"
grep -rn "\.where(\".*#\{" --include="*.rb" # Ruby string interpolation in ActiveRecord where
grep -rn "db\.Raw(" --include="*.go"
Affected Ecosystems¶
| Ecosystem | Parameterization API | Common escape hatch |
|---|---|---|
| Python/Django | %s placeholders in execute() | raw(), extra(), RawSQL() |
| Python/SQLAlchemy | bindparam(), :name in text() | text() with .format() |
| Node.js/pg | $1/$2 placeholders | Template literals |
| Node.js/mysql2 | ? placeholders | Template literals |
| Java/JDBC | PreparedStatement ? | Statement.execute(string) |
| PHP/PDO | ? or :name | query() with concatenation |
| Ruby/ActiveRecord | ? or hash conditions | String interpolation in where() |
| Go/database/sql | ? or $1 | db.Exec(string) with + |
| .NET/ADO.NET | @param in SqlParameter | SqlCommand with string concat |
Database-Specific Severity¶
| DB | Escalation beyond data access |
|---|---|
| MSSQL | xp_cmdshell → OS command execution (if sysadmin) |
| PostgreSQL | COPY TO PROGRAM → OS command execution (if superuser) |
| MySQL | INTO OUTFILE → write files to web root; UDF loading |
| SQLite | File read via ATTACH DATABASE; limited escalation |
| Oracle | DBMS_SCHEDULER → OS commands (if DBA) |
Fixing Patterns¶
| Pattern | Notes |
|---|---|
| Parameterized queries | Primary defense. Works for all value types. |
| Stored procedures | Effective if SP itself uses parameterization internally |
| Input allowlisting for identifiers | Mandatory for table/column/ORDER BY; parameterization is insufficient |
| ORMs with safe query API | Safe by default; audit all raw() equivalents |
| WAF (Web Application Firewall) | Defense-in-depth only; bypassable, not a primary control |
| Least privilege DB account | Limits blast radius; does not prevent injection |
| Error message suppression | Prevents blind-injection enumeration; not a fix |
Gotchas - False Positive Indicators¶
- Constant query strings with no interpolation:
db.execute("SELECT 1")- no user data involved. - Integer-only params cast before query:
id = int(request.GET['id'])thenf"WHERE id = {id}"- integer casting prevents string injection. Still a code smell; prefer parameterization. - Escaping functions used correctly:
pg_escape_string(),mysql_real_escape_string()in narrow contexts - technically safe for string values but does not protect identifiers; fragile if charset is mutable. Not recommended but not automatically a finding. - ORM query builder methods (not raw):
User.objects.filter(name=value)- this is parameterized by the ORM; not a finding. Only.raw()and.extra()are escape hatches. - Test/fixture code: hardcoded queries in test files with
'test_user'literals - not a finding.
See Also¶
- CWE-564: Hibernate Injection - ORM-specific SQL injection variant
- CWE-943: Improper Neutralization - parent for NoSQL injection (MongoDB
$where, etc.) - CWE-116: Encoding/Escaping - defense mechanism; escape-based mitigation
- sql injection deep dive - existing knowledge base entry with deeper exploitation patterns