Skip to content

REST API Advanced - Performance, Versioning, Caching

Advanced

API performance depends on speed, data volume, and call frequency. Factors like mobile data costs, device battery drain, and serverless billing make efficiency critical.

Rate Limiting and Throttling

Request-Level Rate Limiting

Limit requests per time window by IP, API key, or user. On exceeding: - HTTP 429 Too Many Requests - X-Rate-Limit-Limit - allowed requests in current period - X-Rate-Limit-Remaining - remaining requests - X-Rate-Limit-Reset - seconds until period resets

Rate Limiting Algorithms

  • Fixed window - simple counter per time window
  • Sliding window - more accurate, prevents burst at window boundary
  • Token bucket - allows short bursts while maintaining average rate (most common)
  • Leaky bucket - smooth output rate regardless of input

Application-Level Throttling

Internal protection against surges. Prioritizes urgent requests. Implemented via delays or batch processing.

HTTP Caching Strategies

Four Caching Approaches

Strategy Header Use Case
No cache Cache-Control: private, no-cache, no-store Confidential/dynamic data
Time-based Cache-Control: max-age=3600, must-revalidate Known change interval
Validation-based Cache-Control: no-cache + ETag Unknown change interval
Cache forever Cache-Control: max-age=31536000, public, immutable Immutable content

Validation Flow (ETag)

1. Server responds with: ETag: "abc123"
2. Client conditional request: If-None-Match: "abc123"
3. Server responds: 304 Not Modified (use cache)
   OR sends new data with new ETag

Cache Management

  • Cache files forever with versioned names (style1a.css)
  • On update, rename to style2b.css - browser treats as new resource
  • Full invalidation: Clear-Site-Data: "cache", "cookies", "storage"

Compression

No API design change needed:

# Client
Accept-Encoding: gzip, deflate, br

# Server
Content-Encoding: gzip

Batch Requests

Combine multiple API operations into one. Reduces latency, saves bandwidth. All sub-requests packaged in single request; responses returned together. See Google Drive API batch format as reference.

API Versioning

Why Version

  • Maintain backward compatibility for existing clients
  • Clear communication of available features
  • Allow clients to migrate at their own pace

Semantic Versioning

  • Non-breaking (new endpoints, new optional fields, new response fields): minor bump (1.0 -> 1.1)
  • Breaking (removed endpoints, changed required fields, changed error codes): major bump (1.x -> 2.0)
  • Keep max 2 active versions; notify clients of deprecation

Versioning Approaches

Approach Example Notes
URL path /api/v1/users Most common, explicit
Query parameter /api/users?version=1 Simple but messy
Header Accept: application/vnd.api.v1+json Cleaner URLs

Zero-Downtime API Migration

  • Deprecation announcements with clear timeline
  • Monitor usage of old versions before removal
  • Keep old API versions longer than expected - some clients call APIs only once per year
  • Expand-contract pattern: add new alongside old, migrate, remove old

Retry Patterns

Exponential Backoff

Start with short delay, exponentially increase: 5s -> 25s -> 125s

Jitter

Add randomness to retry delays to prevent thundering herd when multiple clients retry simultaneously.

Retry Rules

  • Only retry on transient errors (5xx, network timeouts)
  • Don't retry client errors (4xx) - they won't self-resolve
  • Set maximum retry count
  • Consider circuit breaker for persistent failures

Error Response Format

Standardized format across all APIs:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Human-readable description",
    "details": [
      {"field": "email", "issue": "invalid format"}
    ]
  }
}

Never expose internal error details (stack traces, SQL queries) in production.

Gotchas

  • Caching + auth - never cache responses with Authorization header unless explicitly public
  • 429 without headers - always include rate limit headers so clients can self-throttle
  • Retry on POST - dangerous without idempotency keys, can create duplicate resources
  • Cache stampede - when popular key expires, many requests hit DB simultaneously. Use locking or probabilistic early expiration
  • Compression CPU cost - enable for text content (JSON, HTML) but not for already-compressed content (images, video)

See Also