ML Plugin Inference UX Patterns

Intermediate

Patterns for making slow ML inference (10-30s per operation) feel fast inside desktop creative applications (Photoshop, Lightroom, etc.). Covers caching, progressive preview, background pre-compute, batch queuing, and predictive prefetch.

Latency Thresholds (Miller / Doherty)

Threshold     Value              User perception
Immediate     <100 ms            Direct manipulation — no perceived delay
Interactive   <400 ms (Doherty)  Seamless continuation
Responsive    1-3 s              "System is working" — spinner acceptable
Background    3-10 s             Progress bar required; user may switch tasks
Blocking      >10 s              Anti-pattern — must never block UI

Rule: ML inference that takes >400 ms must run in background. The UI thread must never block.
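A minimal sketch of that rule, using a hypothetical helper (in a real plugin, `on_done` would be marshalled back to the UI thread through the host's event loop rather than invoked on the worker thread):

```python
import threading

def run_in_background(fn, on_done, *args, **kwargs):
    """Run fn off the calling (UI) thread; deliver its result via on_done.
    Here on_done fires on the worker thread; a real plugin would route it
    through the host's main-thread dispatcher."""
    def worker():
        on_done(fn(*args, **kwargs))
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```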

Per-Image Computation Cache

Cache per-image preprocessing results keyed by image hash. Eliminates redundant compute on repeated applies to the same image.

import hashlib, pickle, os
from pathlib import Path

class InferenceCache:
    def __init__(self, cache_dir: str, max_entries: int = 50):
        self.dir = Path(cache_dir)
        self.dir.mkdir(exist_ok=True)
        self.max_entries = max_entries

    def _key(self, img_array) -> str:
        """Hash first 64KB of image data — fast, collision-rare."""
        sample = img_array.flat[:65536].tobytes()
        return hashlib.blake2b(sample, digest_size=16).hexdigest()

    def get(self, img_array, stage: str):
        key = self._key(img_array)
        path = self.dir / f"{key}_{stage}.pkl"
        if path.exists():
            with open(path, 'rb') as f:
                return pickle.load(f)
        return None

    def put(self, img_array, stage: str, value):
        key = self._key(img_array)
        path = self.dir / f"{key}_{stage}.pkl"
        with open(path, 'wb') as f:
            pickle.dump(value, f)
        self._evict_if_needed()

    def _evict_if_needed(self):
        entries = sorted(self.dir.glob("*.pkl"), key=os.path.getmtime)
        while len(entries) > self.max_entries:
            entries.pop(0).unlink()

Cacheable stages per image:
  • Face detection landmarks (stable — image content doesn't change)
  • Skin segmentation mask
  • Frequency separation HF/LF layers
  • VAE latent encoding (most expensive — cache first)

Cache key: Hash of raw pixel data, not filename. Filename-based keys miss in-place edits.
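To illustrate the point, the key depends only on pixel content, so the same pixels under any filename hit the cache. This sketch mirrors the `_key` logic above and assumes NumPy image arrays:

```python
import hashlib

import numpy as np

def content_key(arr) -> str:
    """Same sampling as InferenceCache._key: hash first 64KB of pixel data."""
    sample = arr.flat[:65536].tobytes()
    return hashlib.blake2b(sample, digest_size=16).hexdigest()

img = np.random.default_rng(0).integers(0, 256, size=(256, 256), dtype=np.uint8)
renamed_copy = img.copy()   # same pixels, "saved under another name"

assert content_key(img) == content_key(renamed_copy)  # filename-independent hit
```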

Background Idle Pre-Compute

On image open, start computing expensive preprocessing stages during the idle period before the user clicks Apply. When Apply fires, pre-computed data is already available.

import threading
from enum import Enum

class Stage(Enum):
    A = "face_landmarks"
    B = "skin_segmentation"
    C = "freq_sep"
    D = "vae_encode"

class BackgroundPrecompute:
    def __init__(self, cache: InferenceCache):
        self.cache = cache
        self._thread = None
        self._cancel = threading.Event()

    def start(self, img_array, on_stage_done=None):
        """Call when document is opened or becomes active."""
        self._cancel.clear()
        self._thread = threading.Thread(
            target=self._run, args=(img_array, on_stage_done), daemon=True
        )
        self._thread.start()

    def cancel(self):
        self._cancel.set()

    def _run(self, img, callback):
        stages = [
            (Stage.A, compute_face_landmarks),
            (Stage.B, compute_skin_seg),
            (Stage.C, compute_freq_sep),
            (Stage.D, compute_vae_latent),
        ]
        for stage, fn in stages:
            if self._cancel.is_set():
                return
            if self.cache.get(img, stage.value) is None:
                result = fn(img)
                self.cache.put(img, stage.value, result)
            if callback:
                callback(stage)

Result: the first Apply on a freshly opened image drops from ~23s (cold start) to ~8s when stages A-C have completed in the background. Stages A-C take ~6-8s combined; D takes ~12s and is the biggest single win.
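One detail worth making explicit: cancellation is checked before each stage, so a documentChanged event stops the loop at the next stage boundary rather than mid-compute. A self-contained sketch with stubbed stages:

```python
import threading

cancel = threading.Event()
done = []

def run_stages():
    # Stubbed stage loop: real compute replaced by list appends.
    for stage in ("face_landmarks", "skin_segmentation", "freq_sep", "vae_encode"):
        if cancel.is_set():          # checked before each stage, as in _run()
            return
        done.append(stage)
        if stage == "skin_segmentation":
            cancel.set()             # simulate documentChanged arriving mid-run

t = threading.Thread(target=run_stages, daemon=True)
t.start()
t.join()
assert done == ["face_landmarks", "skin_segmentation"]  # later stages skipped
```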

Progressive Multi-Resolution Preview

Show a low-resolution result immediately, replace with full-resolution when ready. Never block Apply on full-resolution completion.

# (scale, quality label, target latency in ms; None = no budget)
RESOLUTION_PYRAMID = [
    (1/8, "draft",   500),
    (1/4, "preview", 2000),
    (1/2, "refine",  6000),
    (1/1, "final",   None),
]

import threading

import cv2

def apply_progressive(img, model, on_update, cancel=None):
    """
    on_update(layer_img, quality_label) called at each pyramid level.
    Caller replaces visible PS layer with each update. Run this function
    on a worker thread; set `cancel` (a threading.Event) when a new Apply
    supersedes this one.
    """
    cancel = cancel or threading.Event()

    def run_at_scale(scale):
        h, w = img.shape[:2]
        small = cv2.resize(img, (int(w*scale), int(h*scale)))
        result = model.infer(small)
        return cv2.resize(result, (w, h), interpolation=cv2.INTER_LANCZOS4)

    for scale, label, _ in RESOLUTION_PYRAMID:
        if cancel.is_set():
            break  # new Apply requested — abort current
        on_update(run_at_scale(scale), label)

Tile-based streaming variant (for large images):

def apply_tiled_streaming(img, model, tile_size=512, on_tile=None):
    """
    Process tiles from center outward. Center tiles appear first —
    matches user attention (center of frame is subject).
    """
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            cx, cy = x + tile_size//2, y + tile_size//2
            dist = ((cx - w//2)**2 + (cy - h//2)**2) ** 0.5
            tiles.append((dist, x, y))
    tiles.sort()  # process center-nearest first

    result = img.copy()
    for _, x, y in tiles:
        tile = img[y:y+tile_size, x:x+tile_size]
        out = model.infer(tile)
        result[y:y+tile_size, x:x+tile_size] = out
        if on_tile:
            on_tile(result)  # caller updates preview layer
    return result
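The center-out ordering can be checked in isolation; this helper uses the same distance computation as `apply_tiled_streaming`:

```python
def center_out_order(h, w, tile_size):
    """Return tile origins sorted nearest-to-center first."""
    tiles = []
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            cx, cy = x + tile_size // 2, y + tile_size // 2
            dist = ((cx - w // 2) ** 2 + (cy - h // 2) ** 2) ** 0.5
            tiles.append((dist, x, y))
    tiles.sort()
    return [(x, y) for _, x, y in tiles]

# 4x4 grid of 512px tiles on a 2048x2048 image: the four central tiles come first.
order = center_out_order(2048, 2048, 512)
assert set(order[:4]) == {(512, 512), (512, 1024), (1024, 512), (1024, 1024)}
```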

Intensity Slider (Lerp-Based Instant Preview)

After full ML inference completes, store both original and result. Slider becomes a lerp between them — instant, no re-inference.

import numpy as np

class LayerBlendController:
    def __init__(self, original, ml_result):
        self.orig = original.astype(np.float32)
        self.result = ml_result.astype(np.float32)
        self._dirty = False   # True if params changed requiring re-inference

    def at_intensity(self, t: float) -> np.ndarray:
        """t in [0..1]. Instant — no ML involved."""
        return np.clip(self.orig + t * (self.result - self.orig), 0, 255).astype(np.uint8)

    def refine_at(self, t: float, model, img, params):
        """Full ML re-inference at exact t. Use for "Accept" action only."""
        params['intensity'] = t
        self.result = model.infer(img, params)
        self._dirty = False
        return self.result

UX pattern (borrowed from audio DAW "bounce"):
  1. Apply: show draft result at 1/4-res in ~2s
  2. Background: full-res completes in ~23s, replaces draft
  3. Slider drag: lerp between original and full-res result — 10ms/frame
  4. "Refine" button: re-inference at exact slider position — ~23s, gives accurate (non-lerp) result
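The slider lerp in step 3 is pure pixel arithmetic, which is why it stays at ~10ms/frame. A minimal numeric check with illustrative pixel values:

```python
import numpy as np

orig   = np.full((4, 4, 3),  40, dtype=np.float32)   # illustrative values
result = np.full((4, 4, 3), 200, dtype=np.float32)

def at_intensity(t):
    # Same blend as LayerBlendController.at_intensity
    return np.clip(orig + t * (result - orig), 0, 255).astype(np.uint8)

assert at_intensity(0.0)[0, 0, 0] == 40     # 0% = original pixels
assert at_intensity(0.5)[0, 0, 0] == 120    # midpoint blend
assert at_intensity(1.0)[0, 0, 0] == 200    # 100% = full ML result
```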

Batch Queue

Background processing queue that never blocks Photoshop UI. User adds images and continues working.

Queue state machine:
  Idle → (user adds image) → Queued → (worker picks up) → Processing → Done
  Processing → (user clicks current image) → Interrupted → Queued (reprioritized)

Priority rules:
  1. Current active document (always first)
  2. ★ Hero-marked images (full quality)
  3. Queue order (FIFO)
  4. Background items (deferred)
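One way to realize these priority rules is a heap keyed by (priority bucket, arrival order). The `BatchQueue` below is a hypothetical sketch under that design, not the plugin's actual implementation:

```python
import heapq
import itertools

# Priority buckets matching the rules above (lower sorts first).
ACTIVE, HERO, NORMAL, DEFERRED = 0, 1, 2, 3

class BatchQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # FIFO tiebreaker within a bucket

    def add(self, image_id, bucket=NORMAL):
        heapq.heappush(self._heap, (bucket, next(self._seq), image_id))

    def next_job(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = BatchQueue()
q.add("img_01"); q.add("img_02")
q.add("hero_shot", bucket=HERO)       # star-marked mid-queue: jumps ahead
q.add("active_doc", bucket=ACTIVE)    # current document always first
assert q.next_job() == "active_doc"
assert q.next_job() == "hero_shot"
assert q.next_job() == "img_01"
```

Re-prioritizing a star-marked image amounts to re-adding it under the `HERO` bucket; the item currently processing still finishes atomically, as required.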

Session-level cache for batch series (same shooting session):
  • Face detection model: stays warm in VRAM across batch
  • Skin-tone prior: computed from first 5 images, shared with remaining 45
  • LoRA weights: loaded once for entire batch
  • Result: batch throughput ~60-70% of theoretical sequential × N

Acceptance criteria (Phase 1):
  • 2nd Apply on same file: <8s (cache hit)
  • 10-image batch: completes in 4 min vs 10 min sequential
  • Perceived time on single Apply: 5-8s (user study, questionnaire)

Predictive Prefetch (Markov Chain)

Build a Markov chain of user action sequences from PS event history. After D&B Apply, prefetch Volume inputs if transition probability > 40%.

from collections import defaultdict

class ActionPredictor:
    def __init__(self, min_probability: float = 0.40):
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.min_prob = min_probability

    def record(self, from_action: str, to_action: str):
        self.transitions[from_action][to_action] += 1

    def predict_next(self, current_action: str) -> str | None:
        counts = self.transitions.get(current_action)  # .get avoids inserting empty entries
        if not counts:
            return None
        total = sum(counts.values())
        best = max(counts, key=counts.get)
        prob = counts[best] / total
        return best if prob >= self.min_prob else None

    def should_prefetch(self, current_action: str) -> bool:
        return self.predict_next(current_action) is not None

Default priors from retoucher community data (transition probabilities, not raw counts):

DEFAULT_TRANSITIONS = {
    'db_apply':     {'volume_apply': 0.70, 'beauty_apply': 0.40},
    'volume_apply': {'beauty_apply': 0.62, 'color_apply': 0.40},
    'beauty_apply': {'color_apply': 0.20},
}

After 20-30 sessions, per-user model overrides defaults. Adaptive: retoucher who always does D&B → Color (not Volume) will have that transition reinforced.
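A quick numeric check of that override behavior, using simulated history and the same counting scheme as `ActionPredictor`:

```python
from collections import defaultdict

transitions = defaultdict(lambda: defaultdict(int))

# Simulated per-user history: this retoucher goes D&B -> Color, not D&B -> Volume.
for _ in range(8):
    transitions['db_apply']['color_apply'] += 1
for _ in range(2):
    transitions['db_apply']['volume_apply'] += 1

counts = transitions['db_apply']
best = max(counts, key=counts.get)
prob = counts[best] / sum(counts.values())
assert (best, round(prob, 2)) == ('color_apply', 0.8)  # beats the 0.70 Volume prior
```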

Anti-Patterns

Pattern                              Problem                                 Fix
Modal spinner blocking PS UI         Users cannot work while plugin thinks   Always background; layer appears when ready
Spinner without context              23s with "Loading…" = rage              Progress bar + stage name + time estimate
Auto-apply to all layers             Violates non-destructive culture        One Apply = one explicit new layer
"Fast mode" checkbox                 Stigmatizes quality; complicates docs   Auto-select quality by content (hero vs batch)
Streaming animation always on        Blurry intermediate irritates pros      Optional toggle; default off for production users
Cloud offload by default             Offline expectation; privacy concern    Local first; cloud = explicit opt-in
Stochastic inference (random seed)   Breaks undo/redo consistency            Fixed seed = hash(image_path + params)
Re-inference on every slider tick    23s per tick = unusable                 Lerp between cached results; re-infer only on Accept
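The fixed-seed fix deserves one caveat: Python's built-in hash() is salted per process, so a stable digest is needed for cross-session reproducibility. `deterministic_seed` is a hypothetical helper sketching the idea:

```python
import hashlib

def deterministic_seed(image_path: str, params: dict) -> int:
    """Same image + same params -> same seed, so undo/redo replays identically.
    Uses blake2b because built-in hash() is salted per process."""
    payload = image_path + repr(sorted(params.items()))
    return int.from_bytes(
        hashlib.blake2b(payload.encode(), digest_size=8).digest(), "big"
    )

s1 = deterministic_seed("shot_042.psd", {"intensity": 0.7, "mode": "hero"})
s2 = deterministic_seed("shot_042.psd", {"mode": "hero", "intensity": 0.7})
assert s1 == s2                      # param order doesn't matter
assert s1 != deterministic_seed("shot_042.psd", {"intensity": 0.8, "mode": "hero"})
```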

Gotchas

  • Issue: Background precompute starts, then user opens different image — precomputed cache from wrong image gets used. -> Fix: Include document ID or pixel hash in every cache key. Cancel and restart BackgroundPrecompute on documentChanged event.
  • Issue: Batch queue uses FIFO, but user star-marks a hero shot mid-queue — it waits until its turn. -> Fix: Implement priority re-ordering: ★ hero images jump to position 1 in queue. Current processing item finishes (no mid-inference cancel — atomic layer commit required).
  • Issue: Lerp-based intensity slider produces wrong result for non-linear ML operations (beauty/frequency adjustments are not linear). -> Fix: Label the lerp as "preview intensity"; provide "Refine at X%" button that triggers full re-inference at the chosen intensity value and replaces the lerp result.
  • Issue: VAE latent cache is stale after in-application editing (layer added, curves adjustment) — content hash unchanged, latent wrong. -> Fix: Key the VAE latent cache on (pixel hash + composite layer stack hash). Invalidate on any layer modification to the source document.
  • Issue: Per-session skin-tone prior built from first 5 images propagates a skin cast from a badly-lit photo to all subsequent batch images. -> Fix: Use prior only as weak regularizer (weight 0.1), not hard target. Each image still gets independent skin segmentation; prior only smooths boundary cases.

See Also