LaMa (Large Mask Inpainting)¶
Feed-forward inpainting model using Fast Fourier Convolution (FFC) for image-wide receptive field from the first layer. Excels at filling large masks with coherent textures. Resolution-robust: trained at 256x256, infers up to ~2000x2000.
Paper: WACV 2022. Authors: Samsung AI Moscow / AIRI. arXiv:2109.07161.
Architecture¶
U-Net-like fully-convolutional network:
Fast Fourier Convolution (FFC)¶
The key innovation. Each FFC block splits channels into two parallel branches:
Feature map
├─ Local branch (standard convolutions) → high-frequency local details
└─ Global branch:
→ channel-wise Real FFT (spatial → frequency domain)
→ 1×1 conv in frequency domain
→ inverse Real FFT (frequency → spatial domain)
→ image-wide receptive field from layer 1
Why this matters for inpainting: standard CNNs need many layers to propagate information across large masked gaps. FFC's spectral branch covers the entire spatial extent in a single layer — information from opposite sides of the mask is immediately available.
Variants¶
| Variant | Params | Size | Training Data |
|---|---|---|---|
| Big-LaMa | ~51M | ~410 MB | Places365-Challenge |
| Standard LaMa | ~27M | smaller | Places365-Standard |
| CelebA variant | ~27M | smaller | CelebA-HQ (faces) |
Training¶
- Resolution: 256×256
- Masks: on-the-fly random polygonal chains + rectangles (deliberately large)
- Loss: adversarial (non-saturating) + feature matching + High Receptive Field perceptual loss (ResNet50-dilated)
- Discriminator: patch-level, "fake" labels only for mask-intersecting areas
Performance¶
- VRAM: 2-4 GB at 512×512 — runs on consumer GPUs
- Speed: ~2s GPU (HD), ~25s CPU, 26-45ms mobile (Qualcomm NPU)
- Resolution generalization: trains 256px → infers to ~2000px without retraining (FFC property)
- 20% slower but 3-4x fewer params than competing baselines
Relation to Diffusion-Based Inpainting¶
LaMa is a feed-forward model (single pass, deterministic). Diffusion-based inpainting (FLUX Kontext, Step1X Edit) is iterative (20-50 steps) but better at semantic content (faces, complex objects). LaMa excels at textures and patterns.
Practical combination: use LaMa for fast background/texture inpainting, diffusion for semantic regions.
License¶
Apache 2.0 — fully commercial.
Key Links¶
- GitHub: github.com/advimman/lama
- Widely adopted: IOPaint, cleanup.pictures, ComfyUI nodes