FLUX.2 Klein Character / Identity LoRA¶

Q: 4B vs 9B character quality

4B trains on 16GB but produces lower identity fidelity than 9B. Use 4B for prototyping, 9B for production.

Q: Face feature leakage

even a few images with face-descriptive captions in a 12-image dataset can significantly reduce LoRA identity strength. All captions must follow the scene-only rule.

Q: PuLID requires base model at inference

PuLID-Flux2 injects into double blocks, which are less effective in distilled Klein. Use base model + LoRA + PuLID for maximum identity quality.

★★★★★ Intermediate

Training LoRAs to preserve a specific person's identity with FLUX.2 Klein 9B. Covers optimizer requirements, caption strategy, proportion fixes, and inference-time enhancement.

Dataset Requirements¶

Parameter	Value	Notes
Minimum images	8-12	Well-curated beats quantity
Composition mix	1/3 / 1/3 / 1/3	Headshots / half-body / full-body
Lighting variety	Required	At least 3 different setups
Background variety	Recommended	Helps generalization
Resolution	1024px	Match training resolution

Minimal path (DiffSynth-Studio, 4B): 8-12 images, 900-1500 steps on Klein 4B — works with 16GB VRAM.

Optimizer and Hyperparameters¶

adafactor FAILS on Klein 9B for character training. This is a known issue — adafactor's adaptive scaling does not converge correctly for identity preservation, causing the face to collapse to a generic average by ~1K steps.

# diffusion-pipe config (recommended)
[optimizer]
type = 'AdamW8bitKahan'
lr = 1e-4
betas = [0.9, 0.999]
weight_decay = 0.01

RunComfy Standard Recipe¶

Parameter	Value
Optimizer	adamw8bit
LR	1e-4
Steps	2K-4K (character) / 7K (style)
Network dims	16-32 (conservative)
Repeats	90-120 per image
Precision	fp8 for model, bf16 for LoRA

Herbst Recipe (50+ runs, maximum quality)¶

Parameter	Value
Network dims	128/64/64/32 (linear/linear_alpha/conv/conv_alpha)
Weight decay	0.00001
Steps	7K (also works for character at this rank)
LR	1e-4 (constant, no decay)
LR schedule	constant

Caption Strategy¶

The captioning strategy for identity LoRA is counter-intuitive:

Rule: describe scene and pose, NEVER describe face features.

# CORRECT - identity stored in LoRA weights
"jane_doe, standing in sunlit garden, casual jeans and white blouse,
 three-quarter view, bokeh background"

"jane_doe, seated at cafe table, morning coffee, side profile,
 warm natural light from window"

# WRONG - identity leaks into text space
"jane_doe, brown eyes, oval face, sharp cheekbones, dark wavy hair,
 smiling, studio lighting"

Why this works: The model learns to associate everything NOT described in the caption with the trigger token. Face features described in text get anchored to text tokens, making the LoRA rely on prompting rather than being self-contained.

Trigger Word Placement¶

Always put trigger word first in caption:

{trigger_word}, [scene], [background], [lighting], [pose/framing]

No comma after trigger word is needed; sentence-style captions work fine.

Proportion Issues Fix (Big Heads Problem)¶

Cause¶

Two compounding factors: 1. Dataset bias: if 90% of training images are headshots, the model associates the identity trigger with headshots 2. Distillation artifacts: Klein distilled variant has built-in head-emphasis from its training data

Fix: 1/3 Composition Rule¶

Dataset breakdown:
- 33% close shots (shoulders up)
- 33% half-body (waist up)
- 33% full-body

Caption Composition Explicitly¶

"jane_doe, full body shot, standing, arms at sides, white studio"
"jane_doe, half body, seated at desk, hands visible on table"
"jane_doe, close portrait, turned slightly to camera"

Explicitly naming the framing in captions reinforces the proportion distribution.

Two-Stage Training Workaround¶

If composition fix alone doesn't resolve artifacts: 1. Stage 1: Train character LoRA on curated 1/3 dataset (as above) 2. Stage 2: Short fine-tune (500-1K steps) on full-body only images at lower LR (5e-5)

Inference Enhancement: PuLID-Flux2¶

PuLID-Flux2 boosts identity consistency at inference without retraining:

# PuLID-Flux2 pipeline
from pulid_flux2 import PuLIDPipeline

pipe = PuLIDPipeline.from_pretrained("flux-2-klein-base-9b")
pipe.load_lora("my_character_lora.safetensors")

result = pipe(
    prompt="jane_doe in Paris cafe",
    id_images=["ref1.jpg", "ref2.jpg"],  # up to 8 references
    id_weight=0.8,                        # 0.5-1.0
    num_inference_steps=20,
)

Mechanism: InsightFace face analysis + EVA-CLIP visual features → identity tokens injected into Klein double blocks via cross-attention. Multi-reference via 3D RoPE time offsets (separate temporal position per reference).

References	Quality Gain	VRAM Cost
1	Baseline boost	+2GB
3-4	Best balance	+4GB
8	Maximum	+8GB (sequence limit risk)

ai-toolkit Config Snippet¶

# ai-toolkit character LoRA
job: extension
config:
  name: "klein_character_jane"
  process:
    - type: standard_training
      network:
        type: lora
        linear: 32
        linear_alpha: 32
      train:
        optimizer: adamw8bit
        lr: 1e-4
        steps: 3000
        batch_size: 1
        gradient_accumulation_steps: 2
        gradient_checkpointing: true
      model:
        name_or_path: "flux-2-klein-base-9b"
        is_flux: true
        quantize: true  # fp8
      datasets:
        - folder_path: "data/jane_doe/"
          caption_ext: txt
          num_repeats: 100

Gotchas¶

adafactor convergence failure: switching from adafactor to adamw8bit is not optional on Klein 9B for character training. adafactor works for style LoRA but fails for identity LoRA — symptom is gradual face averaging toward a "stock photo" face.
Repeats math: 90-120 repeats × 10 images = 900-1200 effective training samples per epoch. Too few repeats (<50) with 10 images = undertraining even at 3K steps.
4B vs 9B character quality: 4B trains on 16GB but produces lower identity fidelity than 9B. Use 4B for prototyping, 9B for production.
Face feature leakage: even a few images with face-descriptive captions in a 12-image dataset can significantly reduce LoRA identity strength. All captions must follow the scene-only rule.
PuLID requires base model at inference: PuLID-Flux2 injects into double blocks, which are less effective in distilled Klein. Use base model + LoRA + PuLID for maximum identity quality.