Pixel Art Generation¶

★★★★★ Advanced

Algorithms and models for converting raster images to pixel art, generating pixel art via diffusion LoRAs, and training custom pixel-art style LoRAs. For LoRA training fundamentals see diffusion lora training.

Key Facts¶

pyxelate last stable: v2.1.1 (2022); Python 3.10 venv required — numba incompatible on 3.12+
nerijs/pixel-art-xl: SDXL LoRA with LCM scheduler, 8-step inference, last updated April 2026
fal/flux-2-klein-4b-spritesheet-lora: FLUX.2 Klein 4B LoRA generating 2×2 multi-angle sprite sheets
Shakker-Labs/FLUX.1-Kontext-dev-LoRA-Pixel-Style: img2img pixel conversion via flux kontext pipeline, 24 steps, CFG 2.5
RetroDiffusion: purpose-trained model (not SD LoRA); true grid-aligned output; REST API at api.retrodiffusion.ai/v1/inferences; also on Replicate retro-diffusion/rd-fast
PixelLab: Python SDK pip install pixellab; MCP server available; sprite rotation + animation endpoints
SD-piXL: ETH Zurich SIGGRAPH Asia 2024 (arxiv 2410.06236); score distillation with hard H×W + N-color constraints; research-grade, no pip package
libimagequant (pngquant's quantizer): available as imagequant PyPI package (released Oct 2025) or via Pillow 10+ Image.Quantize.LIBIMAGEQUANT
hitherdither: no PyPI release; install from GitHub; dithering-only (no palette extraction)

Conversion Algorithms¶

pyxelate — Bayesian GMM + Atkinson dithering¶

Best for clean/cartoon sources. Struggles with JPEG compression artifacts.

from pyxelate import Pyx, Pal
from skimage import io

img = io.imread("source.jpg")
pyx = Pyx(factor=8, palette=24, dithering="atkinson")
pyx.fit(img)
result = pyx.transform(img)
io.imsave("output.png", result)

factor=8: each output pixel = 8×8 source pixels. For 512px source → 64px output. Speed: ~2–60s per image CPU (slow; factor is the main lever). License: MIT.

Pillow NEAREST + libimagequant¶

Production batch pipeline. Best palette fidelity; ~0.5s/image CPU.

from PIL import Image

def to_pixel_art(src: str, dst: str, scale: int = 8, n_colors: int = 32) -> None:
    img = Image.open(src).convert("RGBA")
    pw = max(16, img.width // scale)
    ph = max(16, img.height // scale)
    small = img.resize((pw, ph), Image.NEAREST)
    quantized = small.quantize(
        colors=n_colors,
        method=Image.Quantize.LIBIMAGEQUANT,
        dither=1,
    )
    quantized.save(dst)

LIBIMAGEQUANT uses perceptual LAB weighting; MEDIANCUT is the fallback if not compiled in.

hitherdither — ordered dithering pass¶

Apply after palette selection (Pillow or imagequant) for crispier ordered patterns.

# pip install git+https://github.com/hbldh/hitherdither
import hitherdither
from PIL import Image

img = Image.open("source.jpg").convert("RGB").resize((64, 64), Image.NEAREST)
palette = hitherdither.palette.Palette.create_by_median_cut(img, n=32)
dithered = hitherdither.ordered.bayer.bayer_dithering(
    img, palette, [256/4, 256/4, 256/4], order=8
)
dithered.save("output.png")

Bayer order=8 gives crisper results than Floyd-Steinberg; Yliluoma's algorithm is highest-quality but slower. License: MIT.

AI post-process — downsample AI output to pixel grid¶

from PIL import Image

def ai_to_pixelart(
    src: str,
    dst: str,
    target_w: int = 320,
    target_h: int = 480,
    n_colors: int = 48,
) -> None:
    img = Image.open(src).convert("RGB")
    # Two-step: LANCZOS first, NEAREST final (preserves hard edges)
    mid = img.resize((target_w * 2, target_h * 2), Image.LANCZOS)
    small = mid.resize((target_w, target_h), Image.NEAREST)
    quantized = small.quantize(
        colors=n_colors,
        method=Image.Quantize.LIBIMAGEQUANT,
        dither=Image.Dither.FLOYDSTEINBERG,
    )
    quantized.convert("RGB").save(dst)

rembg — background removal preprocessing¶

Removes noisy backgrounds before conversion; prevents palette confusion from JPEG artifacts. pip install rembg[gpu]; RMBG-2.0 model; ~1–3s GPU / ~5–10s CPU.

from rembg import remove
from PIL import Image
import io

with open("source.jpg", "rb") as f:
    img = Image.open(io.BytesIO(remove(f.read())))  # RGBA PNG with alpha

GCD-based pixel grid size detection¶

Infer the logical block size from run-length GCD before choosing scale factor.

from math import gcd
from functools import reduce
import numpy as np
from PIL import Image

def detect_pixel_grid_size(img: Image.Image) -> int:
    arr = np.array(img.convert("RGB"))
    runs = []
    for row in arr[::4]:  # sample every 4 rows for speed
        cur, count = row[0].tobytes(), 1
        for px in row[1:]:
            if px.tobytes() == cur: count += 1
            else: runs.append(count); cur, count = px.tobytes(), 1
        runs.append(count)
    cands = [r for r in runs if 1 < r <= 32]
    return max(1, min(reduce(gcd, cands), 16)) if cands else 1

Generative / LoRA Approaches¶

nerijs/pixel-art-xl — SDXL + LCM (8-step)¶

from diffusers import DiffusionPipeline, LCMScheduler
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
pipe.load_lora_weights(
    "nerijs/pixel-art-xl",
    weight_name="pixel-art-xl.safetensors",
    adapter_name="pixel",
)
pipe.set_adapters(["lcm", "pixel"], adapter_weights=[1.0, 1.2])

image = pipe(
    prompt="pixel art castle, 8-bit style",
    num_inference_steps=8,
    guidance_scale=1.5,
    height=512,
    width=512,
).images[0]

# Snap to true pixel grid
w, h = image.size
image.resize((w // 8, h // 8), Image.NEAREST).resize((w, h), Image.NEAREST).save("out.png")

Requires 8+ GB VRAM. License: CreativeML OpenRAIL-M. For img2img: use StableDiffusionXLImg2ImgPipeline with strength=0.6–0.75.

FLUX.1-dev + atmospheric pixel art LoRAs¶

Notable FLUX.1-dev LoRAs for high-detail pixel art (CivitAI/HF): - UmeAiRT/FLUX.1-dev-LoRA-Modern_Pixel_art — trigger: umempart; 100-image training set - CivitAI #683579 "Pixel Art Illustrations FLUX" — V1 trained on 4×4-pixel-block art; best for atmospheric landscapes - CivitAI #629354 "Pixel Art Styles FluxV3" — trained on MidJourney output; best for compositional depth

from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights(
    "UmeAiRT/FLUX.1-dev-LoRA-Modern_Pixel_art",
    weight_name="FLUX.1-dev-LoRA-Modern_Pixel_art.safetensors",
)
pipe.set_adapters(["default"], adapter_weights=[0.85])

image = pipe(
    prompt="umempart, atmospheric pixel art, mountain fortress, pine forest mist, warm window light",
    height=1024,
    width=640,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("cover_draft.png")
# Then run ai_to_pixelart() above for palette quantization

FLUX.1-Kontext img2img pixel conversion¶

Shakker-Labs/FLUX.1-Kontext-dev-LoRA-Pixel-Style: preserves source composition while applying pixel style.

from diffusers import FluxKontextPipeline
import torch
from PIL import Image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights(
    "Shakker-Labs/FLUX.1-Kontext-dev-LoRA-Pixel-Style",
    weight_name="FLUX1-Kontext-dev-LoRA-Pixel-Style.safetensors",
)

image = pipe(
    image=Image.open("source.png"),
    prompt="convert to atmospheric pixel art, crisp pixel edges, 50+ color palette",
    num_inference_steps=24,
    guidance_scale=2.5,
).images[0]

See flux kontext for pipeline setup details.

fal spritesheet LoRA — multi-angle sprites¶

fal/flux-2-klein-4b-spritesheet-lora: generates 2×2 spritesheet (isometric front-right, front-left, side, top-down). Available via fal.ai API without local GPU:

import fal_client

result = fal_client.subscribe(
    "fal-ai/flux-2-klein/4b/base/edit/lora",
    arguments={
        "prompt": "red dragon character, pixel art sprite, 2x2 spritesheet",
        "loras": [{"path": "fal/flux-2-klein-4b-spritesheet-lora", "scale": 1.0}],
    },
)

Base model: FLUX.2 Klein 4B. 4B LoRA not compatible with 9B — see flux klein 9b inference.

RetroDiffusion — purpose-trained API¶

Purpose-trained on pixel art data (not SD adapter). True grid-aligned output. Models: rd_plus__low_res (16–128px), rd_plus__default, rd_plus__mc_item, rd_plus__topdown_item.

import requests

response = requests.post(
    "https://api.retrodiffusion.ai/v1/inferences",
    headers={"X-RD-Token": "YOUR_KEY"},
    json={
        "width": 128,
        "height": 128,
        "prompt": "stone castle tower, fog, pixel art",
        "prompt_style": "rd_plus__default",
        "num_images": 1,
    },
)
# Also: replicate.run("retro-diffusion/rd-fast", input={...})

50 free credits on signup; ~$0.02–0.05/image on Replicate.

PixelLab SDK — game sprites and animation¶

from pixellab import PixelLab

client = PixelLab(api_key="YOUR_KEY")
result = client.generate_image_pixflux(
    description="knight character, dark fantasy, pixel art",
    image_size={"width": 64, "height": 64},
)
result.image.save("sprite.png")

Supports sprite rotation (4/8 directional), skeleton animation, inpainting. MCP server: github.com/pixellab-code/pixellab-mcp.

LoRA Training for Pixel Art Style¶

For training tool comparison, hyperparameters, and dataset guidelines see diffusion lora training.

Dataset size: 50–200 curated images is the effective range. 1000 mixed-style images train a noisier LoRA than 200 consistent ones. Pre-process with the Pillow + libimagequant pipeline first to produce clean indexed PNGs.

Pixel-art-specific params:

Parameter	Value	Notes
Rank / alpha	16–32 / rank/2	Higher rank captures more detail but slower
LR (FLUX)	1e-4	FLUX is LR-sensitive; see diffusion lora training
Steps (100-img dataset)	1000–2000	Fewer steps than portrait LoRA
Augmentation	Disabled	Hard pixel edges blur under standard augmentation
Captions	Dense scene descriptions	Not short prompts; describe palette, grid style
Augment text	"pixel art", "8-bit style", pixel size	Reinforce style in every caption

Dataset prep: batch-convert with the to_pixel_art() function in the Conversion Algorithms section above. Produces clean indexed PNGs as training data.

Cloud training (fal.ai): fal-ai/flux-lora-fast-training — ~$8 / 1000 steps, returns hosted endpoint. fal-ai/flux-2-trainer for FLUX.2 Klein base.

Collection-Level Learning¶

For extracting style profiles and scene grammar from a large image collection:

Embedding + clustering — DINOv2 for visual style clustering (texture/palette); SigLIP-2 for semantic text-to-image search. Cluster with UMAP (20D) + HDBSCAN → 15–40 style groups. Use mlxtend.frequent_patterns.fpgrowth for scene grammar (element co-occurrence). See synthetic dataset pipeline for full pipeline patterns.

from transformers import AutoImageProcessor, AutoModel
import torch, numpy as np
from PIL import Image

def embed_batch(paths: list[str], model_name="facebook/dinov2-large") -> np.ndarray:
    proc = AutoImageProcessor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).cuda().eval()
    imgs = [Image.open(p).convert("RGB").resize((224, 224)) for p in paths]
    inp = proc(images=imgs, return_tensors="pt").to("cuda")
    with torch.no_grad():
        return model(**inp).last_hidden_state[:, 0].cpu().numpy()  # (N, 1024)

Public pixel art datasets (HuggingFace):

Dataset	Size	License
`bghira/free-to-use-pixelart`	~2K	Permissive for training
`jainr3/diffusiondb-pixelart`	~16K	Research
`VatsaDev/pixel-art`	~5K	Mixed
`carlosuperb/lpc-4view-pixel-art-diffusion`	~8K	LPC (CC-BY-SA 3.0)
`UmeAiRT/Ume-Modern_Pixel_Art`	~100	Per-model card

OpenGameArt.org for CC0 game assets; Lospec Gallery for reference (check ToS before scraping).

Tools Landscape¶

Tool	Type	API/Python	License	Best For
pyxelate	Library	Python	MIT	Retro look, clean sources
Pillow + libimagequant	Library	Python	HPND	Production batch
hitherdither	Library	Python	MIT	Ordered dithering step
rembg	Library	Python	MIT	Background removal pre-step
ImageGoNord	Library	Python	MIT	Palette-snapping to fixed palette
nerijs/pixel-art-xl	SDXL LoRA	diffusers	OpenRAIL-M	8-step pixel generation/img2img
FLUX.1-Kontext Pixel LoRA	FLUX LoRA	diffusers	Apache-2.0	Structure-preserving pixel conversion
fal spritesheet LoRA	FLUX.2 LoRA	fal.ai / diffusers	Custom	Multi-angle sprite sheets
SD-piXL	Research model	Python (GitHub)	Research	Hard-constrained grid + palette
RetroDiffusion	Commercial API	REST + Replicate	Proprietary	True pixel art, game assets
PixelLab	Commercial API	Python SDK + MCP	Proprietary	Sprites, animation, rotation
PixelOver	Desktop app	None (UI only)	Paid (~$10)	Visual exploration
Pixelmash	Desktop app	None (UI only)	Paid (~$40)	Non-destructive pixel layer
Aseprite	Desktop app	Lua scripting	Paid ($20)	Manual pixel editing
Pixelit	JS Library	JavaScript only	MIT	Browser/Node pipelines
ComfyUI-PixelArt-Detector	ComfyUI nodes	ComfyUI	MIT	Palette + downscale in workflow

Gotchas¶

Issue: pyxelate fails to install on Python 3.12+ due to numba version coupling (numba >= 0.59 required for 3.12, but pyxelate pins older numba). -> Fix: Use a python3.10 -m venv environment, or skip pyxelate and use Pillow + libimagequant instead. Verify with python -c "import numba; import pyxelate" before batch runs.
Issue: Image.Quantize.LIBIMAGEQUANT raises ImportError or silently falls back to MEDIANCUT on some Pillow builds (pre-compiled wheels without libimagequant). -> Fix: Install imagequant PyPI package (Oct 2025 release) separately: pip install imagequant, then use imagequant.quantize_pil_image() directly instead of Image.quantize().
Issue: nerijs/pixel-art-xl and most FLUX LoRAs produce anti-aliased output (fractional-pixel rendering) — output does not snap to a clean integer grid. -> Fix: Always follow with the two-step downsample: img.resize((w//N, h//N), NEAREST).resize((w, h), NEAREST). Do not use LANCZOS for the final snap.
Issue: FLUX.2 Klein 4B spritesheet LoRA (fal/flux-2-klein-4b-spritesheet-lora) cannot be loaded on Klein 9B — architecture mismatch (text encoder shape). -> Fix: The LoRA was trained on Klein 4B; 4B and 9B LoRAs are incompatible (different Qwen text encoder sizes). Use fal.ai API or run the 4B base model locally. See flux klein 9b inference for model variant details.
Issue: Training pixel art LoRA on raw JPEG sources produces inconsistent palette noise; standard augmentation (flip, rotation, color jitter) blurs hard pixel edges. -> Fix: (a) Pre-process all training images through Pillow + libimagequant before training to normalize palette. (b) Disable all augmentation — hard edges are the defining characteristic and any interpolation destroys them.
Issue: Grid-size mismatch: converting a 512px image with scale=8 gives 64px output, but the source image's internal pixel grid may be 2px, 4px, or 8px blocks — resulting in blurry sub-pixel output. -> Fix: Use detect_pixel_grid_size() (GCD of run lengths) to infer the logical block size before choosing the scale factor.