- VAE-free — generates directly in pixel space; no traditional VAE encode/decode
- Dual-level DiT — patch-level DiT + pixel-level DiT for high-quality generation
- Multi aspect ratio — 1024px base resolution with support for several aspect ratios
- ~1.3B parameters — efficient enough for consumer GPUs
- License: NSCLv1 (non-commercial research/evaluation only)
PixelDiT text-to-image workflow
Download Workflow
Download the workflow JSON, or search for “PixelDiT” in the Template Library.
- ResolutionSelector — choose your desired output resolution
- Text to Image (PixelDiT) subgraph — the core generation node with exposed controls for prompt, seed, model selection and resolution
- SaveImage — saves the generated image
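
To illustrate what "1024px base resolution with several aspect ratios" can mean in practice, here is a minimal Python sketch that derives a width and height for a given aspect ratio while keeping the pixel count close to 1024 × 1024. It is not the ResolutionSelector node's implementation: the function name `resolution_for_aspect`, the constant-area rule, and the rounding to multiples of 16 are all assumptions made for illustration, and the node's actual presets may differ.

```python
import math


def resolution_for_aspect(aspect_w, aspect_h, base=1024, multiple=16):
    """Return (width, height) near base*base pixels for the given aspect ratio.

    Hypothetical helper: the constant-area rule and the `multiple` rounding
    are assumptions, not documented PixelDiT behaviour.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio components must be positive")

    target_area = base * base
    ratio = aspect_w / aspect_h

    # Solve width * height = target_area with width / height = ratio.
    width = math.sqrt(target_area * ratio)
    height = width / ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for aspect in [(1, 1), (16, 9), (3, 4)]:
        print(aspect, resolution_for_aspect(*aspect))
```

Under these assumptions a square ratio returns the base resolution unchanged, and wider or taller ratios trade width for height at roughly constant pixel count.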
Learn about Subgraph
This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.
Workflow controls
The exposed controls on the Text to Image (PixelDiT) subgraph node include:

| Control | Description |
|---|---|
| Positive Prompt | The text prompt describing the image you want to generate |
| Negative Prompt | Text describing what to avoid in the generated image |
| Seed | Random seed for reproducibility |
| UNet Model | PixelDiT model checkpoint selection |
| CLIP Model | Text encoder model selection |
Model downloads
PixelDiT uses two model files: a text encoder and the diffusion model.
Text Encoder
gemma_2_2b_it_elm_bf16.safetensors — Gemma-2-2B-IT text encoder
Diffusion Model
pixeldit_1300m_1024px_bf16.safetensors — PixelDiT 1300M 1024px diffusion model
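
Before loading the workflow, it can help to confirm that both files are in place. The Python sketch below checks for the two filenames listed above. The subfolder names `text_encoders` and `diffusion_models` are an assumption about how the models directory is laid out, not something this page states, and `missing_models` is a hypothetical helper; adjust the mapping to match your installation.

```python
from pathlib import Path

# Filenames come from the list above. The subfolder names are an ASSUMPTION
# about the models directory layout; change them to match your setup.
EXPECTED_FILES = {
    "text_encoders": "gemma_2_2b_it_elm_bf16.safetensors",
    "diffusion_models": "pixeldit_1300m_1024px_bf16.safetensors",
}


def missing_models(models_root):
    """Return the expected model paths that do not exist under models_root."""
    root = Path(models_root)
    missing = []
    for subfolder, filename in EXPECTED_FILES.items():
        path = root / subfolder / filename
        if not path.is_file():
            missing.append(path)
    return missing


if __name__ == "__main__":
    # "ComfyUI/models" is a placeholder path, not a required location.
    for path in missing_models("ComfyUI/models"):
        print(f"missing: {path}")
```

An empty result means both files were found at the assumed locations.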