Z-Image (造相) is a powerful and highly efficient image generation model with 6B parameters, developed by Alibaba’s Tongyi Lab. It uses a Scalable Single-Stream DiT (S3-DiT) architecture in which text, visual semantic tokens, and image VAE tokens are concatenated at the sequence level to serve as a unified input stream, maximizing parameter efficiency. Z-Image (Base) is the non-distilled foundation model designed for community-driven fine-tuning and custom development.
Model Highlights:
  • Photorealistic Quality: Delivers strong photorealistic image generation while maintaining excellent aesthetic quality
  • Accurate Bilingual Text Rendering: Excels at accurately rendering complex Chinese and English text
  • Prompt Enhancing & Reasoning: Prompt Enhancer empowers the model with reasoning capabilities
  • Fine-tuning Ready: Ideal base model for custom training and adaptation
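
The single-stream input described in the introduction can be illustrated with a toy sketch. This is not Z-Image's actual implementation: the helper name `build_single_stream`, the token counts, and the embedding width are all hypothetical, and plain Python lists stand in for tensors. It only shows what concatenation at the sequence level means, namely that the three token groups are joined end to end into one sequence for a single transformer stack to process.

```python
def build_single_stream(text_tokens, semantic_tokens, vae_tokens):
    """Join three token sequences end to end (hypothetical helper).

    Each argument is a list of token embeddings; every embedding is a
    list of floats with the same width.
    """
    stream = []
    for group in (text_tokens, semantic_tokens, vae_tokens):
        stream.extend(group)

    # All tokens must share one embedding width to live in one sequence.
    widths = {len(token) for token in stream}
    if len(widths) > 1:
        raise ValueError(f"mismatched embedding widths: {sorted(widths)}")
    return stream


EMBED_DIM = 4  # illustrative width, not the model's real hidden size

text = [[0.0] * EMBED_DIM for _ in range(3)]      # 3 text tokens
semantic = [[1.0] * EMBED_DIM for _ in range(2)]  # 2 visual semantic tokens
vae = [[2.0] * EMBED_DIM for _ in range(5)]       # 5 image VAE tokens

stream = build_single_stream(text, semantic, vae)
print(len(stream))  # 10 tokens in one sequence (3 + 2 + 5)
```

Because all three modalities share one sequence, the same transformer weights process every token, which is the parameter-efficiency argument made for the single-stream design.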

Z-Image text-to-image workflow

Make sure your ComfyUI is up to date. The workflows in this guide can be found in the Workflow Templates. If you can’t find them there, your ComfyUI may be outdated (updates to the Desktop version may lag behind).
If nodes are missing when loading a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup

Z-Image model downloads

Model Storage Location
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │      └── qwen_3_4b.safetensors
│   ├── 📂 diffusion_models/
│   │      └── z_image_bf16.safetensors
│   └── 📂 vae/
│          └── ae.safetensors
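
To confirm that the files ended up in the right folders, a small check like the following may help. It is a minimal sketch that assumes the layout shown above; the function name `find_missing_models` is hypothetical and not part of ComfyUI, and the root path you pass in depends on where ComfyUI is installed on your machine.

```python
from pathlib import Path

# Relative paths copied from the storage layout above.
EXPECTED_MODELS = [
    "models/text_encoders/qwen_3_4b.safetensors",
    "models/diffusion_models/z_image_bf16.safetensors",
    "models/vae/ae.safetensors",
]


def find_missing_models(comfyui_root):
    """Return the expected model paths that are not present under comfyui_root.

    Hypothetical helper: it only checks that each file exists, not that
    the download is complete or uncorrupted.
    """
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED_MODELS if not (root / rel).is_file()]
```

Calling `find_missing_models("ComfyUI")` returns an empty list when all three files are in place; otherwise it returns the relative paths that still need to be downloaded or moved.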