Z-Image ComfyUI Workflow Example

Z-Image (造相) is an efficient 6B-parameter image generation model developed by Alibaba’s Tongyi Lab. It uses a Scalable Single-Stream DiT (S3-DiT) architecture. Text tokens, visual semantic tokens, and image VAE tokens are concatenated at the sequence level into one unified input stream, which maximizes parameter efficiency. Z-Image (Base) is the non-distilled foundation model, intended for community-driven fine-tuning and custom development.

Model Highlights:

Photorealistic Quality: Delivers strong photorealistic image generation while maintaining excellent aesthetic quality
Accurate Bilingual Text Rendering: Excels at accurately rendering complex Chinese and English text
Prompt Enhancing & Reasoning: The Prompt Enhancer gives the model reasoning capabilities
Fine-tuning Ready: Ideal base model for custom training and adaptation
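The single-stream idea described above can be illustrated with a small sketch. This is not Z-Image’s implementation. It assumes each modality has already been projected to a shared hidden size, and the function name `build_single_stream`, the token counts, and the hidden size are hypothetical. It only shows what “concatenated at the sequence level” means: the three token groups become one sequence, and each modality occupies a contiguous index range.

```python
from dataclasses import dataclass
from typing import List, Tuple

Embedding = List[float]  # one token embedding (hypothetical representation)


@dataclass
class Segment:
    """Index range [start, end) that one modality occupies in the stream."""
    name: str
    start: int
    end: int


def build_single_stream(
    text_tokens: List[Embedding],
    semantic_tokens: List[Embedding],
    vae_tokens: List[Embedding],
) -> Tuple[List[Embedding], List[Segment]]:
    """Concatenate three token groups along the sequence axis.

    Assumes (hypothetically) that every embedding already has the same
    hidden size; raises ValueError otherwise.
    """
    groups = [
        ("text", text_tokens),
        ("visual_semantic", semantic_tokens),
        ("image_vae", vae_tokens),
    ]
    hidden_size = None
    stream: List[Embedding] = []
    segments: List[Segment] = []
    for name, tokens in groups:
        for token in tokens:
            # All modalities must share one hidden size to form a single stream.
            if hidden_size is None:
                hidden_size = len(token)
            elif len(token) != hidden_size:
                raise ValueError(
                    f"{name} token has size {len(token)}, expected {hidden_size}"
                )
        start = len(stream)
        stream.extend(tokens)  # sequence-level concatenation
        segments.append(Segment(name, start, len(stream)))
    return stream, segments


# Usage with toy sizes: 3 text, 2 semantic, 4 VAE tokens, hidden size 8.
text = [[0.0] * 8 for _ in range(3)]
semantic = [[1.0] * 8 for _ in range(2)]
vae = [[2.0] * 8 for _ in range(4)]
stream, segments = build_single_stream(text, semantic, vae)
print(len(stream))  # 9
for seg in segments:
    print(seg.name, seg.start, seg.end)
```

Roughly speaking, a single-stream transformer then processes this whole sequence with one shared set of weights. Dual-stream designs, by contrast, keep separate weights per modality.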

Z-Image text-to-image workflow

The Z-Image text-to-image workflow is available as a downloadable JSON file. It can also be run directly on ComfyUI Cloud.

Make sure your ComfyUI is up to date.

The workflows in this guide are available in the Workflow Templates. If you can’t find them there, your ComfyUI may be outdated (updates to the Desktop version can lag behind). If nodes are missing when you load a workflow, possible reasons include:

You are not using the latest ComfyUI version (the Nightly version)
Some nodes failed to import at startup

Z-Image model downloads

qwen_3_4b.safetensors

Text encoder for Z-Image.

z_image_bf16.safetensors

Diffusion model for Z-Image.

ae.safetensors

VAE for Z-Image.

Model Storage Location

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │      └── qwen_3_4b.safetensors
│   ├── 📂 diffusion_models/
│   │      └── z_image_bf16.safetensors
│   └── 📂 vae/
│          └── ae.safetensors
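You can check the layout above before loading the workflow with a small script that looks for the three files. This is a hypothetical helper, not part of ComfyUI. It assumes the default `models/` folders shown in the tree. If you have configured other model folders (for example through `extra_model_paths.yaml`), adjust the paths to match.

```python
from pathlib import Path
from typing import Dict, List

# Expected files relative to the ComfyUI root, taken from the tree above.
EXPECTED_FILES: Dict[str, str] = {
    "models/text_encoders/qwen_3_4b.safetensors": "text encoder",
    "models/diffusion_models/z_image_bf16.safetensors": "diffusion model",
    "models/vae/ae.safetensors": "VAE",
}


def missing_models(comfyui_root: str) -> List[str]:
    """Return the relative paths of expected model files that are not present."""
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def report(comfyui_root: str) -> List[str]:
    """Build human-readable lines describing each missing file."""
    return [
        f"missing {EXPECTED_FILES[rel]}: {Path(comfyui_root) / rel}"
        for rel in missing_models(comfyui_root)
    ]
```

For example, `missing_models("/path/to/ComfyUI")` returns an empty list when all three files are in place. Otherwise, `report` names each missing file and the folder it belongs in.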
