Qwen-Image-Layered is a model developed by Alibaba’s Qwen team that can decompose an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content.
Key Features:
- Inherent Editability: Each layer can be independently manipulated without affecting other content
- High-Fidelity Elementary Operations: Supports resizing, repositioning, and recoloring with physical isolation of semantic components
- Variable-Layer Decomposition: Not limited to a fixed number of layers; decompose into 3, 4, 8, or more layers as needed
- Recursive Decomposition: Any layer can itself be further decomposed, allowing arbitrarily deep decomposition
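To make the layered idea concrete, here is a minimal sketch of how independently stored RGBA layers recombine into a single image, and why moving or recoloring one layer leaves the others untouched. It uses Pillow's `Image.alpha_composite`, `getchannel`, and `putalpha`; the `(image, offset)` layer layout and the `composite`/`recolor` helper names are hypothetical and are not how the ComfyUI workflow represents layers internally.

```python
from PIL import Image

# Hypothetical layout for illustration: a layer is an RGBA image plus a
# top-left (x, y) offset on the canvas. Not the workflow's internal format.

def composite(layers, size):
    """Stack (image, (x, y)) layers bottom-to-top onto a transparent canvas."""
    canvas = Image.new("RGBA", size, (0, 0, 0, 0))
    for image, offset in layers:
        canvas.alpha_composite(image, dest=offset)
    return canvas

def recolor(layer, rgb):
    """Fill a layer with a flat color while keeping its alpha mask intact."""
    recolored = Image.new("RGBA", layer.size, rgb + (255,))
    recolored.putalpha(layer.getchannel("A"))
    return recolored

# Two toy layers: an opaque red background and a small opaque blue square.
background = Image.new("RGBA", (64, 64), (255, 0, 0, 255))
square = Image.new("RGBA", (16, 16), (0, 0, 255, 255))

original = composite([(background, (0, 0)), (square, (0, 0))], (64, 64))

# Edit only the square layer: move it and turn it green.
edited = composite(
    [(background, (0, 0)), (recolor(square, (0, 255, 0)), (32, 32))], (64, 64)
)

print(original.getpixel((8, 8)))  # (0, 0, 255, 255): the blue square
print(edited.getpixel((8, 8)))    # (255, 0, 0, 255): background now visible
print(edited.getpixel((40, 40)))  # (0, 255, 0, 255): moved, recolored square
```

Because the background layer is never modified, the pixels revealed by moving the square come straight from that layer rather than from any inpainting step.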
Related Links:
Qwen-Image-Layered workflow
Make sure your ComfyUI is updated. Workflows in this guide can be found in the Workflow Templates.
If you can't find them in the templates, your ComfyUI may be outdated (Desktop version updates may lag behind). If nodes are missing when loading a workflow, possible reasons include:
- You are not using the latest ComfyUI version (Nightly version)
- Some nodes failed to import at startup
- The Desktop version is based on the ComfyUI stable release; it auto-updates when a new Desktop stable release is available.
- Cloud is updated after each ComfyUI stable release.
So, if any core node used in this document is missing, it may be because the new core nodes have not yet been included in the latest stable release. Please wait for the next stable release.
Model links
text_encoders
diffusion_models
vae
Model Storage Location
📂 ComfyUI/
├── 📂 models/
│ ├── 📂 text_encoders/
│ │ └── qwen_2.5_vl_7b_fp8_scaled.safetensors
│ ├── 📂 diffusion_models/
│ │ └── qwen_image_layered_bf16.safetensors
│ └── 📂 vae/
│ └── qwen_image_layered_vae.safetensors
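Before loading the workflow, a quick way to confirm the files landed in the right folders is a small script like the sketch below. The filenames come from the tree above; the `missing_models` helper name and the default root path `ComfyUI` are assumptions, so pass your actual installation directory.

```python
from pathlib import Path

# Expected model files, taken from the directory tree above.
REQUIRED_MODELS = {
    "text_encoders": "qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "diffusion_models": "qwen_image_layered_bf16.safetensors",
    "vae": "qwen_image_layered_vae.safetensors",
}

def missing_models(comfyui_root):
    """Return paths (relative to the ComfyUI root) of expected files that are absent."""
    root = Path(comfyui_root)
    missing = []
    for folder, filename in REQUIRED_MODELS.items():
        path = root / "models" / folder / filename
        if not path.is_file():
            missing.append(path.relative_to(root).as_posix())
    return missing

if __name__ == "__main__":
    import sys

    # Assumed default location; override with the first command-line argument.
    root_dir = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    for rel in missing_models(root_dir):
        print(f"missing: {rel}")
```

If you switch to the fp8 diffusion model, update the `diffusion_models` entry to match the filename you downloaded.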
FP8 version
By default, the workflow uses the bf16 model, which requires a lot of VRAM. For lower VRAM usage, you can use the fp8 version:
Then update the Load Diffusion Model node inside the Subgraph to use it.
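The saving comes from storing each weight in 1 byte (fp8) instead of 2 bytes (bf16). The sketch below does that arithmetic for weights only; the 20-billion parameter count is a hypothetical figure for illustration, not the confirmed size of Qwen-Image-Layered, and real usage also includes activations, the text encoder, and the VAE.

```python
# Bytes used to store one weight in each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

def weight_memory_gib(num_params, dtype):
    """Memory needed just to hold the weights, in GiB (excludes activations)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

# Hypothetical parameter count, for illustration only.
params = 20e9
for dtype in ("bf16", "fp8"):
    print(dtype, round(weight_memory_gib(params, dtype), 1))
# bf16 37.3
# fp8 18.6
```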
Workflow settings
Sampler settings
This model is slow. The original sampling settings are 50 steps with a CFG of 4.0; using them will at least double the generation time.
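Much of the extra cost comes from classifier-free guidance (CFG). The sketch below shows the standard CFG combination and counts denoiser calls per image, assuming one conditional and one unconditional pass per step whenever CFG is not 1.0 (at exactly 1.0 the unconditional pass can be skipped). The helper names are hypothetical, and this is not ComfyUI's sampler code.

```python
def cfg_combine(cond, uncond, cfg):
    """Standard CFG: move the prediction away from the unconditional one."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]

def model_evaluations(steps, cfg):
    """Denoiser calls per image: two passes per step unless CFG is 1.0."""
    passes_per_step = 1 if cfg == 1.0 else 2
    return steps * passes_per_step

print(cfg_combine([1.0, 2.0], [0.5, 0.5], 4.0))  # [2.5, 6.5]
print(model_evaluations(50, 4.0))  # 100
print(model_evaluations(50, 1.0))  # 50
```

At the same step count, moving from CFG 1.0 to 4.0 doubles the number of model calls, which is why the original settings are so much slower.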
For input size, 640px is recommended. Use 1024px for high-resolution output.
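As a rough guide to what "640px" or "1024px" means for a non-square input, the sketch below rescales an image to about `size × size` pixels while keeping its aspect ratio. Both the area-based interpretation of the size and the rounding of each side to a multiple of 32 are assumptions for illustration, not confirmed behavior of the workflow.

```python
import math

def target_dimensions(width, height, size=640, multiple=32):
    """Scale (width, height) to roughly size*size pixels, keeping aspect ratio.

    The area interpretation of `size` and the rounding to `multiple`
    are illustrative assumptions.
    """
    aspect = width / height
    new_w = math.sqrt(size * size * aspect)
    new_h = new_w / aspect
    new_w = max(multiple, round(new_w / multiple) * multiple)
    new_h = max(multiple, round(new_h / multiple) * multiple)
    return new_w, new_h

print(target_dimensions(1920, 1080))        # (864, 480)
print(target_dimensions(1920, 1080, 1024))  # (1376, 768)
```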
Prompt (optional)
The text prompt is intended to describe the overall content of the input image, including elements that may be partially occluded (for example, you can specify text hidden behind a foreground object). It is not designed to explicitly control the semantic content of individual layers.