Qwen3.5 is an open-source multimodal LLM from Alibaba Cloud, building on the Qwen 3.0 series with added image understanding capabilities. It supports both text generation and image-based tasks such as image captioning and reverse prompt engineering.
Model highlights:
- Multimodal — accepts text and image inputs for visual understanding tasks
- Image captioning — can describe images and generate detailed captions
- Reverse prompt engineering — extract prompts and generation parameters from reference images
- ComfyUI native — works with the built-in TextGenerate node, no custom nodes needed
- Lightweight — 4B parameter model, suitable for consumer GPUs
Use Cases
Qwen3.5 excels in scenarios where combining visual understanding with text generation adds value to a ComfyUI workflow:
- Image reverse prompt engineering — feed a reference image to Qwen3.5 and ask it to generate a detailed text prompt that could reproduce the image. This is especially useful when you encounter a great image but don’t know how it was prompted.
- Prompt optimization — load an existing prompt and an image concept, then ask Qwen3.5 to generate, refine, or expand prompts with richer detail for better generation results.
- Image captioning — automatically generate captions, descriptions, or metadata tags for generated images, useful for cataloging or training data preparation.
- Visual question answering — ask questions about image content (“What objects are in this scene?”, “What color is the background?”) and get structured text answers.
- Text reading — with a suitable prompt, the model may attempt to read visible text or labels in images, though reliability depends on the quality and clarity of the rendered text.
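As a rough illustration of the use cases above, the sketch below keeps one prompt template per task and fills in the variable parts. The template wording, the `PROMPT_TEMPLATES` name, and the `build_prompt` helper are hypothetical; they are not taken from the Qwen3.5 or ComfyUI documentation, and the resulting string is simply what you would paste into the workflow's text prompt input.

```python
# Hypothetical prompt templates, one per use case listed above.
# The wording is illustrative only and should be tuned for your own images.
PROMPT_TEMPLATES = {
    "reverse_prompt": (
        "Describe this image as a single detailed text-to-image prompt, "
        "covering subject, style, lighting, and composition."
    ),
    "prompt_optimization": (
        "Rewrite the following prompt with richer visual detail, "
        "keeping the original intent: {prompt}"
    ),
    "captioning": (
        "Write a one-sentence caption for this image, "
        "then list up to {max_tags} comma-separated tags."
    ),
    "vqa": "Answer the question about the image concisely. Question: {question}",
    "text_reading": (
        "Transcribe any legible text in the image exactly as written. "
        "If none is legible, reply 'none'."
    ),
}


def build_prompt(task: str, **fields: object) -> str:
    """Fill in the template for `task`.

    Raises KeyError for an unknown task or a missing template field.
    """
    return PROMPT_TEMPLATES[task].format(**fields)


print(build_prompt("vqa", question="What color is the background?"))
```

Keeping the templates in one place makes it easier to compare how small wording changes affect the generated text for the same image.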
Available workflow
Qwen3.5: Text Generation
Download Workflow
Download JSON or search “Qwen3.5 Text Generation” in Template Library
This workflow demonstrates the text generation and image understanding capabilities of Qwen3.5. It accepts a text prompt and an optional image, and generates descriptive text or structured analysis based on the input.
Inputs:
- Text prompt — your question, instruction, or task description
- Image (optional) — for visual understanding tasks (image captioning, reverse prompt engineering, prompt optimization, etc.)
- Max length — maximum number of tokens to generate (default 256)
- Sampling mode — toggle sampling on/off and adjust temperature, top-k, top-p, repetition penalty, and seed
- Use default template — apply the built-in system prompt for the model
Outputs:
- Generated text — the model’s response as a plain text string
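To make the sampling inputs more concrete, here is a minimal sketch of what temperature, top-k, top-p, repetition penalty, and seed conventionally do when picking the next token. It operates on a toy list of logits and uses only the Python standard library. It is not ComfyUI's or Qwen3.5's implementation; the function names are hypothetical, and the repetition penalty follows the common convention of dividing positive logits and multiplying negative ones.

```python
import math
import random


def sample_token(logits, generated, *, temperature=1.0, top_k=0, top_p=1.0,
                 repetition_penalty=1.0, rng=None):
    """Pick one token id from `logits` (a list of floats)."""
    rng = rng or random.Random()
    logits = list(logits)

    # Repetition penalty: make already-generated tokens less likely.
    for tok in set(generated):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty

    # Temperature: below 1 sharpens the distribution, above 1 flattens it.
    scaled = [value / temperature for value in logits]

    # Softmax (subtract the max for numerical stability), sorted by probability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, token_id in ranked:
        kept.append((prob, token_id))
        cumulative += prob
        if cumulative >= top_p:
            break

    ids = [token_id for _, token_id in kept]
    weights = [prob for prob, _ in kept]
    return rng.choices(ids, weights=weights, k=1)[0]


def greedy_token(logits):
    """Sampling switched off: always take the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)


toy_logits = [2.0, 1.0, 0.5, -1.0]
settings = dict(temperature=0.8, top_k=3, top_p=0.9, repetition_penalty=1.2)
first = sample_token(toy_logits, [0], rng=random.Random(42), **settings)
second = sample_token(toy_logits, [0], rng=random.Random(42), **settings)
print(first == second)  # same seed, same choice
```

With sampling off, output is deterministic for a given input. With sampling on, fixing the seed is what makes a run repeatable.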
Learn about Subgraph
This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.
Model Download
Qwen3.5 models are loaded as text encoders in ComfyUI. Choose the variant that best suits your hardware:
Qwen3.5 2B (bf16)
Lightweight, ~4.5 GB. Best for low VRAM setups and fast downloads.
Qwen3.5 4B (bf16)
Balanced size and quality. Recommended for most consumer GPUs.
Qwen3.5 9B (bf16)
Largest variant, ~19 GB. Higher quality output, requires more VRAM.
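As a back-of-the-envelope check on these sizes, bf16 stores each parameter in 2 bytes, so weight size is roughly the parameter count times two. The sketch below applies that arithmetic to the nominal parameter counts; the helper name is hypothetical, and the listed download sizes come out slightly larger than the estimate, presumably because the nominal counts are rounded and the files may include additional components.

```python
BYTES_PER_BF16_PARAM = 2  # bfloat16 stores each value in 16 bits


def estimate_bf16_weight_gb(params_billions: float) -> float:
    """Rough weight size in GB (10**9 bytes): parameter count times 2 bytes."""
    return params_billions * 1e9 * BYTES_PER_BF16_PARAM / 1e9


for name, billions in [("2B", 2), ("4B", 4), ("9B", 9)]:
    size = estimate_bf16_weight_gb(billions)
    print(f"Qwen3.5 {name}: roughly {size:.0f} GB of weights at the nominal parameter count")
```

Actual VRAM use during generation is higher than the weight size alone, since activations and the attention cache also need memory, so leave headroom when choosing a variant.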
Place the downloaded .safetensors file in: