
Make sure your ComfyUI is up to date. The workflows in this guide are available in the Workflow Templates; if you can’t find them there, your ComfyUI may be outdated (Desktop releases can lag behind ComfyUI updates). If nodes are missing when you load a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup
Qwen3.5 is an open-source multimodal LLM from Alibaba Cloud that builds on the Qwen3 series and adds image understanding. It supports both text generation and image-based tasks such as image captioning and reverse prompt engineering. Model highlights:
  • Multimodal — accepts text and image inputs for visual understanding tasks
  • Image captioning — can describe images and generate detailed captions
  • Reverse prompt engineering — infer a text prompt (and plausible generation settings) that could reproduce a reference image
  • ComfyUI native — works with the built-in TextGenerate node, no custom nodes needed
  • Lightweight — available in 2B, 4B, and 9B sizes; the 4B model is recommended for most consumer GPUs

Use Cases

Qwen3.5 excels in scenarios where combining visual understanding with text generation adds value to a ComfyUI workflow:
  • Image reverse prompt engineering — feed a reference image to Qwen3.5 and ask it to generate a detailed text prompt that could reproduce the image. This is especially useful when you encounter a great image but don’t know how it was prompted.
  • Prompt optimization — provide an existing prompt (optionally with a reference image) and ask Qwen3.5 to refine or expand it with richer detail for better generation results.
  • Image captioning — automatically generate captions, descriptions, or metadata tags for generated images, useful for cataloging or training data preparation.
  • Visual question answering — ask questions about image content (“What objects are in this scene?”, “What color is the background?”) and get structured text answers.
  • Text reading — with a suitable prompt, the model may attempt to read visible text or labels in images, though reliability depends on the quality and clarity of the rendered text.
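For visual question answering, one way to get machine-readable answers is to ask for JSON in the text prompt (for example, “List the objects in this image as JSON with keys objects and background”) and parse the generated text outside ComfyUI. The model is not guaranteed to return valid JSON, so the minimal Python sketch below uses a hypothetical helper, `extract_json`, that scans the response for the first parseable JSON object and returns `None` if it finds none. It relies only on the standard `json` module.

```python
import json


def extract_json(response: str):
    """Return the first JSON object found in a model response, or None.

    Hypothetical helper: LLM output often wraps JSON in prose or markdown
    code fences, so we try decoding from each '{' until one parses.
    """
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _end = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = response.find("{", start + 1)
    return None


if __name__ == "__main__":
    answer = 'Here is the result: {"objects": ["cat", "sofa"], "background": "blue"}'
    print(extract_json(answer))
```

Starting from each brace in turn lets the helper skip stray braces in surrounding prose instead of failing on the first one.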

Available workflow

Qwen3.5: Text Generation

Download the workflow JSON, or search “Qwen3.5 Text Generation” in the Template Library.

You can also open and run this workflow on Comfy Cloud.
This workflow demonstrates the text generation and image understanding capabilities of Qwen3.5. It accepts a text prompt and an optional image, and generates descriptive text or structured analysis based on the input.
Inputs:
  • Text prompt — your question, instruction, or task description
  • Image (optional) — for visual understanding tasks (image captioning, reverse prompt engineering, prompt optimization, etc.)
Key controls:
  • Max length — maximum number of tokens to generate (default 256)
  • Sampling mode — toggle sampling on/off and adjust temperature, top-k, top-p, repetition penalty, and seed
  • Use default template — apply the built-in system prompt for the model
Output:
  • Generated text — the model’s response as a plain text string
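The sampling controls above follow conventions shared by most LLM samplers. The Python sketch below is a generic illustration of what each knob typically does to the model's next-token scores; it is not ComfyUI's TextGenerate implementation, and the function names, the CTRL-style repetition penalty, and the order in which the filters are applied are assumptions. `next_logits` is a hypothetical stand-in for the model: any callable that maps the token ids so far to a list of scores.

```python
import math
import random


def greedy_next_token(logits):
    """Sampling off: always pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_next_token(logits, history, rng, temperature=0.7, top_k=50,
                      top_p=0.9, repetition_penalty=1.1):
    """Sampling on: draw one token id after applying the usual filters.

    Assumes temperature > 0 and repetition_penalty >= 1.
    """
    scores = list(logits)
    # Repetition penalty (CTRL-style, assumed): make tokens already in the
    # history less likely by shrinking positive scores and growing negative ones.
    for tok in set(history):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring candidates (0 disables the filter).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    # Softmax over the survivors; subtracting the max avoids overflow in exp().
    peak = scores[ranked[0]]
    weights = [math.exp(scores[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p (nucleus): keep the smallest prefix whose probability mass reaches top_p.
    kept_tokens, kept_probs, mass = [], [], 0.0
    for tok, p in zip(ranked, probs):
        kept_tokens.append(tok)
        kept_probs.append(p)
        mass += p
        if mass >= top_p:
            break
    # random.choices renormalizes the weights of the surviving tokens.
    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]


def generate(next_logits, prompt_ids, max_length=256, eos_id=None,
             do_sample=True, seed=0, **sampling):
    """Generate up to max_length new token ids, stopping early at eos_id."""
    rng = random.Random(seed)  # a fixed seed makes sampled runs repeatable
    generated = []
    for _ in range(max_length):
        history = prompt_ids + generated
        logits = next_logits(history)
        if do_sample:
            tok = sample_next_token(logits, history, rng, **sampling)
        else:
            tok = greedy_next_token(logits)
        if tok == eos_id:
            break
        generated.append(tok)
    return generated


if __name__ == "__main__":
    def toy_model(ids):
        # Toy stand-in for the model: fixed scores over a 4-token vocabulary.
        return [2.0, 1.0, 0.5, -1.0]

    print(generate(toy_model, [0], max_length=5, do_sample=False))
    print(generate(toy_model, [0], max_length=5, seed=42))
```

With sampling off, decoding is greedy and therefore deterministic; with sampling on, reusing the same seed reproduces the same output, while the max length caps how many tokens are produced.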

Learn about Subgraph

This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.

Model Download

Qwen3.5 models are loaded as text encoders in ComfyUI. Choose the variant that best suits your hardware:

Qwen3.5 2B (bf16)

Lightweight, ~4.5 GB. Best for low-VRAM setups and faster downloads.

Qwen3.5 4B (bf16)

Balanced size and quality. Recommended for most consumer GPUs.

Qwen3.5 9B (bf16)

Largest variant, ~19 GB. Higher quality output, requires more VRAM.
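As a rough rule of thumb, bf16 stores each parameter in 2 bytes, so the weight file is about twice the parameter count in gigabytes. The Python sketch below shows the arithmetic; the helper name `weight_size_gb` is hypothetical. The nominal 2B and 9B labels are rounded, so the actual files can be somewhat larger, consistent with the ~4.5 GB and ~19 GB figures above. VRAM use at runtime is higher still, because activations and the attention cache also need memory.

```python
# Bytes used to store one parameter in common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_size_gb(num_params: float, dtype: str = "bf16") -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for label, params in [("2B", 2e9), ("4B", 4e9), ("9B", 9e9)]:
        print(f"Qwen3.5 {label} bf16 ~ {weight_size_gb(params):.1f} GB")
```

For the nominal sizes this prints 4.0, 8.0, and 18.0 GB, a lower bound on what each variant needs.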
Place the downloaded .safetensors file in:
📂 ComfyUI/
├── 📂 models/
│   └── 📂 text_encoders/
│       └── qwen3.5_4b_bf16.safetensors   # or 2b / 9b variant
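To confirm the file landed in the right place before loading the workflow, a small Python check like the one below can help. It is a sketch: `find_qwen35_weights` and `text_encoder_dir` are hypothetical helpers, and only the 4B filename comes from this guide; the 2B and 9B filenames are assumed to follow the same pattern, so adjust them to match the files you actually downloaded.

```python
from pathlib import Path

# Only the 4B filename appears in this guide; the 2B and 9B names are assumptions.
CANDIDATE_FILES = (
    "qwen3.5_2b_bf16.safetensors",
    "qwen3.5_4b_bf16.safetensors",
    "qwen3.5_9b_bf16.safetensors",
)


def text_encoder_dir(comfyui_root) -> Path:
    """Default folder where ComfyUI looks for text encoder weights."""
    return Path(comfyui_root) / "models" / "text_encoders"


def find_qwen35_weights(comfyui_root) -> list:
    """Return the paths of any expected Qwen3.5 weight files that exist."""
    folder = text_encoder_dir(comfyui_root)
    return [folder / name for name in CANDIDATE_FILES if (folder / name).is_file()]


if __name__ == "__main__":
    root = "ComfyUI"  # change this to your ComfyUI install path
    found = find_qwen35_weights(root)
    if found:
        for path in found:
            print("Found:", path)
    else:
        print("No Qwen3.5 weights in", text_encoder_dir(root))
```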