Make sure your ComfyUI is up to date. The workflows in this guide are available in the Workflow Templates. If you can’t find them there, your ComfyUI may be outdated (updates to the Desktop version lag somewhat behind).

If nodes are missing when loading a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup

Gemma 4 is the latest generation of lightweight open LLMs from Google DeepMind, built for text generation, image understanding, video analysis, audio transcription, and structured tool use. It is natively supported in ComfyUI as the default Text Generation model.

Model highlights:
  • Multimodal by design — accepts text, image, video, and audio inputs simultaneously
  • Three sizes available:
    • E2B (2B) — Fast and lightweight, ideal for consumer GPUs
    • E4B (4B) — Balanced performance, recommended default
    • 31B — Best quality, requires higher VRAM
  • Thinking mode — Built-in step-by-step reasoning before generating answers
  • Long context — Up to 128K tokens (E2B/E4B) and 256K tokens (31B)
  • Multilingual — 35+ languages out of the box, pre-trained on 140+
  • Function calling — Native support for structured tool use and agentic workflows
  • ComfyUI native — loaded and run through the built-in TextGenerate and CLIPLoader nodes
Available workflow

Gemma 4: Text Generation

Download Workflow

Download JSON or search “Gemma 4 Text Generation” in Template Library

Run on Comfy Cloud

Open in Comfy Cloud

This workflow demonstrates the core text generation capabilities of Gemma 4. It accepts an optional image, audio file, or video as additional context alongside your text prompt, and generates natural language output, with support for reasoning, coding, and multilingual prompts.

Inputs:
  • Text prompt — your question or instruction
  • Image (optional) — for visual understanding tasks (OCR, object detection, chart reading, etc.)
  • Audio (optional) — for speech recognition or transcription
  • Video (optional) — for video understanding across frames (subsampled to 1 FPS internally)
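
To give a feel for what subsampling a video to 1 FPS means in practice, here is a minimal sketch that picks which frame indices to keep. The function name `subsample_frame_indices` and its parameters are hypothetical, and the exact frame selection ComfyUI performs internally may differ.

```python
import math


def subsample_frame_indices(num_frames: int, source_fps: float,
                            target_fps: float = 1.0) -> list[int]:
    """Return the indices of frames kept when reducing a clip to target_fps.

    Illustrative only: this is not ComfyUI's implementation.
    """
    if num_frames <= 0 or source_fps <= 0 or target_fps <= 0:
        return []
    # Distance (in source frames) between two kept frames; never upsample.
    step = max(source_fps / target_fps, 1.0)
    # Number of kept frames that fit into the clip.
    count = math.ceil(num_frames / step)
    return [min(int(i * step), num_frames - 1) for i in range(count)]


if __name__ == "__main__":
    # A 3-second clip at 30 FPS keeps one frame per second.
    print(subsample_frame_indices(90, 30.0))
```

Under this assumption a 10-second clip contributes about 10 frames of context regardless of its original frame rate, which is why longer videos consume more of the context window than higher-frame-rate ones.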
Key controls:
  • Max length — maximum number of tokens to generate (default 256)
  • Sampling mode — toggle sampling on/off and adjust temperature, top-k, top-p, repetition penalty, and seed
  • Thinking mode — enable step-by-step reasoning before the final answer
  • Use default template — apply the built-in system prompt for the model
Output:
  • Generated text — the model’s response as a plain text string
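
The sampling controls listed above (temperature, top-k, top-p, repetition penalty, and seed) are standard language model decoding settings. The sketch below shows one common way they combine when choosing a single next token. It is a generic illustration, not ComfyUI's code: the function name `sample_next_token`, the `do_sample` flag, and the order in which the filters are applied are all assumptions.

```python
import math
import random


def sample_next_token(logits, generated, do_sample=True, temperature=1.0,
                      top_k=0, top_p=1.0, repetition_penalty=1.0, seed=None):
    """Pick one token id from raw logits (generic sketch, not ComfyUI's code)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scores = list(logits)

    # Repetition penalty: lower the score of tokens that were already generated.
    for tok in set(generated):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty

    # Sampling off: greedy decoding, always take the highest-scoring token.
    if not do_sample:
        return max(range(len(scores)), key=lambda i: scores[i])

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scores = [s / temperature for s in scores]

    # Softmax turns scores into probabilities.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Seed: a fixed seed makes the random draw reproducible.
    rng = random.Random(seed)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With sampling turned off the output is deterministic for a given prompt. With sampling on, reusing the same seed and settings should reproduce the same draw, which is useful when comparing the effect of a single control.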

Learn about Subgraph

This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.

Model Download

Gemma 4 models are loaded as text encoders in ComfyUI. Download the relevant model file and place it in the correct directory:

Gemma 4 2B (E2B IT FP8)

Fast, lightweight. Recommended for consumer GPUs.

Gemma 4 4B (E4B IT FP8)

Balanced performance. The default model in the workflow.

View All Variants

Browse all Gemma 4 model weights.
Place the downloaded .safetensors file in:
📂 ComfyUI/
├── 📂 models/
│   └── 📂 text_encoders/
│       └── gemma4_e4b_it_fp8_scaled.safetensors
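
If the model does not show up in the loader node, a quick check of the folder contents can rule out a misplaced file. The following is a minimal sketch that lists the `.safetensors` files found under `models/text_encoders/`. The helper name `find_text_encoders` is hypothetical, and the root path `ComfyUI` is an assumption that should be replaced with your actual install location.

```python
from pathlib import Path


def find_text_encoders(comfyui_root: str) -> list[str]:
    """List .safetensors file names in <comfyui_root>/models/text_encoders."""
    folder = Path(comfyui_root) / "models" / "text_encoders"
    if not folder.is_dir():
        # The folder is missing, so nothing can be loaded from it.
        return []
    return sorted(p.name for p in folder.glob("*.safetensors"))


if __name__ == "__main__":
    # Assumed install location; adjust to where ComfyUI lives on your machine.
    found = find_text_encoders("ComfyUI")
    print(found if found else "No .safetensors files found in models/text_encoders")
```

After adding or moving a file, restarting ComfyUI or refreshing the node list may be needed before the new model appears in the dropdown.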