Gemma 4 is the latest generation of lightweight open LLMs from Google DeepMind, built for text generation, image understanding, video analysis, audio transcription, and structured tool use. It is natively supported in ComfyUI as the default Text Generation model.
Model highlights:
- Multimodal by design — accepts text, image, video, and audio inputs simultaneously
- Three sizes available:
  - E2B (2B) — Fast and lightweight, ideal for consumer GPUs
  - E4B (4B) — Balanced performance, recommended default
  - 31B — Best quality, requires higher VRAM
- Thinking mode — Built-in step-by-step reasoning before generating answers
- Long context — Up to 128K tokens (E2B/E4B) and 256K tokens (31B)
- Multilingual — supports 35+ languages out of the box and was pre-trained on 140+ languages
- Function calling — Native support for structured tool use and agentic workflows
- ComfyUI native — loaded and run through the built-in TextGenerate and CLIPLoader nodes
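In a decoder-only model such as Gemma, the prompt (including any image, audio, or video inputs once they are converted to tokens) and the generated tokens share a single context window, so the prompt plus the number of tokens you ask for must fit inside the limits listed above. The sketch below makes that arithmetic concrete. The helper names are hypothetical (not a ComfyUI or Gemma API), and it assumes "128K" and "256K" mean 128 × 1024 and 256 × 1024 tokens.

```python
# Hypothetical helpers, not part of ComfyUI or any Gemma library.
# Assumption: "128K" / "256K" mean 128 * 1024 / 256 * 1024 tokens.
CONTEXT_WINDOW = {
    "E2B": 128 * 1024,
    "E4B": 128 * 1024,
    "31B": 256 * 1024,
}


def remaining_budget(variant: str, prompt_tokens: int) -> int:
    """Tokens left for generation after the prompt (never negative)."""
    return max(CONTEXT_WINDOW[variant] - prompt_tokens, 0)


def fits_in_context(variant: str, prompt_tokens: int, max_new_tokens: int = 256) -> bool:
    """True if the prompt plus the requested generation length fit in one window.

    The default of 256 mirrors the workflow's default max length.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW[variant]


if __name__ == "__main__":
    print(fits_in_context("E4B", 130_000))  # True: 130_256 <= 131_072
    print(fits_in_context("E2B", 131_000))  # False: 131_256 > 131_072
    print(remaining_budget("31B", 200_000))  # 62144
```

In practice the prompt length has to be measured with the model's own tokenizer; the sketch only shows the budgeting rule.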
Available workflow
Gemma 4: Text Generation
Download the workflow JSON, or search “Gemma 4 Text Generation” in the Template Library.
You can also run the workflow on Comfy Cloud.
This workflow demonstrates the core text generation capabilities of Gemma 4. It accepts an optional image, audio file, or video as additional context alongside your text prompt, and generates natural language output — with support for reasoning, coding, and multilingual prompts.
Inputs:
- Text prompt — your question or instruction
- Image (optional) — for visual understanding tasks (OCR, object detection, chart reading, etc.)
- Audio (optional) — for speech recognition or transcription
- Video (optional) — for video understanding across frames (subsampled to 1 FPS internally)
- Max length — maximum number of tokens to generate (default 256)
- Sampling mode — toggle sampling on/off and adjust temperature, top-k, top-p, repetition penalty, and seed
- Thinking mode — enable step-by-step reasoning before the final answer
- Use default template — apply the built-in system prompt for the model
Outputs:
- Generated text — the model’s response as a plain text string
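The sampling controls listed above follow common LLM decoding conventions. As a rough illustration of what each one does, here is a hypothetical pure-Python sketch that picks one next token from raw scores (logits). The function name and signature are not ComfyUI's, and the order of the filters (repetition penalty, temperature, top-k, top-p) is an assumption modeled on the Hugging Face transformers logits processors; the TextGenerate node may differ internally.

```python
import math
import random


def sample_next_token(logits, prev_tokens, do_sample=True, temperature=1.0,
                      top_k=0, top_p=1.0, repetition_penalty=1.0, seed=None):
    """Choose the next token id from raw logits (illustrative sketch only)."""
    logits = list(logits)

    # Repetition penalty (CTRL-style): shrink the score of every token that
    # already appears in the sequence so far, making repeats less likely.
    if repetition_penalty != 1.0:
        for t in set(prev_tokens):
            if logits[t] > 0:
                logits[t] /= repetition_penalty
            else:
                logits[t] *= repetition_penalty

    # Sampling off (or temperature 0): greedy decoding picks the top token.
    if not do_sample or temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the max for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-p (nucleus): keep the smallest prefix of the ranking whose
    # probability mass, renormalised over the top-k survivors, reaches top_p.
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Seeded RNG: the same seed and inputs reproduce the same choice.
    rng = random.Random(seed)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With sampling off the result is deterministic; with sampling on, fixing the seed makes a run reproducible for identical inputs, while changing it lets you explore alternative outputs.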
Learn about Subgraph
This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.
Model Download
Gemma 4 models are loaded as text encoders in ComfyUI. Download the relevant model file and place it in the correct directory:
- Gemma 4 2B (E2B IT FP8): fast and lightweight; recommended for consumer GPUs.
- Gemma 4 4B (E4B IT FP8): balanced performance; the default model in the workflow.
- All variants: browse all Gemma 4 model weights.
Place the .safetensors file in: