Ovis-Image is a 7B text-to-image model built upon Ovis-U1, specifically optimized for high-quality text rendering. It delivers text rendering quality comparable to much larger 20B-class systems while remaining compact enough to run on widely accessible hardware.
Model Highlights:
- Strong Text Rendering at 7B Scale: Delivers text rendering quality comparable to much larger 20B-class systems like Qwen-Image, and is competitive with leading closed-source models like GPT-4o in text-centric scenarios
- High Fidelity on Text-Heavy Prompts: Excels on prompts that demand tight alignment between linguistic content and rendered typography (e.g., posters, banners, logos, UI mockups, infographics)
- Accurate Bilingual Text Rendering: Produces legible, correctly spelled, and semantically consistent text in both Chinese and English across diverse fonts, sizes, and aspect ratios
- Efficiency and Deployability: Fits on a single high-end GPU with moderate memory and supports low-latency interactive use
Related Links:
Ovis-Image text-to-image workflow
Download JSON Workflow File
Make sure your ComfyUI is updated. Workflows in this guide can be found in the Workflow Templates.
If you can't find them in the templates, your ComfyUI may be outdated (the Desktop version's updates may lag behind). If nodes are missing when loading a workflow, possible reasons:
- You are not using the latest ComfyUI version (Nightly version)
- Some nodes failed to import at startup
Model links
text_encoders
diffusion_models
vae
Model Storage Location
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │   └── ovis_2.5.safetensors
│   ├── 📂 diffusion_models/
│   │   └── ovis_image_bf16.safetensors
│   └── 📂 vae/
│       └── ae.safetensors
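If you want to double-check that the files landed in the right folders before launching ComfyUI, a minimal Python sketch like the following can verify the layout above. The file paths are taken directly from the tree; the `missing_models` helper and the default `ComfyUI` base path are illustrative assumptions, not part of ComfyUI itself.

```python
from pathlib import Path

# Expected model files from the storage layout above,
# relative to the ComfyUI root directory.
EXPECTED = [
    "models/text_encoders/ovis_2.5.safetensors",
    "models/diffusion_models/ovis_image_bf16.safetensors",
    "models/vae/ae.safetensors",
]

def missing_models(base="ComfyUI"):
    """Return the expected model files not found under `base`."""
    root = Path(base)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

if __name__ == "__main__":
    for rel in missing_models():
        print(f"missing: {rel}")
```

Run it from the directory containing your ComfyUI folder (or pass your own path to `missing_models`); an empty result means all three files are in place.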