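ERNIE-Image is an open text-to-image generation model developed by Baidu and released under the Apache-2.0 license. Built on an 8B-parameter Diffusion Transformer (DiT), it produces high-quality images with precise text rendering, strong instruction following, and structured visual generation. The model includes a built-in prompt enhancer (3B) that expands short inputs into richer prompts for better results.

Key features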
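  • Precise text rendering: dense, layout-aware text in English, Chinese, and other languages
  • Strong instruction following: handles complex prompts, multi-object relationships, and knowledge-intensive descriptions
  • Structured visual generation: posters, manga/anime storyboards, and multi-panel compositions
  • Broad style coverage: from realistic photography to cinematic aesthetics
  • Compact and easy to deploy: 8B parameters, runs in 24 GB of VRAM
  • Built-in prompt enhancer: a 3B model that expands short inputs into richer prompts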

ERNIE-Image text-to-image workflow

Download workflow

Download the ERNIE-Image text-to-image workflow JSON file.

Run on Comfy Cloud

Run this workflow directly on Comfy Cloud.
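Make sure ComfyUI is updated to the latest version. The workflows in this guide are available from the workflow templates.
If you cannot find a workflow in the templates, your ComfyUI version may be out of date. (Desktop releases can lag slightly behind.)
If nodes are missing when you load a workflow, the most likely causes are:
  1. You are not running the latest ComfyUI (Nightly version)
  2. Some nodes failed to import at startup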

Getting started

  1. Update ComfyUI to the latest version, or use Comfy Cloud
  2. Go to Templates and search for ERNIE-Image
  3. Select the ERNIE-Image workflow
  4. Download any missing models, update the prompt, and click Run
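
The steps above use the ComfyUI interface. As a rough sketch of a scripted alternative, a workflow can also be queued through ComfyUI's HTTP endpoint `POST /prompt`. This assumes a local server on ComfyUI's default address (`127.0.0.1:8188`) and a workflow that has been exported in API format; the JSON files linked in this guide are in the UI format and would need to be re-exported first. The function names are illustrative, not part of ComfyUI.

```python
import json
import urllib.request
import uuid

# Assumption: a local ComfyUI server listening on its default address.
COMFY_URL = "http://127.0.0.1:8188"


def build_payload(workflow: dict, client_id: str) -> bytes:
    """Wrap an API-format workflow in the JSON body sent to POST /prompt."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def queue_workflow(workflow: dict, base_url: str = COMFY_URL) -> dict:
    """Send the workflow to ComfyUI and return the server's JSON response."""
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=build_payload(workflow, client_id=str(uuid.uuid4())),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

To use it, load your exported file with `json.load` and pass the resulting dictionary to `queue_workflow`. The models still have to be in place on the server, exactly as for the interactive workflow.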

ERNIE-Image model downloads

All repackaged model files are available at Comfy-Org/ERNIE-Image on Hugging Face.

ernie-image.safetensors

Diffusion model for ERNIE-Image.

ministral-3-3b.safetensors

Text encoder for ERNIE-Image.

ernie-image-prompt-enhancer.safetensors

Prompt enhancer text encoder for ERNIE-Image.

flux2-vae.safetensors

VAE for ERNIE-Image.

Model storage location
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── ernie-image.safetensors
│   ├── 📂 text_encoders/
│   │   ├── ministral-3-3b.safetensors
│   │   └── ernie-image-prompt-enhancer.safetensors
│   └── 📂 vae/
│       └── flux2-vae.safetensors

ERNIE-Image-Turbo

ERNIE-Image-Turbo is a fast variant optimized with DMD and RL. It generates images in just 8 steps, compared with the roughly 50 steps the standard model needs.

Download workflow

Download the ERNIE-Image-Turbo text-to-image workflow JSON file.

Run on Comfy Cloud

Run this workflow directly on Comfy Cloud.

ERNIE-Image-Turbo model downloads

ernie-image-turbo.safetensors

Diffusion model for ERNIE-Image-Turbo.

ministral-3-3b.safetensors

Text encoder for ERNIE-Image-Turbo.

ernie-image-prompt-enhancer.safetensors

Prompt enhancer text encoder for ERNIE-Image-Turbo.

flux2-vae.safetensors

VAE for ERNIE-Image-Turbo.

Model storage location
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── ernie-image-turbo.safetensors
│   ├── 📂 text_encoders/
│   │   ├── ministral-3-3b.safetensors
│   │   └── ernie-image-prompt-enhancer.safetensors
│   └── 📂 vae/
│       └── flux2-vae.safetensors
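
The two directory layouts differ only in the diffusion model file. A minimal sketch for checking that the files are where ComfyUI expects them is shown below; the function name and the variant keys are made up for illustration, and the paths are copied from the trees in this guide.

```python
from pathlib import Path

# Files shared by both variants, relative to the ComfyUI root.
SHARED_FILES = [
    "models/text_encoders/ministral-3-3b.safetensors",
    "models/text_encoders/ernie-image-prompt-enhancer.safetensors",
    "models/vae/flux2-vae.safetensors",
]

# The diffusion model is the only file that differs between variants.
DIFFUSION_MODELS = {
    "ernie-image": "models/diffusion_models/ernie-image.safetensors",
    "ernie-image-turbo": "models/diffusion_models/ernie-image-turbo.safetensors",
}


def missing_model_files(comfy_root: str, variant: str) -> list[str]:
    """Return the expected model files that are absent under comfy_root."""
    root = Path(comfy_root)
    expected = [DIFFUSION_MODELS[variant], *SHARED_FILES]
    return [rel for rel in expected if not (root / rel).is_file()]
```

For example, `missing_model_files("ComfyUI", "ernie-image-turbo")` returns an empty list once all four files are in place, and otherwise lists the relative paths that still need to be downloaded.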

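Available models

| Model | Description | Inference steps | Link |
|---|---|---|---|
| ERNIE-Image | Main SFT model: high-quality generation and instruction following | ~50 | Hugging Face |
| ERNIE-Image-Turbo | Turbo model optimized with DMD and RL: fast generation | 8 | Hugging Face |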
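
When switching between the two models, the step count has to change with the diffusion model. The sketch below applies the step counts from the table to a workflow in ComfyUI's API format, where each node is a dictionary with `class_type` and `inputs`. It assumes the sampler or scheduler node exposes an integer `steps` input, as `KSampler` does; which nodes the ERNIE-Image templates actually use is not confirmed here, and the helper name is illustrative.

```python
import copy

# Step counts from the table above (the base model value is approximate).
RECOMMENDED_STEPS = {"ERNIE-Image": 50, "ERNIE-Image-Turbo": 8}


def with_steps(workflow: dict, model: str) -> dict:
    """Return a copy of an API-format workflow with each literal integer
    `steps` input set to the recommended value for `model`."""
    steps = RECOMMENDED_STEPS[model]
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        inputs = node.get("inputs", {})
        # Linked inputs are lists such as ["4", 0]; leave those untouched.
        if isinstance(inputs.get("steps"), int):
            inputs["steps"] = steps
    return patched
```

Working on a deep copy keeps the original workflow unchanged, so the same base dictionary can be patched once per model.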

Examples

Text rendering and design layout

Coffee-making process infographic
Educational comic book style infographic showing the coffee making process. The background has a light beige vintage paper texture. At the top center of the image is a bold brown title that reads 'The Coffee Making Process', with a smaller English subtitle 'How Coffee is Made' below. The main part consists of six step blocks connected by brown dotted arrows, arranged in two rows up and down, forming a Z-shaped visual guide.

The first step in the upper left corner illustrates a coffee tree full of ripe red coffee cherries, with a cut coffee cherry next to it showing the beans inside, labeled 'Step 1: Harvesting Cherries' below the block.
The second step in the upper middle shows wooden fermentation boxes filled with coffee beans, labeled 'Step 2: Pulping and Fermentation'.
The third step in the upper right depicts coffee beans being dried under the sun on bamboo mats, labeled 'Step 3: Sun Drying'.
The fourth step in the lower left shows a vintage metal roasting machine with coffee beans rolling inside and steam rising, labeled 'Step 4: Roasting'.
The fifth step in the lower middle features a stone grinder pouring out smooth coffee grounds, labeled 'Step 5: Grinding'.
The sixth step in the lower right shows a production line where coffee liquid is being poured into molds, with finished packaged coffee beside it, labeled 'Step 6: Brewing and Forming'.

The four corners of the image are decorated with hand-drawn coffee leaves and coffee beans. The overall color palette consists of warm brown, caramel, cream, deep red and olive green, with delicate lines and clear layout.
Japanese Ukiyo-e style language flashcard
Vertical rectangular language learning flashcard in traditional Japanese Ukiyo-e woodblock style. Whole card framed by dark blue traditional Seigaiha wave borders, with aged beige rough washi paper texture background.

Upper two-thirds: Ukiyo-e illustration. A graceful spotted sika deer stands calmly in the center, with delicate black outlines and traditional flat coloring. Background: classic Japanese scenery — snow-capped Mount Fuji in distance, stylized layered clouds, falling pink cherry blossom petals. A red square antique seal with the Chinese character '鹿' in seal script at the lower right. Color scheme: indigo, vermilion, moss green, beige, with woodblock color misalignment and vintage charm.

Lower one-third: centered text area. Top: small hiragana しか; large Japanese kanji 鹿; below it romaji shika; then English "Deer". Bottom: bilingual example — Japanese: 鹿が桜の花を見ている。English: The deer is looking at the cherry blossoms. All black text with slight brush calligraphy texture, matching Ukiyo-e antique style.
Split-screen conceptual poster
Split-screen conceptual poster with vertical split composition, left half representing the present and future featuring a solid stone analog clock dial with bright red hands and red scale markings, set against a deeply cracked, dry, parched earth texture in muted cool gray tones, embodying slow, static structural decay and temporal erosion, right half representing the past that is violently disintegrating, dissolving and exploding into chaotic dust, debris, flying stone shards and swirling cosmic nebula energy with a deep blood red and black color palette, the circular clock frame itself is half solid weathered stone and half crumbling into particles merging the two worlds, with the red clock hands fully positioned on the left present/future side of the dial, embodying the concept of time, past vs future, slow decay vs violent collapse, hyper-realistic 3D render with cinematic dramatic lighting, high contrast, sharp details on the left side, motion blur and dynamic particle effects on the right side, photorealistic textures of cracked earth and floating dust, atmospheric haze, futuristic graphic design with minimalist red vertical UI elements and custom technical text overlays in the corners, bold red stylized title text "ETERNAL DAWN" in the top right corner, small red technical metadata text "SPECDARY BOTHUNG" directly below the main title in the top right, dense lines of small red technical data blocks including "TIMESTAMP: 00:00:00", "COLLAPSE RATE: 99.8%", "TEMPORAL ANOMALY DETECTED", "PAST DIMENSION: DECAYING" in the bottom right corner with a red horizontal accent bar and footer text "FOR DECAY OF THE EONS" below, small red header text "TEMPORAL SHIFT PROTOCOL" with vertical red accent line running down the left edge in the top left, vertical red accent line with small red technical text annotations including "PRESENT: STABLE", "FUTURE: UNFOLDING" along it in the bottom left, red text "WARNING: TEMPORAL DISINTEGRATION" curving around the center clock dial perimeter, small red footer text "PAST FADES, FUTURE RISES" spanning the split at the bottom center, shot in the style of a high-end sci-fi movie poster, 8K ultra-high resolution, photorealistic, cinematic composition, dramatic depth, moody and intense atmosphere, detailed particle simulation, photorealistic material rendering.

Cinematic and stylized aesthetics

Urban night street scene
First-person vertical real-world urban night photo. Narrow, wet, busy city street with strong depth; towering buildings on both sides with fire escapes, pipes, AC units, warm window lights. Red vertical "HOTEL" neon on left, bright blue "BAR" sign and yellow "OPEN 24 HOURS" box on right. Distant huge digital billboard glowing cyan "NEON CITY". Rough wet asphalt reflects neon lights; vintage yellow taxi with "TAXI" sign drives down center, red taillights on. Two pedestrians in dark coats with black umbrellas walk away on right sidewalk. Giant needle-like steel TV tower peeks through building gaps at end, red aircraft warning light glowing. Cinematic volumetric light, misty air, cool cyan-blue vs warm orange-red contrast, immersive and realistic.
Editorial fashion photography
Full-body editorial fashion shot, elegant European woman aged 28-30 with soft natural blonde hair styled in a loose low bun, wearing a tailored soft cream ribbed knit long-sleeve maxi dress with a fitted waist and flowing floor-length skirt, minimalist nude leather mules and delicate gold jewelry, standing leaning against an ornate carved stone balustrade on a gravel terrace in the garden of a grand Lake Como villa, tall iconic Italian cypress trees framing the background, calm still Lake Como water stretching out to hazy misty Alpine mountains in the distance, overcast soft diffused natural light, muted warm earthy color palette, quiet luxury aesthetic, hyper-realistic editorial fashion photography, natural skin texture with subtle pores, soft bokeh foreground foliage, shallow depth of field, shot on a medium format camera with an 85mm f/1.4 lens, sharp focus on the subject, fine film grain, muted contrast, timeless sophisticated mood, shot by Lachlan Bailey in the style of Vogue editorial, 8K ultra-high resolution, photorealistic, cinematic composition following the rule of thirds, golden hour overcast lighting, atmospheric haze, refined Italian villa garden details.
3D illustration of a duck holding coffee
A charming, whimsical 3D illustration featuring a fluffy, plush-textured white duck sitting comfortably in a fuzzy coral-pink armchair, holding a bright red mug of steaming black coffee against a solid warm coral-red background; defined by its tactile, huggable fabric-like texture across all elements, a bold, minimalist warm color palette of creamy white, vibrant orange, and rich reds, gentle anthropomorphism that blends cuteness with relatable cozy relaxation, soft diffused lighting, and a subtle film grain that adds a nostalgic, handcrafted feel, creating a playful yet serene mood perfect for themes of calm downtime and morning routines.

Multi-panel compositions

North American native species infographic
Educational comic book infographic. Five vertical panels side by side, earthy color palette, hand-drawn illustration style. Top title: "NORTH AMERICAN NATIVE SPECIES". Style: comic line art, watercolor fills, white panel backgrounds, bold headers.
Panel 1: Gray squirrel holding an acorn. Header: "EASTERN GRAY SQUIRREL" Fact: "Buries acorns to help forests grow." Arrow → tail: "Bushy tail"

Panel 2: Robin with orange-red breast on a branch. Header: "AMERICAN ROBIN" Fact: "A sign of spring returning." Arrow → breast: "Orange-red breast"

Panel 3: Red maple tree with autumn leaves. Header: "RED MAPLE TREE" Fact: "Shelters wildlife year-round." Arrow → leaves: "Red in autumn"

Panel 4: Deer with white tail raised. Header: "WHITE-TAILED DEER" Fact: "White tail signals danger." Arrow → tail: "Alarm signal"

Panel 5: Monarch butterfly on a flower. Header: "MONARCH BUTTERFLY" Fact: "Migrates 3,000 miles each year." Arrow → wings: "Warning colors"
6-panel comic page
A 6-panel comic page, 2 columns × 3 rows, black border between each panel.

Panel 1 (top-left): wide shot — a young woman in a red coat stands at a rainy train station. Caption box: "She had waited three years for this moment."
Panel 2 (top-right): close-up on her face — anxious eyes, rain on her cheek. No text.
Panel 3 (mid-left): medium shot — a train arrives, doors slide open, steam rising. Sound effect text: "WHOOOOSH"
Panel 4 (mid-right): her point-of-view — a man in a gray jacket steps out, back to camera.
Panel 5 (bottom-left): extreme close-up — her hand trembling as she reaches forward.
Panel 6 (bottom-right): wide shot — they face each other under one umbrella. Caption box: "Some arrivals change everything."

Ink and watercolor style, cool blue-gray palette, expressive line art. Same character design consistent across all 6 panels.
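
The comic prompt above follows a regular pattern: a layout header, one line per panel giving position, shot type, description and optional text, then a closing style line. As an illustration only, a small helper can assemble prompts in roughly that shape from structured data. The `Panel` class and `comic_prompt` function are hypothetical and not part of ComfyUI or ERNIE-Image, and the exact wording the model responds to best is not established here.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    position: str      # e.g. "top-left"
    shot: str          # e.g. "wide shot"
    description: str   # what the panel shows
    text: str = ""     # optional caption or sound-effect instruction


def comic_prompt(panels: list[Panel], columns: int, rows: int, style: str) -> str:
    """Assemble a multi-panel prompt: header, one line per panel, style line."""
    if len(panels) != columns * rows:
        raise ValueError("panel count must equal columns * rows")
    header = (
        f"A {len(panels)}-panel comic page, {columns} columns × {rows} rows, "
        "black border between each panel."
    )
    lines = []
    for i, p in enumerate(panels, start=1):
        line = f"Panel {i} ({p.position}): {p.shot} - {p.description}."
        if p.text:
            line += f" {p.text}"
        lines.append(line)
    return "\n\n".join([header, "\n".join(lines), style])
```

Keeping the panels as data makes it easy to reorder them or change the grid without rewriting the whole prompt by hand.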