Hugging Face Blog · March 5, 2026, 09:00 · approx. 2 min read

Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines

#Diffusion Model #Hugging Face #FLUX #Software Architecture #Generative AI

TL;DR

Hugging Face introduced "Modular Diffusers", which lets you build diffusion pipelines from reusable blocks. It keeps the existing API while enabling flexible workflow construction and integration with Mellon.

AI In-Depth Analysis · April 26, 2026, 07:59


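Key Points

Introduction of modular pipelines

Diffusion pipelines are split into independent blocks, such as the text encoder, the VAE, and denoising. The new architecture lets you combine these blocks to build custom workflows.

Compatibility with the existing API, plus flexibility

The basic inference API is the same as the existing DiffusionPipeline. Blocks can also be added, removed, or swapped, and individual components can be run on their own.

Visual integration with Mellon

Modular Diffusers integrates with Mellon, a node-based workflow interface. You can design and run complex generation pipelines by wiring blocks together visually.

Demonstration with FLUX.2 Klein 4B

A concrete example with the FLUX.2 Klein 4B model shows advanced customization. For instance, you can split off the text encoder to obtain prompt embeddings.

Defining and reusing custom blocks

A custom block is a Python class that specifies its components, inputs, outputs, and computation logic. Once defined, it can be plugged into any workflow and reused.

Building workflows by composing blocks

You can extract a specific workflow from existing blocks (e.g., the ControlNet workflow) and insert a new custom block (e.g., DepthProcessorBlock) at the front to build more complex pipelines.

Dynamic block composition in workflows

A ModularPipeline defines workflows such as text2image and ControlNet. When a custom block such as DepthProcessorBlock is inserted into the sequence, its outputs are automatically passed as inputs to the blocks that need them, which makes pipeline construction flexible.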


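Impact Analysis

This announcement could fundamentally change how major image generation models such as Stable Diffusion and FLUX are used. Instead of calling a single model, developers can control and optimize each stage of the generation process (encoding, denoising, decoding, and so on) individually. This broadens practical applications, for example saving compute resources or optimizing for specific tasks.

Editorial Comment

For engineers implementing image generation, being able to optimize the infrastructure layer and customize pipelines without retraining the whole model is a major boon. The modular approach is especially relevant for cutting inference costs of large models and handling specialized generation requirements. If it becomes standard, it should improve development efficiency across the industry.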


Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines


Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can mix and match blocks to create workflows tailored to your needs! This complements the existing DiffusionPipeline.

In this post, we'll walk through how Modular Diffusers works — from the familiar API to run a modular pipeline, to building fully custom blocks and composing them into your own workflow. We'll also show how it integrates with Mellon, a node-based visual workflow interface that you can use to wire Modular Diffusers blocks together.

Table of contents

Modular Repositories

Community Pipelines

Integration with Mellon

Here is a simple example of how to run inference with FLUX.2 Klein 4B:

    import torch
    from diffusers import ModularPipeline

    # Create a modular pipeline: this only defines the workflow; model weights are not loaded yet
    pipe = ModularPipeline.from_pretrained("black-forest-labs/FLUX.2-klein-4B")

    # Now load the model weights; configure dtype, quantization, etc. in this step
    pipe.load_components(torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    # Generate an image: the API remains the same as DiffusionPipeline
    image = pipe(
        prompt="a serene landscape at sunset",
        num_inference_steps=4,
    ).images[0]
    image.save("output.png")

You get the same results as with a standard DiffusionPipeline.

    print(pipe.blocks)

    Flux2KleinAutoBlocks(
      ...
      Sub-Blocks:
        [0] text_encoder (Flux2KleinTextEncoderStep)
        [1] vae_encoder (Flux2KleinAutoVaeEncoderStep)
        [2] denoise (Flux2KleinCoreDenoiseStep)
        [3] decode (Flux2DecodeStep)
    )

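Each block is self-contained, with its own inputs and outputs. You can run any block independently as its own pipeline. You can also add, remove, and swap blocks freely, and they dynamically recompose to work with whatever blocks remain. Use .init_pipeline() to turn blocks into a runnable pipeline and .load_components() to load its weights. For example, you can split off the text encoder and run it on its own: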

    # get a copy of the blocks
    blocks = pipe.blocks

    # pop out the text_encoder block
    text_blocks = blocks.sub_blocks.pop("text_encoder")

    # run it as its own pipeline
    text_pipe = text_blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
    # load the text_encoder, or reuse already loaded components:
    text_pipe.update_components(text_encoder=pipe.text_encoder)
    text_pipe.load_components(torch_dtype=torch.bfloat16)
    text_pipe.to("cuda")
    prompt_embeds = text_pipe(prompt="a serene landscape at sunset").prompt_embeds

    # create a new pipeline from the remaining blocks;
    # it now accepts prompt_embeds directly instead of prompt
    remaining_pipe = blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
    remaining_pipe.load_components(torch_dtype=torch.bfloat16)
    remaining_pipe.to("cuda")
    image = remaining_pipe(prompt_embeds=prompt_embeds, num_inference_steps=4).images[0]
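Why does the remaining pipeline accept prompt_embeds instead of prompt? The toy model below sketches one plausible rule. It assumes that a sequence's external inputs are exactly the inputs that no earlier block produces. The ToyBlock class, the required_inputs helper, and the input/output names are hypothetical illustrations, not the Diffusers API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBlock:
    """Hypothetical stand-in for a pipeline block: it only declares names."""
    name: str
    inputs: list[str]
    outputs: list[str] = field(default_factory=list)


def required_inputs(blocks: list[ToyBlock]) -> list[str]:
    """Return the inputs a sequence needs from the caller.

    An input is external if no earlier block in the sequence produces it.
    """
    produced: set[str] = set()
    needed: list[str] = []
    for block in blocks:
        for inp in block.inputs:
            if inp not in produced and inp not in needed:
                needed.append(inp)
        produced.update(block.outputs)
    return needed


# Illustrative names loosely modeled on the FLUX.2 Klein sub-blocks.
blocks = [
    ToyBlock("text_encoder", ["prompt"], ["prompt_embeds"]),
    ToyBlock("denoise", ["prompt_embeds", "num_inference_steps"], ["latents"]),
    ToyBlock("decode", ["latents"], ["images"]),
]
print(required_inputs(blocks))     # ['prompt', 'num_inference_steps']

# Drop the text encoder, mirroring blocks.sub_blocks.pop("text_encoder")
remaining = [b for b in blocks if b.name != "text_encoder"]
print(required_inputs(remaining))  # ['prompt_embeds', 'num_inference_steps']
```

Once the producer of prompt_embeds is removed, nothing upstream supplies it, so it becomes an input the caller must provide.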

For more on block types, composition patterns, lazy loading, and memory management with ComponentsManager, see the Modular Diffusers documentation.

Modular Diffusers really shines when creating your own blocks. A custom block is a Python class that defines its components, inputs, outputs, and computation logic — and once defined, you can plug it into any workflow.

Writing a Custom Block

Here's an example block that extracts depth maps from images using Depth Anything V2.

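    import torch
    from diffusers.modular_pipelines import (
        ComponentSpec,
        InputParam,
        ModularPipelineBlocks,
        OutputParam,
    )
    from image_gen_aux import DepthPreprocessor  # DepthPreprocessor assumed to come from image_gen_aux


    class DepthProcessorBlock(ModularPipelineBlocks):
        @property
        def expected_components(self):
            # Models this block needs, with a default location to load them from
            return [
                ComponentSpec(
                    "depth_processor",
                    DepthPreprocessor,
                    pretrained_model_name_or_path="depth-anything/Depth-Anything-V2-Large-hf",
                )
            ]

        @property
        def inputs(self):
            return [
                InputParam("image", required=True, description="Image(s) to extract depth maps from"),
            ]

        @property
        def intermediate_outputs(self):
            # Values this block writes to the shared pipeline state for later blocks
            return [
                OutputParam("control_image", type_hint=torch.Tensor, description="Depth map(s) of input image(s)"),
            ]

        @torch.no_grad()
        def __call__(self, components, state):
            # Read this block's inputs from the shared pipeline state
            block_state = self.get_block_state(state)
            depth_map = components.depth_processor(block_state.image)
            block_state.control_image = depth_map.to(block_state.device)
            # Write the outputs back so downstream blocks can use them
            self.set_block_state(state, block_state)
            return components, state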

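In this block, expected_components declares the models the block needs. The ComponentSpec sets a default pretrained_model_name_or_path, so load_components can load the depth model automatically, even though it is not part of the pipeline's own repository. Components can also be referenced from a repository's modular_model_index.json (see Modular Repositories below). intermediate_outputs declares the values the block writes to the shared pipeline state (here, control_image) so that later blocks can consume them.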
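The declare-then-load pattern behind expected_components and load_components can be pictured with the toy model below, built on stated assumptions. A spec records a component's name and an optional default repository. Nothing is loaded until load_components() runs. A component without a default is reported as missing so the caller can supply it, which mirrors how the ControlNet is added with update_components in the Qwen example. ToySpec, ToyPipeline and the fake loader are hypothetical. Real loading also involves downloads, dtypes and devices.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToySpec:
    """Hypothetical component declaration: a name plus an optional default repo."""
    name: str
    default_repo: Optional[str] = None


class ToyPipeline:
    """Declares components up front but loads nothing until load_components()."""

    def __init__(self, specs: list[ToySpec], loader: Callable[[str], object]):
        self.specs = {spec.name: spec for spec in specs}
        self.loader = loader  # stands in for downloading and instantiating weights
        self.components: dict[str, object] = {}  # empty until loading happens

    def update_components(self, **components: object) -> None:
        # Components passed explicitly are used as-is and never reloaded.
        self.components.update(components)

    def load_components(self) -> tuple[list[str], list[str]]:
        """Load missing components that have a default; report the ones that don't."""
        loaded, missing = [], []
        for name, spec in self.specs.items():
            if name in self.components:
                continue  # already provided
            if spec.default_repo is None:
                missing.append(name)  # must be supplied via update_components
                continue
            self.components[name] = self.loader(spec.default_repo)
            loaded.append(name)
        return loaded, missing


pipe = ToyPipeline(
    [
        ToySpec("depth_processor", "depth-anything/Depth-Anything-V2-Large-hf"),
        ToySpec("controlnet"),  # no default path, like the ControlNet in the Qwen example
    ],
    loader=lambda repo: f"<weights from {repo}>",
)
print(pipe.components)  # {}
loaded, missing = pipe.load_components()
print(loaded, missing)  # ['depth_processor'] ['controlnet']
pipe.update_components(controlnet="<weights from InstantX/Qwen-Image-ControlNet-Union>")
print(sorted(pipe.components))  # ['controlnet', 'depth_processor']
```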

Composing Blocks into Workflows

Let's use this block with Qwen's ControlNet workflow. Extract the ControlNet workflow and insert the depth block at the beginning:

    from diffusers import ModularPipeline

    # Create Qwen Image pipeline
    pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")
    print(pipe.blocks.available_workflows)
    # Supported workflows:
    #   - `text2image`: requires `prompt`
    #   - `image2image`: requires `prompt`, `image`
    #   - `inpainting`: requires `prompt`, `mask_image`, `image`
    #   - `controlnet_text2image`: requires `prompt`, `control_image`
    #   - `controlnet_image2image`: requires `prompt`, `image`, `control_image`

    # Extract the ControlNet workflow; it expects a control_image input
    blocks = pipe.blocks.get_workflow("controlnet_text2image")
    # Show the blocks this workflow uses
    print(blocks)

    # Insert depth block at the beginning; its output (control_image)
    # automatically flows to the ControlNet block that needs it
    blocks.sub_blocks.insert("depth", DepthProcessorBlock(), 0)

    # You can inspect any block's inputs and outputs with print(blocks.doc)
    print(blocks.sub_blocks["depth"].doc)
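The printed list pairs each workflow with the inputs it requires. The sketch below shows how such a table could drive selection. It assumes one simple, hypothetical rule: among the workflows whose requirements are all satisfied, pick the one that uses the most inputs. Diffusers' own dispatch logic may differ. Only the workflow table is taken from the output above.

```python
# Workflow requirements copied from the printed available_workflows list.
WORKFLOWS = {
    "text2image": {"prompt"},
    "image2image": {"prompt", "image"},
    "inpainting": {"prompt", "mask_image", "image"},
    "controlnet_text2image": {"prompt", "control_image"},
    "controlnet_image2image": {"prompt", "image", "control_image"},
}


def satisfiable(provided: set[str]) -> list[str]:
    """Workflows whose required inputs are all present in `provided`."""
    return [name for name, req in WORKFLOWS.items() if req <= provided]


def most_specific(provided: set[str]) -> str:
    """Hypothetical tie-break: prefer the satisfiable workflow using the most inputs."""
    candidates = satisfiable(provided)
    if not candidates:
        raise ValueError(f"no workflow accepts inputs {sorted(provided)}")
    return max(candidates, key=lambda name: len(WORKFLOWS[name]))


print(most_specific({"prompt"}))                            # text2image
print(most_specific({"prompt", "image"}))                   # image2image
print(most_specific({"prompt", "image", "control_image"}))  # controlnet_image2image
```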

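Blocks in a sequence share data automatically. The depth block's control_image output is consumed by the ControlNet block, so the combined pipeline takes an ordinary image as input instead of a control_image.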
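This hand-off can be sketched with a toy runner, assuming the shared state behaves like a dictionary. Each block reads its declared inputs from the state and writes its declared outputs back. Inserting a producer of control_image at position 0 therefore lets the caller pass image instead. The blocks and the run helper are hypothetical stand-ins, and the "depth map" here is just a brightness inversion on a list of numbers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyBlock:
    name: str
    inputs: list[str]
    outputs: list[str]
    fn: Callable[..., tuple]  # receives inputs positionally, returns outputs in order


def run(blocks: list[ToyBlock], **state) -> dict:
    """Run blocks in order over one shared state dict."""
    for block in blocks:
        missing = [k for k in block.inputs if k not in state]
        if missing:
            raise KeyError(f"{block.name} is missing inputs: {missing}")
        results = block.fn(*(state[k] for k in block.inputs))
        state.update(zip(block.outputs, results))  # publish outputs for later blocks
    return state


# Hypothetical stand-ins for the real depth and ControlNet blocks.
depth = ToyBlock("depth", ["image"], ["control_image"],
                 lambda image: ([255 - px for px in image],))
controlnet = ToyBlock("controlnet", ["prompt", "control_image"], ["images"],
                      lambda prompt, ctrl: ([f"{prompt}:{sum(ctrl)}"],))

blocks = [controlnet]
blocks.insert(0, depth)  # mirrors blocks.sub_blocks.insert("depth", ..., 0)

out = run(blocks, prompt="astronaut", image=[0, 100, 255])
print(out["control_image"])  # [255, 155, 0]
print(out["images"])         # ['astronaut:410']
```

Without the depth block, calling run([controlnet], ...) with only image raises a KeyError, because nothing produces control_image.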

    import torch
    from diffusers import AutoModel, ComponentsManager
    from diffusers.utils import load_image

    # ComponentsManager handles memory across multiple pipelines:
    # it automatically offloads models to CPU when not in use
    manager = ComponentsManager()
    pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager)
    pipeline.load_components(torch_dtype=torch.bfloat16)

    # The depth model loads automatically from the default path we set in expected_components;
    # no need to load it manually even though it's not part of the Qwen repo.
    # But controlnet is not included by default, so we do need to load it from a different repo
    controlnet = AutoModel.from_pretrained("InstantX/Qwen-Image-ControlNet-Union", torch_dtype=torch.bfloat16)
    pipeline.update_components(controlnet=controlnet)

    # pipeline now takes image as input
    image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg")
    output = pipeline(
        prompt="an astronaut hatching from an egg, detailed, fantasy, Pixar, Disney",
        image=image,
    ).images[0]
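The comment in the code above says ComponentsManager offloads models to CPU when they are not in use. The sketch below shows one possible policy of that kind. It assumes a fixed number of "device slots" and least-recently-used eviction. The class and its LRU rule are hypothetical, and the real ComponentsManager may decide what to offload differently.

```python
from collections import OrderedDict


class ToyOffloadManager:
    """Keeps at most `slots` components "on device"; the rest count as "on CPU"."""

    def __init__(self, slots: int):
        self.slots = slots
        self.on_device: OrderedDict[str, None] = OrderedDict()  # ordered by last use
        self.offloaded: list[str] = []  # log of components moved back to CPU

    def use(self, name: str) -> None:
        """Mark a component as needed now, moving it to the device if necessary."""
        if name in self.on_device:
            self.on_device.move_to_end(name)  # refresh recency
            return
        if len(self.on_device) >= self.slots:
            victim, _ = self.on_device.popitem(last=False)  # least recently used
            self.offloaded.append(victim)
        self.on_device[name] = None


manager = ToyOffloadManager(slots=2)
for component in ["text_encoder", "depth_processor", "controlnet", "transformer", "vae"]:
    manager.use(component)

print(list(manager.on_device))  # ['transformer', 'vae']
print(manager.offloaded)        # ['text_encoder', 'depth_processor', 'controlnet']
```

LRU is just one reasonable choice here. In a staged pipeline, the components used most recently are often the ones the next step needs.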

Sharing Custom Blocks on the Hub

You can publish your custom block to the Hub so anyone can load it with trust_remote_code=True:

pipeline.save_pretrained(local_dir, repo_id="your-username/your-block-name", push_to_hub=True)

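The DepthProcessorBlock from this post is published at diffusers/depth-processor-custom-block, so you can load it directly:

    from diffusers import ModularPipelineBlocks

    depth_block = ModularPipelineBlocks.from_pretrained(
        "diffusers/depth-processor-custom-block", trust_remote_code=True
    )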

We've published a collection of ready-to-use custom blocks here.

Modular Repositories

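ModularPipeline.from_pretrained works with standard Diffusers repositories, as in the FLUX.2 Klein example above, and also with modular repositories.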

A modular repository is able to reference components from their original model repos. For example, diffusers/flux2-bnb-4bit-modular contains a quantized transformer and loads the remaining components from the original repo.

    // diffusers/flux2-bnb-4bit-modular/modular_model_index.json
    {
      "transformer": [
        "diffusers",
        "Flux2Transformer2DModel",
        {
          "pretrained_model_name_or_path": "diffusers/flux2-bnb-4bit-modular",
          "subfolder": "transformer",
          "type_hint": ["diffusers", "Flux2Transformer2DModel"]
        }
      ],
      "vae": [
        "diffusers",
        "AutoencoderKLFlux2",
        {
          "pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
          "subfolder": "vae",
          "type_hint": ["diffusers", "AutoencoderKLFlux2"]
        }
      ],
      ...
    }
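Each entry in the index above is a [library, class, options] triple whose options name the repository and subfolder to load from. The short sketch below resolves where each component comes from. It uses only the two entries shown, leaving out the elided "..." remainder. The function name is hypothetical, and entries that are not such triples are simply skipped.

```python
import json

# The two entries shown above, as a JSON string (the elided "..." entries are left out).
INDEX_JSON = """
{
  "transformer": ["diffusers", "Flux2Transformer2DModel",
                  {"pretrained_model_name_or_path": "diffusers/flux2-bnb-4bit-modular",
                   "subfolder": "transformer",
                   "type_hint": ["diffusers", "Flux2Transformer2DModel"]}],
  "vae": ["diffusers", "AutoencoderKLFlux2",
          {"pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
           "subfolder": "vae",
           "type_hint": ["diffusers", "AutoencoderKLFlux2"]}]
}
"""


def component_sources(index: dict) -> dict[str, tuple[str, str, str]]:
    """Map component name -> (class name, source repo, subfolder)."""
    sources = {}
    for name, value in index.items():
        # Skip anything that is not a [library, class, options] triple.
        if not (isinstance(value, list) and len(value) == 3 and isinstance(value[2], dict)):
            continue
        _library, class_name, options = value
        sources[name] = (
            class_name,
            options["pretrained_model_name_or_path"],
            options.get("subfolder", ""),
        )
    return sources


for name, (cls, repo, sub) in component_sources(json.loads(INDEX_JSON)).items():
    print(f"{name}: {cls} from {repo}/{sub}")
# transformer: Flux2Transformer2DModel from diffusers/flux2-bnb-4bit-modular/transformer
# vae: AutoencoderKLFlux2 from black-forest-labs/FLUX.2-dev/vae
```

The output makes the mixing visible: the quantized transformer comes from the modular repo, while the VAE is pulled from the original FLUX.2-dev repo.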

Modular repositories can also host custom pipeline blocks as Python code and visual UI configurations for tools like Mellon — all in one place.

Community Pipelines

The community has already started building complete pipelines with Modular Diffusers and publishing them on the Hub, with model weights and ready-to-run code.

Krea Realtime Video — A 14B parameter real-time video generation model distilled from Wan 2.1, achieving 11fps on a single B200 GPU. It supports text-to-video, video-to-video, and streaming video-to-video — all built as modular blocks. Users can modify prompts mid-generation, restyle videos on-the-fly, and see first frames within 1 second.

    import torch
    from diffusers import ModularPipeline

    pipe = ModularPipeline.from_pretrained("krea/krea-realtime-video", trust_remote_code=True)
    pipe.load_components(
        trust_remote_code=True,
        device_map="cuda",
        torch_dtype={"default": torch.bfloat16, "vae": torch.float16},
    )
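The torch_dtype argument above is a dictionary with a "default" entry and a per-component override for "vae". The sketch below shows how such a mapping could be resolved. It assumes the "default" key applies to every component that is not named explicitly. The helper and the component names are hypothetical, and dtypes are written as strings so the example runs without PyTorch.

```python
def resolve_dtypes(components: list[str], torch_dtype) -> dict:
    """Per-component dtype: exact name match first, then the "default" entry."""
    if isinstance(torch_dtype, dict):
        fallback = torch_dtype.get("default")
        return {name: torch_dtype.get(name, fallback) for name in components}
    # A single dtype applies to every component.
    return {name: torch_dtype for name in components}


# Illustrative component names; the real Krea pipeline's list may differ.
names = ["text_encoder", "transformer", "vae"]
print(resolve_dtypes(names, {"default": "bfloat16", "vae": "float16"}))
# {'text_encoder': 'bfloat16', 'transformer': 'bfloat16', 'vae': 'float16'}
```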

Waypoint-1 — A 2.3B parameter real-time diffusion world model from Overworld. It autoregressively generates interactive worlds from control inputs and text prompts — you can explore and interact with generated environments in real time on consumer hardware.

Teams can build novel architectures, package them as blocks, and publish the entire pipeline on the Hub for anyone to use with ModularPipeline.from_pretrained.

Check out the full collection of community pipelines for more.

Integration with Mellon

💡 Mellon is in early development and not ready for production use yet. Consider this a sneak peek of how the integration works!

Mellon is a visual workflow interface integrated with Modular Diffusers. If you're familiar with node-based tools like ComfyUI, you'll feel right at home — but there are some key differences:

Dynamic nodes — Instead of dozens of model-specific nodes, we have a small set of nodes that automatically adapt their interface based on the model you select. Learn them once, use them with any model.

Single-node workflows — Thanks to Modular Diffusers' composable block system, you can collapse an entire pipeline into a single node. Run multiple workflows on the same canvas without the clutter.

Hub integration out of the box — Custom blocks published to the Hugging Face Hub work instantly in Mellon. We provide a utility function to automatically generate the node interface from your block definition — no UI code required.

This integration is possible because every block exposes the same properties (inputs, intermediate_outputs, expected_components, and so on).

For example, diffusers/FLUX.2-klein-4B-modular contains a pipeline definition, component references, and a mellon_pipeline_config.json. Loading it takes a single call:

    ModularPipeline.from_pretrained("diffusers/FLUX.2-klein-4B-modular")
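Every block exposes the same properties, so a node interface can be derived mechanically from a block definition. The toy sketch below turns declared inputs and outputs into a list of widgets and sockets. The Param class, the widget rule (a text box for str inputs, a socket otherwise) and the output format are all hypothetical. This is not Mellon's utility function or the format of mellon_pipeline_config.json.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    type_hint: type = object
    required: bool = False


def node_spec(title: str, inputs: list[Param], outputs: list[Param]) -> dict:
    """Derive a simple node description from a block's declared parameters."""
    fields = []
    for p in inputs:
        # Hypothetical rule: plain strings get an editable text box, anything else a socket.
        kind = "textbox" if p.type_hint is str else "input_socket"
        fields.append({"name": p.name, "kind": kind, "required": p.required})
    for p in outputs:
        fields.append({"name": p.name, "kind": "output_socket", "required": False})
    return {"title": title, "fields": fields}


# Roughly the shape of the prompt-expander node in the example that follows:
# one text input and one output socket, both named "prompt".
spec = node_spec("Gemini prompt expander",
                 inputs=[Param("prompt", str, required=True)],
                 outputs=[Param("prompt", str)])
for f in spec["fields"]:
    print(f["kind"], f["name"])
# textbox prompt
# output_socket prompt
```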

Here's a quick example. We add a Gemini prompt expansion node — hosted as a modular repo at diffusers/gemini-prompt-expander-mellon — to an existing text-to-image workflow:

1. Drag in a Dynamic Block node and enter diffusers/gemini-prompt-expander-mellon as the repo_id.
2. Click LOAD CUSTOM BLOCK. The node automatically grows a textbox for your prompt input and an output socket named "prompt", all configured from the repo.
3. Type a short prompt, connect the output to the Encode Prompt node, and run.

Gemini expands your short prompt into a detailed description before generating the image. No code, no configuration — just a Hub repo id.

This is just one example. For a detailed walkthrough, check out the Mellon x Modular Diffusers guide.

Modular Diffusers brings the composability and flexibility the community has been asking for, without compromising the features that make Diffusers powerful. It's still early — we want your input to shape what comes next. Give it a try and tell us what works, what doesn't, and what's missing.

Overview of Modular Diffusers

Mellon x Modular Diffusers

Collection of custom blocks

Collection of community pipelines with Modular Diffusers

Thanks to Chun Te Lee for the thumbnail, and to Poli, Pedro, Lysandre, Linoy, Aritra, and Steven for their thoughtful reviews.
