Hugging Face Blog · April 29, 2026, 09:00 · about 5 min read

DeepInfra officially added as a Hugging Face Inference Provider

#LLM #Inference Infrastructure #Hugging Face #DeepInfra #Model Deployment

TL;DR

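Hugging Face has integrated DeepInfra as an official Inference Provider, expanding developers' options for running models through a serverless provider with low per-token pricing.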

AI in-depth analysis · July 5, 2026, 10:14


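Key points

Official partnership with DeepInfra announced

Hugging Face has officially added DeepInfra to its list of Inference Providers, deepening the technical integration between the two companies.

Optimized inference performance and cost

By running models on DeepInfra's serverless infrastructure, users get fast responses and per-token pricing that DeepInfra positions as among the most cost-effective in the industry.

Better developer experience (DX)

Within the Hugging Face ecosystem, users can set and reorder preferred providers in their account settings or select a provider per request in code, lowering the barrier to deploying models.

DeepInfra integration with the Hugging Face Hub goes live

DeepInfra is now officially supported as a third-party Inference Provider on Hugging Face, expanding the options for serverless inference.

Cost efficiency and broad model coverage

DeepInfra offers some of the lowest per-token prices in the industry and supports model types ranging from LLMs to text-to-image, text-to-video, and embeddings (at launch, only conversational and text-generation tasks are available on Hugging Face).

Flexible usage modes and SDK integration

Users can either call DeepInfra with their own API key or route requests through their HF account, and can access it through the Hugging Face SDKs for Python and JavaScript.

Using Hugging Face Inference Providers through the OpenAI client

By setting base_url to the Hugging Face router URL (https://router.huggingface.co/v1) and using HF_TOKEN as the API key, existing OpenAI-compatible libraries can access models served by DeepInfra.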


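Impact analysis

This announcement shows Hugging Face evolving from a model host into a platform that integrates multiple infrastructure providers. By officially supporting a low-cost provider such as DeepInfra, Hugging Face lets developers pick the inference setup that best fits their use case, which should help AI applications spread and become cheaper to run.

Editorial comment

Expanding the Hugging Face ecosystem lets developers choose inference resources without being locked into a single vendor, an important step toward lowering the cost of building AI applications.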




How it works
In the website UI
From the client SDKs
Billing
Feedback and next steps

We're thrilled to share that DeepInfra is now a supported Inference Provider on the Hugging Face Hub!

DeepInfra joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub's model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide variety of models with your preferred providers.

DeepInfra is a serverless AI inference platform offering some of the most cost-effective per-token pricing in the industry. With a catalog of over 100 models, DeepInfra makes it easy for developers to integrate a wide range of AI capabilities into their applications with minimal setup.

DeepInfra supports a broad spectrum of model types, from LLMs to text-to-image, text-to-video, embeddings, and more. As part of this initial integration, DeepInfra is launching support for conversational and text-generation tasks on Hugging Face, enabling access to popular open-weight LLMs such as DeepSeek V4, Kimi-K2.6, GLM-5.1, and many more. Support for additional tasks (text-to-image, text-to-video, embeddings, and more) will roll out soon!

Read more about how to use DeepInfra as an Inference Provider on its dedicated documentation page.

See the full list of models supported by DeepInfra here.

Follow DeepInfra on Hugging Face: https://huggingface.co/DeepInfra.

How it works

In the website UI

In your user account settings, you are able to:

Set your own API keys for the providers you've signed up with. If no custom key is set, your requests will be routed through HF.

Order providers by preference. This applies to the widget and code snippets in the model pages.

As mentioned, there are two modes when calling Inference Providers:

Custom key (calls go directly to the inference provider, using your own API key of the corresponding inference provider)

Routed by HF (in that case, you don't need a token from the provider, and the charges are applied directly to your HF account rather than the provider's account)
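The two modes differ in three ways: where the client points, which key it sends, and which account is billed. Below is a minimal sketch of that choice for an OpenAI-compatible client, under several assumptions:

- `DEEPINFRA_API_KEY` is a hypothetical environment variable name.
- `https://api.deepinfra.com/v1/openai` is assumed to be DeepInfra's OpenAI-compatible endpoint.
- DeepInfra is assumed to accept the Hub repo ID as its model name; check DeepInfra's own catalog.

`resolve_config` and `ClientConfig` are illustrative names, not part of any Hugging Face SDK.

```python
from dataclasses import dataclass

# Hugging Face router endpoint used for routed requests (from the announcement).
HF_ROUTER_URL = "https://router.huggingface.co/v1"
# Assumption: DeepInfra's OpenAI-compatible endpoint for direct (custom key) calls.
DEEPINFRA_DIRECT_URL = "https://api.deepinfra.com/v1/openai"


@dataclass(frozen=True)
class ClientConfig:
    base_url: str
    api_key: str
    model: str
    billed_to: str  # which account is charged for the request


def resolve_config(repo_id: str, env: dict) -> ClientConfig:
    """Use custom-key mode if a DeepInfra key is present, else route via HF.

    DEEPINFRA_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    deepinfra_key = env.get("DEEPINFRA_API_KEY")
    if deepinfra_key:
        # Custom key: call DeepInfra directly; DeepInfra bills your DeepInfra account.
        # Assumption: DeepInfra accepts the Hub repo ID as its model name.
        return ClientConfig(DEEPINFRA_DIRECT_URL, deepinfra_key, repo_id, "DeepInfra account")
    hf_token = env.get("HF_TOKEN")
    if hf_token:
        # Routed by HF: no provider token needed; charges go to your HF account.
        return ClientConfig(HF_ROUTER_URL, hf_token, f"{repo_id}:deepinfra", "HF account")
    raise RuntimeError("Set DEEPINFRA_API_KEY or HF_TOKEN")
```

To use the result, pass `base_url` and `api_key` to `OpenAI(...)` and `model` to `chat.completions.create(...)`. When both keys are set, this sketch prefers the custom key; that is a design choice of the sketch, not documented behavior.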

Model pages showcase third-party inference providers: the ones compatible with the current model, sorted by user preference.
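The compatible-providers-sorted-by-preference behavior can be sketched as a stable sort. The key is each provider's position in the user's preference list. This is a minimal illustration, not Hugging Face's implementation. `rank_providers` is a hypothetical helper, and `provider-a` and `provider-b` are placeholder names.

```python
def rank_providers(compatible: list[str], preference: list[str]) -> list[str]:
    """Return the providers compatible with a model, sorted by user preference.

    Providers absent from the preference list come after all preferred ones.
    """
    rank = {name: i for i, name in enumerate(preference)}
    # sorted() is stable, so unranked providers keep their original relative order.
    return sorted(compatible, key=lambda name: rank.get(name, len(preference)))
```

For example, `rank_providers(["provider-a", "deepinfra", "provider-b"], ["deepinfra", "provider-a"])` puts `deepinfra` first, then `provider-a`, then the unranked `provider-b`.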

From the client SDKs

DeepInfra is available through the Hugging Face SDKs - huggingface_hub (>= 1.11.2) for Python and @huggingface/inference for JavaScript.
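The Python integration requires `huggingface_hub` 1.11.2 or later, so a quick runtime check can catch an outdated install. Below is a stdlib-only sketch that compares only the numeric release components. As a result, a pre-release such as `1.11.2rc1` is treated as `1.11.2`. If adding a dependency is acceptable, `packaging.version.Version` handles pre-releases correctly. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum huggingface_hub release cited for DeepInfra support.
MINIMUM = (1, 11, 2)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.11.2' into (1, 11, 2), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.12' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def hub_supports_deepinfra() -> bool:
    """Return True if the installed huggingface_hub is at least 1.11.2."""
    try:
        installed = version("huggingface_hub")
    except PackageNotFoundError:
        return False  # not installed at all
    return parse_version(installed) >= MINIMUM
```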

The following examples show how to use DeepSeek V4 Pro through DeepInfra. Use a Hugging Face token to authenticate; the request will be routed to DeepInfra automatically.
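In the examples, the provider is chosen by appending `:deepinfra` to the Hub model ID. Below is a minimal sketch of building and splitting such IDs, assuming Hub repo IDs never contain a colon. `routed_model_id` and `split_model_id` are hypothetical helpers, not SDK functions.

```python
from typing import Optional, Tuple


def routed_model_id(repo_id: str, provider: str = "deepinfra") -> str:
    """Append a provider suffix, e.g. 'org/model' -> 'org/model:deepinfra'."""
    if ":" in repo_id:
        raise ValueError(f"{repo_id!r} already has a provider suffix")
    return f"{repo_id}:{provider}"


def split_model_id(model: str) -> Tuple[str, Optional[str]]:
    """Split 'org/model:provider' into ('org/model', 'provider'); None if no suffix."""
    repo_id, sep, provider = model.partition(":")
    return repo_id, (provider if sep else None)
```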

From your favorite Agent Harness

Hugging Face Inference Providers are integrated in most Agent Harnesses - including Pi, OpenCode, Hermes Agents, OpenClaw, and more. This means you can plug DeepInfra-hosted models straight into your favorite tools without any extra glue code. Browse the full list of integrations here.

from Python

```python
import os
from openai import OpenAI

# Point the OpenAI client at the Hugging Face router and authenticate with your HF token.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The ":deepinfra" suffix on the model ID routes this request to DeepInfra.
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization."
        }
    ],
)

print(completion.choices[0].message)
```
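The OpenAI client is a convenience layer. It POSTs JSON to the `/chat/completions` path under the configured `base_url`, sending the key as a bearer token. Below is a stdlib-only sketch of the equivalent raw request, assuming the router follows that OpenAI-compatible convention. `build_chat_request` and `send` are hypothetical names, and streaming and error handling are omitted.

```python
import json
import urllib.request

# Assumption: OpenAI-compatible clients append /chat/completions to the base URL.
ROUTER_CHAT_URL = "https://router.huggingface.co/v1/chat/completions"


def build_chat_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST an OpenAI-compatible client would issue."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ROUTER_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's text (needs network and a valid token)."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

For example, `send(build_chat_request("deepseek-ai/DeepSeek-V4-Pro:deepinfra", "Hello", os.environ["HF_TOKEN"]))` would return the reply text, given network access and a valid token.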

from JS

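```javascript
// Run as an ES module (e.g. a .mjs file) so top-level await is allowed.
import { OpenAI } from "openai";

// Point the OpenAI client at the Hugging Face router and authenticate with your HF token.
const client = new OpenAI({
    baseURL: "https://router.huggingface.co/v1",
    apiKey: process.env.HF_TOKEN,
});

// The ":deepinfra" suffix on the model ID routes this request to DeepInfra.
const chatCompletion = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages: [
        {
            role: "user",
            content: "Write a Python function that returns the nth Fibonacci number using memoization.",
        },
    ],
});

console.log(chatCompletion.choices[0].message);
```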

Billing

For direct requests, i.e. when you use the key from an inference provider, you are billed by the corresponding provider. For instance, if you use a DeepInfra API key you're billed on your DeepInfra account.

For routed requests, i.e. when you authenticate via the Hugging Face Hub, you'll only pay the standard provider API rates. There's no additional markup from us; we just pass through the provider costs directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)

Important Note ‼️ PRO users get $2 worth of Inference credits every month. You can use them across providers. 🔥

Subscribe to the Hugging Face PRO plan to get access to Inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.

We also provide free inference with a small quota for our signed-in free users, but please upgrade to PRO if you can!
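Routed requests pass provider rates through without markup. The cost of a request is therefore the token counts multiplied by the provider's per-token prices. Below is a small worked sketch of that arithmetic against the $2 monthly PRO credit. The prices in the example are hypothetical (check DeepInfra's pricing page for real rates), and the helper names are illustrative.

```python
from dataclasses import dataclass

# PRO inference credits per month, per the announcement.
MONTHLY_PRO_CREDITS_USD = 2.00


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (hypothetical values; check the provider)."""
    input_per_m: float
    output_per_m: float


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one routed request: the provider rate passed through, no HF markup."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def credits_left(spent_usd: float, credits_usd: float = MONTHLY_PRO_CREDITS_USD) -> float:
    """Remaining monthly credit, floored at zero."""
    return max(credits_usd - spent_usd, 0.0)
```

With hypothetical prices of $0.50 per million input tokens and $2.00 per million output tokens, consider a request with 1,000 input and 500 output tokens. It costs (1,000 × 0.50 + 500 × 2.00) / 1,000,000 = $0.0015, so $2 of credits would cover about 1,333 such requests.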

Feedback and next steps

We would love to get your feedback! Share your thoughts and/or comments here: https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49

