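TLDR AI · June 26, 2026 09:00 · approx. 11 min

How to run a vLLM server on HF Jobs with one command (3 min read)

#LLM #Inference server #vLLM #Hugging Face #Infrastructure automation

TL;DR

TLDR AI highlights a walkthrough for launching a vLLM server on Hugging Face's Jobs service with a single command, which shortens the work of standing up an inference server.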

AI deep analysis · June 27, 2026 00:07

Importance: / 5

Depth: 40%

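Key points

One-command deployment

The article shows how to start a vLLM server on the Hugging Face Jobs service with a single command and no elaborate configuration.

More efficient inference-server setup

Developers skip manual environment setup and configuration, so a high-performance inference server can be brought up and operated with less effort.

vLLM and HF Jobs working together

It demonstrates vLLM, a fast inference library, running on Hugging Face Jobs, which the article describes as docker run for Hugging Face infrastructure.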


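Impact analysis

The article suggests that the barrier to standing up an inference server for AI models is dropping, and that developer workflows could become considerably more efficient. In particular, pairing a high-performance inference engine such as vLLM with a hosted compute platform could shorten the path from a large model to practical use.

Editorial comment

Launching a server with a single command is a practical improvement to developer productivity and makes LLM operations accessible to more people.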


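Prerequisites
Launch the server
Query it from anywhere
Clean up
Going further: bigger models
Going further: Chat with it in a UI
Going further: SSH into the running server
Going further: Use it as a coding-agent backend with Pi
HF Jobs or Inference Endpoints?
Further reading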

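You can spin up a private, OpenAI-compatible LLM endpoint on Hugging Face infrastructure with a single command: no servers to provision, no Kubernetes, and pay-per-second billing. Once it's up, you can query it from your laptop, a notebook, or anywhere else.

It's the quickest way to stand up a model for tests, evals, or batch generation. (If you're after a managed, production-ready service instead, that's what Inference Endpoints are for; see "HF Jobs or Inference Endpoints?" at the end for when to pick which.)

Here's the whole thing end to end.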

Prerequisites

A payment method or a positive prepaid credit balance (Jobs is billed per-minute by hardware usage).

huggingface_hub >= 1.20.0: run pip install -U "huggingface_hub>=1.20.0".

Logged in locally: run hf auth login.
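
The version requirement above can be checked before launching anything. The following is a minimal sketch, not part of the original post: `version_tuple` and `meets_minimum` are hypothetical helpers, and the comparison looks only at the leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
import re

MIN_VERSION = "1.20.0"  # minimum huggingface_hub version stated in the prerequisites


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.20.0' into (1, 20, 0); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. the 'dev0' in '1.21.0.dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Compare numerically, so that '1.9.3' is correctly older than '1.20.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.20' compares equal to '1.20.0'
    b += (0,) * (width - len(b))
    return a >= b
```

In practice the installed string could come from `importlib.metadata.version("huggingface_hub")`. A plain string comparison would get this wrong, since "1.9.3" sorts after "1.20.0" alphabetically.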

Launch the server

hf jobs run is docker run for Hugging Face infrastructure. We use the official vllm/vllm-openai image, request a GPU with --flavor, and expose vLLM's port with --expose:

hf jobs run --flavor a10g-large --expose 8000 --timeout 2h \
  vllm/vllm-openai:latest \
  vllm serve Qwen/Qwen3-4B --host 0.0.0.0 --port 8000

--expose 8000 routes the container's port through HF's public jobs proxy (see the Serve Models guide for the full reference). The command prints the URL your server is reachable at:

✓ Job started
  id: 6a381ca1953ed90bfb947332
  url: https://huggingface.co/jobs/qgallouedec/6a381ca1953ed90bfb947332
Hint: Exposed ports are reachable at (requires an HF token with read access to the job):
  https://6a381ca1953ed90bfb947332--8000.hf.jobs

6a381ca1953ed90bfb947332 is your job ID. Keep track of it; the rest of this post uses <job_id> as a placeholder for it.

Give it a couple of minutes to download the weights and boot. When the logs show Application startup complete, you're live.
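
If the launch is scripted, the job ID and endpoint URL can be derived from the printed output. This is a rough sketch that assumes the output keeps the shape shown above (an `id:` line, and the `<job_id>--<port>.hf.jobs` URL pattern); `parse_job_id` and `endpoint_url` are hypothetical helper names, and the CLI output format may change between versions.

```python
import re

# Output copied from the example above; real output may differ between CLI versions.
SAMPLE_OUTPUT = """\
✓ Job started
  id: 6a381ca1953ed90bfb947332
  url: https://huggingface.co/jobs/qgallouedec/6a381ca1953ed90bfb947332
Hint: Exposed ports are reachable at (requires an HF token with read access to the job):
  https://6a381ca1953ed90bfb947332--8000.hf.jobs
"""


def parse_job_id(cli_output: str) -> str:
    """Return the value of the first 'id: ...' line in the CLI output."""
    match = re.search(r"^[ \t]*id:[ \t]*(\S+)", cli_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'id:' line found in the output")
    return match.group(1)


def endpoint_url(job_id: str, port: int = 8000) -> str:
    """Build the exposed-port URL following the <job_id>--<port>.hf.jobs pattern."""
    return f"https://{job_id}--{port}.hf.jobs"


job_id = parse_job_id(SAMPLE_OUTPUT)
print(endpoint_url(job_id))
```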

Query it from anywhere

vLLM speaks the OpenAI API, and every request just needs your HF token as a bearer token. The quickest way to hit it is curl:

curl https://<job_id>--8000.hf.jobs/v1/chat/completions \
  -H "Authorization: Bearer $(hf auth token)" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-4B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'

This returns the usual OpenAI-style JSON, with choices[0].message.content holding "Hello! How can I assist you today? 😊".
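
To make the `choices[0].message.content` path concrete, here is a small sketch that reads it out of a response body. The JSON below is a trimmed, illustrative stand-in containing only the fields being read, not a captured server response; `first_reply` is a hypothetical helper.

```python
import json

# Trimmed, illustrative body: a real response carries more fields (id, usage, ...).
RAW_RESPONSE = """
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I assist you today? 😊"}
    }
  ]
}
"""


def first_reply(raw_body: str) -> str:
    """Extract choices[0].message.content from an OpenAI-style chat completion body."""
    payload = json.loads(raw_body)
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


print(first_reply(RAW_RESPONSE))
```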

Or, from Python, point the OpenAI client at the exposed URL and pass the token as the API key:

from huggingface_hub import get_token
from openai import OpenAI

# Replace <job_id> with the ID printed by hf jobs run.
client = OpenAI(
    base_url="https://<job_id>--8000.hf.jobs/v1",
    api_key=get_token(),  # the locally stored HF token doubles as the API key
)
resp = client.chat.completions.create(
    model="Qwen/Qwen3-4B",
    messages=[{"role": "user", "content": "Hello!"}],
    # Non-OpenAI field forwarded to vLLM: turns off Qwen3's thinking mode
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)

Hello! How can I assist you today? 😊

Quick health check before you start: curl https://<job_id>--8000.hf.jobs/v1/models -H "Authorization: Bearer $(hf auth token)" should list the model.
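
Because the server needs a few minutes to boot, a script may want to repeat that health check until it succeeds. The sketch below is one way to do so with only the standard library. `list_models` and `wait_until_ready` are hypothetical helpers, the retry counts are arbitrary, and it assumes the `/v1/models` response follows the OpenAI list shape (`{"data": [{"id": ...}]}`).

```python
import json
import time
import urllib.error
import urllib.request


def list_models(base_url: str, token: str, timeout: float = 10.0) -> list[str]:
    """GET <base_url>/v1/models with a bearer token and return the model IDs."""
    request = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [entry["id"] for entry in body.get("data", [])]


def wait_until_ready(fetch, attempts: int = 30, delay: float = 10.0, sleep=time.sleep) -> list[str]:
    """Call fetch() until it returns a non-empty model list or attempts run out."""
    for attempt in range(attempts):
        try:
            models = fetch()
            if models:
                return models
        except (urllib.error.URLError, TimeoutError, ValueError):
            pass  # still booting, proxy not routing yet, or a non-JSON body
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"server not ready after {attempts} attempts")
```

A call might look like `wait_until_ready(lambda: list_models("https://<job_id>--8000.hf.jobs", get_token()))`, with `get_token` imported from `huggingface_hub`. Passing `fetch` and `sleep` as arguments keeps the retry loop testable without a live server.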

🔐 The endpoint is gated, not public. Every request must carry an HF token with read access to the job's namespace, and a plain browser visit will be rejected. In effect, the jobs proxy is your API gate: access is scoped to you (and your org). That's fine for private use, but treat the URL accordingly: don't share it expecting it to be open, and don't paste your token into untrusted places. If you need finer-grained or public access, put a proper gateway in front instead, or see "HF Jobs or Inference Endpoints?" below.

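Clean up

Jobs are billed per second, so stop the server when you're done:

hf jobs cancel <job_id>

The --timeout you set is a safety net (the job stops automatically when it expires), but cancelling explicitly is cheaper. An a10g-large runs at $1.50/hour. Check hf jobs hardware for the full price list and pick the smallest flavor that fits your model.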
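
A quick calculation shows why cancelling early matters. This sketch assumes cost is strictly proportional to runtime in seconds at the $1.50/hour a10g-large rate quoted above; any minimum charge or rounding rule is not modelled, and `estimated_cost` is a hypothetical helper.

```python
A10G_LARGE_HOURLY_USD = 1.50  # a10g-large rate quoted above; other flavors differ


def estimated_cost(runtime_seconds: float, hourly_rate_usd: float = A10G_LARGE_HOURLY_USD) -> float:
    """Cost of a job billed in proportion to its runtime in seconds."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    return hourly_rate_usd * runtime_seconds / 3600


# Cancelling after 20 minutes versus letting the 2h timeout expire.
cancelled_early = estimated_cost(20 * 60)
ran_to_timeout = estimated_cost(2 * 60 * 60)
print(f"cancelled after 20 min: ${cancelled_early:.2f}")    # $0.50
print(f"ran until the 2h timeout: ${ran_to_timeout:.2f}")  # $3.00
```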

Going further: bigger models

The same command scales to much larger models: pick a beefier --flavor and tell vLLM to shard the model across the GPUs with --tensor-parallel-size. For example, the 122B Qwen3.5 mixture-of-experts model on 2× H200:

hf jobs run --flavor h200x2 --expose 8000 --timeout 2h \
  vllm/vllm-openai:latest \
  vllm serve Qwen/Qwen3.5-122B-A10B \
  --host 0.0.0.0 --port 8000 --tensor-parallel-size 2 \
  --max-model-len 32768 --max-num-seqs 256

--tensor-parallel-size should match the number of GPUs in the flavor (h200x2 → 2, h200x8 → 8). Run hf jobs hardware to see what's available, and give bigger models a longer --timeout, since they take longer to download and load. For large models, H200 flavors are usually the best value.

The --max-model-len 32768 --max-num-seqs 256 flags are specific to this model: Qwen3.5-122B is a hybrid Mamba/attention architecture with a 256K-token default context, which doesn't leave enough memory for vLLM's default batch settings. Capping the context length and the number of concurrent sequences keeps it within the GPUs' memory. If a model fails to start with an out-of-memory or cache-block error, lowering these two values is the first thing to try. Everything else (the exposed URL, the OpenAI client, the token auth) stays exactly the same.
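
The rule that --tensor-parallel-size follows the flavor can be encoded so the two never drift apart. The sketch below infers the GPU count from a trailing `x<N>` in the flavor name, which is an assumption generalized from the two examples above (h200x2, h200x8), with anything else treated as a single GPU. `gpus_in_flavor` and `serve_command` are hypothetical helpers that only assemble the command line and do not run it.

```python
import re
import shlex


def gpus_in_flavor(flavor: str) -> int:
    """Read the GPU count from a trailing 'x<N>' (h200x2 -> 2); assume 1 otherwise."""
    match = re.search(r"x(\d+)$", flavor)
    return int(match.group(1)) if match else 1


def serve_command(model, flavor, max_model_len=None, max_num_seqs=None, timeout="2h"):
    """Assemble the hf jobs run + vllm serve argument list used in this post."""
    args = [
        "hf", "jobs", "run", "--flavor", flavor, "--expose", "8000", "--timeout", timeout,
        "vllm/vllm-openai:latest",
        "vllm", "serve", model, "--host", "0.0.0.0", "--port", "8000",
    ]
    gpus = gpus_in_flavor(flavor)
    if gpus > 1:
        # Shard across every GPU in the flavor.
        args += ["--tensor-parallel-size", str(gpus)]
    if max_model_len is not None:
        args += ["--max-model-len", str(max_model_len)]
    if max_num_seqs is not None:
        args += ["--max-num-seqs", str(max_num_seqs)]
    return args


print(shlex.join(serve_command("Qwen/Qwen3.5-122B-A10B", "h200x2", 32768, 256)))
```

Run as written, this prints the same command as the example above on a single line. Checking the inferred count against the output of hf jobs hardware is still advisable, since the naming convention is not guaranteed.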

Going further: Chat with it in a UI

Prefer a chat window over curl? A few lines of Gradio point at the same endpoint. Add --reasoning-parser deepseek_r1 to the vllm serve command so Qwen3's thinking comes back as a separate field (not required, but helpful), then run this code locally (you only need the job ID):

import gradio as gr
from gradio import ChatMessage
from huggingface_hub import get_token
from openai import OpenAI

client = OpenAI(base_url="https://<job_id>--8000.hf.jobs/v1", api_key=get_token())

def chat(message, history):
    # Skip earlier "Thinking" panels (they carry metadata) so only real turns are resent.
    messages = [{"role": m["role"], "content": m["content"]} for m in history if not m.get("metadata")]
    messages.append({"role": "user", "content": message})
    stream = client.chat.completions.create(model="Qwen/Qwen3-4B", messages=messages, stream=True)

    thinking, answer = "", ""
    for chunk in stream:
        delta = chunk.choices[0].delta
        # "reasoning" is a non-standard field; treat a missing or null value as "".
        thinking += (delta.model_extra or {}).get("reasoning") or ""
        answer += delta.content or ""
        out = []
        if thinking.strip():
            status = "done" if answer.strip() else "pending"
            out.append(ChatMessage(role="assistant", content=thinking, metadata={"title": "💭 Thinking", "status": status}))
        if answer.strip():
            out.append(ChatMessage(role="assistant", content=answer))
        yield out  # re-emit the accumulated messages after every chunk

gr.ChatInterface(chat).launch()

Run it, open http://127.0.0.1:7860, and start chatting. Reasoning streams into the collapsible panel, and the answer appears below it.
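
The streaming loop is easier to follow without Gradio or a live server. The sketch below reproduces just the accumulation logic on plain dicts that stand in for streamed deltas. The `fake_stream` contents are invented for illustration, and `accumulate` is a hypothetical helper, not part of the Gradio or OpenAI APIs.

```python
def accumulate(deltas):
    """Yield (thinking, answer, status) after each streamed delta.

    Each delta is a plain dict standing in for a streamed chunk: an optional
    "reasoning" string and an optional "content" string.
    """
    thinking, answer = "", ""
    for delta in deltas:
        thinking += delta.get("reasoning") or ""
        answer += delta.get("content") or ""
        # The thinking panel stays "pending" until the first answer text arrives.
        status = "done" if answer.strip() else "pending"
        yield thinking, answer, status


# Hypothetical stream: two reasoning deltas followed by two answer deltas.
fake_stream = [
    {"reasoning": "The user said hello. "},
    {"reasoning": "Reply briefly."},
    {"content": "Hello! "},
    {"content": "How can I help?"},
]
for thinking, answer, status in accumulate(fake_stream):
    print(status, repr(answer))
```

Each iteration yields the full accumulated text rather than the latest fragment, which is why the UI code can simply replace what it displayed on the previous step.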

Going further: SSH into the running server

Need to debug a startup failure, watch GPU memory, or tail logs interactively? You can open a shell straight into the running job. Launch it with --ssh and make sure your public key is registered at huggingface.co/settings/keys:

hf jobs run --flavor a10g-large --expose 8000 --timeout 2h --ssh \
  vllm/vllm-openai:latest \
  vllm serve Qwen/Qwen3-4B --host 0.0.0.0 --port 8000

Then connect with the job ID:

hf jobs ssh <job_id>

You're now inside the container, where you can run nvidia-smi, inspect the process, or poke at the model directly, which makes debugging and monitoring much easier than reading logs from the outside. SSH support requires huggingface_hub >= 1.20.0.

Going further: Use it as a coding-agent backend with Pi

The same endpoint can back a terminal coding agent. Pi is a provider-agnostic agent harness. Point it at the job and you get a Read/Write/Edit/Bash agent running on your own self-hosted model.

One thing to set up first: agents drive the model through tool calls, and vLLM only accepts those if the server is launched with tool calling enabled. So relaunch with --enable-auto-tool-choice and a --tool-call-parser matching the model family (hermes for Qwen3). Agents also benefit from a stronger model, so this is a good place to bring in the bigger one:

hf jobs run --flavor h200x2 --expose 8000 --timeout 2h \
  vllm/vllm-openai:latest \
  vllm serve Qwen/Qwen3.5-122B-A10B \
  --host 0.0.0.0 --port 8000 --tensor-parallel-size 2 \
  --max-model-len 32768 --max-num-seqs 256 \
  --reasoning-parser deepseek_r1 \
  --enable-auto-tool-choice --tool-call-parser hermes

Then add the job as a custom provider in ~/.pi/agent/models.json:

{
  "providers": {
    "hf-jobs": {
      "baseUrl": "https://<job_id>--8000.hf.jobs/v1",
      "api": "openai-completions",
      "apiKey": "!hf auth token",
      "models": [
        { "id": "Qwen/Qwen3.5-122B-A10B" }
      ]
    }
  }
}

Then launch the agent:

pi

The model you spun up a couple of commands ago is now driving an interactive coding agent in your terminal.
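
Since the job ID changes with every launch, the provider entry may be easier to generate than to edit by hand. This is a minimal sketch that reuses the exact keys from the config above. `pi_provider_config` is a hypothetical helper, the job ID passed in is the example ID from earlier in the post, and the script only prints the JSON rather than writing to ~/.pi/agent/models.json.

```python
import json


def pi_provider_config(job_id: str, model_id: str, port: int = 8000) -> dict:
    """Build the provider entry shown above for a given job ID and model."""
    return {
        "providers": {
            "hf-jobs": {
                "baseUrl": f"https://{job_id}--{port}.hf.jobs/v1",
                "api": "openai-completions",
                "apiKey": "!hf auth token",  # copied verbatim from the config above
                "models": [{"id": model_id}],
            }
        }
    }


# Example job ID from earlier in the post; substitute the ID of the relaunched job.
config = pi_provider_config("6a381ca1953ed90bfb947332", "Qwen/Qwen3.5-122B-A10B")
print(json.dumps(config, indent=2))
```

If models.json already lists other providers, merging this entry into the existing "providers" object avoids overwriting them.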

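HF Jobs or Inference Endpoints?

HF Jobs isn't the only way to serve a model on Hugging Face. Inference Endpoints are our managed product for the same job, and which one fits depends on what you're after.

Reach for HF Jobs when you want maximum flexibility and control: it's just docker run on HF infrastructure, so you pick the image, the exact vllm serve flags, and the hardware, and you pay per second only while the job runs. That makes it a great fit for experiments, one-off evals, batch generation, or trying out a model before committing to anything.

Reach for Inference Endpoints when you want something more production-ready. They add the operational niceties a long-lived service needs: finer-grained access control (an endpoint can be public, protected, or private) and scale-to-zero, so you're not billed during periods of inactivity. If you're standing up a durable endpoint rather than running a job, that's the tool to grab.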

Further reading

This post focuses on vLLM, but the expose-a-port pattern works for any OpenAI-compatible server. If you'd rather serve a GGUF with llama.cpp or run SGLang instead, see the Serve Models on Jobs guide, which covers those backends.

