Hugging Face Blog · June 23, 2026, 09:00 · about a 16-minute read

Releasing huggingface_hub every week with AI, open tools, and a human in the loop

#huggingface_hub #Open Source #Human-in-the-loop #DevOps #Software Maintenance

TL;DR

Hugging Face made weekly releases of huggingface_hub possible by combining AI tooling with human feedback in the loop.

AI deep analysis · June 23, 2026, 17:04


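Key points

Automation through human-AI collaboration

To make weekly huggingface_hub releases possible, the team adopted a hybrid workflow: AI tools do the drafting, while the final decisions rest on human feedback.

Active use of open tools

The process relies on open-source tools and resources throughout, which keeps the release cycle efficient and transparent.

Continuous quality assurance with fast delivery

Human oversight (human-in-the-loop) keeps quality and reliability intact even when releases are automated and frequent.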


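Impact analysis

The article presents a successful case of "human-centered AI use": AI tools are treated not as mere automation but as something combined with human judgment, which raises speed while preserving development quality. The move to a weekly release cadence in particular offers a useful lesson for other projects on the sustainability and agility of open-source software.

Editorial comment

Not just "leaving it to the AI" but deliberately keeping a human in the loop shows the sense of balance that library development demands when reliability matters. This hybrid model is a useful reference for how other large open-source projects might run their releases.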


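huggingface_hub is the Python client at the base of the Hugging Face ecosystem. transformers, datasets, diffusers, sentence-transformers, and dozens of other libraries depend on it to talk to the Hub. Every week we don't ship a new release is a week of fixes and features stuck on main.

For a long time we released every 4 to 6 weeks. We now release every week from a single GitHub Actions workflow. We built it with open-source tools and open-weights models, and kept a human in the loop only where judgment matters most. Nothing in this post requires a vendor contract, a closed model, or infrastructure you can't run yourself. That was a design goal from the start, since we wanted a workflow other maintainers could pick up and adapt.

By the end of this post, you'll have everything you need to build your own.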

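Where we started

The old process was partly automated, mostly manual.

Already in CI:

Publishing to PyPI once a tag was pushed.

Opening test branches in downstream libraries with the release candidate pinned.

Still manual, every single time:

Creating the release branch, bumping the version in __init__.py, committing, tagging, pushing.

Watching the downstream CI runs and triaging failures.

Reading through every PR merged since the last release and writing release notes by hand: grouped by theme, with context, in a voice that didn't read like a git log dump.

Cutting the stable release after the RC period.

Drafting an internal Slack announcement and social posts.

Opening the post-release PR to bump main to the next dev0.

Writing good notes for a new version was the heavy part, because it meant aggregating dozens of PRs on different topics. Nothing technically hard, but it took a few hours of focused attention. Add the announcements on top and a minor release was easily a half-day of work spread over several days.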

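Two kinds of work

So we decided to streamline the whole thing. Looking at that list, the work splits in two.

Some steps are purely mechanical and can be automated: bumping the version, committing, tagging, pushing, opening downstream test branches, opening the post-release PR. Nobody needs to think about those. They just have to happen in the right order, every time, which is what a CI (continuous integration) workflow is good at.

The rest is different. Writing release notes, deciding what to highlight, phrasing an announcement for a human audience: that's brain work. It's the kind of judgment that kept the release process manual for years. This is where AI comes in, turning a blank page into a solid first draft in seconds. It's also where we have to be careful, because a draft that looks confident and is subtly wrong is worse than no draft at all.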

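The design principle: open parts, reusable by anyone

When we decided to fix this, we set one constraint up front: every moving part had to be something any maintainer could run themselves. No closed model behind an API we couldn't swap, no proprietary release platform, no secret sauce.

Here's the entire stack: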

| Part | What it does |
| --- | --- |
| GitHub Actions | Orchestrates the whole release |
| OpenCode | Agent runtime that drives the model |
| An open-weights model (currently GLM-5.2 from Z.ai) | Drafts the release notes and Slack announcement |
| HF Inference Providers | Serves the model |
| PyPI Trusted Publishing | Publishes the package |

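The second principle: the model drafts, a human decides. Language models are good at turning thirty terse PR titles into readable release notes. They are not good at being trusted blindly. So the workflow is human-supervised: the model does the first pass, a deterministic script checks its work, and a human reviews and edits before anything ships (more on that below).

A tour of the pipeline

The full workflow is a single file, .github/workflows/release.yml, triggered by hand from the Actions UI. It takes exactly one input: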

    on:
      workflow_dispatch:
        inputs:
          release_type:
            type: choice
            options:
              - minor-prerelease   # cut an RC from main
              - minor-release      # promote the RC to final
              - patch-release      # bugfix on an existing release branch

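From there, the jobs run roughly in this order:

Prepare. Compute the next version, create or reuse the release branch, bump __version__, commit, tag, push.

Publish to PyPI. Build and upload huggingface_hub. In parallel, build and upload the hf CLI as its own PyPI package.

Release notes. Diff the commit range since the last tag, pull PR metadata from the GitHub API, and have the model draft a structured changelog (here's a recent one). The result is saved as a draft GitHub release.

Downstream test branches. For RCs, open a branch in transformers, datasets, diffusers, and sentence-transformers with the RC pinned, so their CI tells us fast if we broke something.

Slack announcement. Read the notes and produce an internal announcement in our team voice.

Archive notes. Upload both the raw AI draft and the human-edited version to a Hugging Face Bucket, side by side.

Post-release bump. After a stable release, open a PR on main bumping to the next dev0.

Comment on shipped PRs. Leave a "this shipped in vX.Y.Z" comment on every PR in the release.

Sync CLI docs. Open a PR to our skills repo with the regenerated hf CLI skill docs.

Report to Slack. Every step posts its status as a thread reply; a final job updates the root message with ✅ or ❌.

The remaining manual steps are reviewing and publishing the draft release notes, and reviewing and posting the internal Slack message. Those two steps are where we want a human in the loop.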
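The version arithmetic behind the Prepare step can be sketched as follows. The version scheme (X.Y.0.dev0 on main, X.Y.ZrcN for candidates, X.Y.Z for stable releases) and the function name next_version are assumptions made for this illustration; the post does not show the actual logic.

```python
import re

# Hypothetical version scheme, assumed for illustration:
#   main carries "X.Y.0.dev0", candidates are "X.Y.ZrcN", stable is "X.Y.Z".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+)|\.dev(\d+))?$")


def next_version(current: str, release_type: str) -> str:
    """Return the version that a release of `release_type` would publish."""
    match = VERSION_RE.match(current)
    if match is None:
        raise ValueError(f"unrecognized version: {current!r}")
    major, minor, patch = (int(g) for g in match.group(1, 2, 3))
    rc, dev = match.group(4), match.group(5)

    if release_type == "minor-prerelease":
        if dev is not None:   # first candidate cut from main
            return f"{major}.{minor}.{patch}rc0"
        if rc is not None:    # another candidate on the same release branch
            return f"{major}.{minor}.{patch}rc{int(rc) + 1}"
        raise ValueError("a prerelease needs a dev or rc version to start from")
    if release_type == "minor-release":
        if rc is None:
            raise ValueError("only a release candidate can be promoted")
        return f"{major}.{minor}.{patch}"
    if release_type == "patch-release":
        if rc is not None or dev is not None:
            raise ValueError("patch releases start from a stable version")
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release type: {release_type!r}")


print(next_version("1.5.0.dev0", "minor-prerelease"))  # 1.5.0rc0
print(next_version("1.5.0rc0", "minor-release"))       # 1.5.0
print(next_version("1.5.0", "patch-release"))          # 1.5.1
```

Rejecting unexpected combinations, such as promoting a version that is not a release candidate, keeps a wrong choice in the Actions UI from producing a tag that cannot be taken back.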

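Trust but verify: the human-in-the-loop core

Here's the failure mode everyone worries about with AI-generated release notes: the model quietly drops a PR or invents one that isn't in this release. A changelog that's almost right is worse than no changelog, because nobody re-checks it.

We don't trust the generated release notes to be complete on the first try; we verify them deterministically. Before the model runs, a Python script retrieves every PR that belongs to the release and stores the list as ground truth.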

    import re

    # Deterministic: extract PR numbers from squash-merge commits in the range.
    # Squash-merge commit titles end with "(#1234)".
    PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")

    pr_numbers = [
        int(m.group(1))
        for commit in commits_since_last_tag
        if (m := PR_NUMBER_PATTERN.search(commit.title))
    ]
    save_manifest(pr_numbers)  # the source of truth
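To see which commit titles the pattern accepts, here is a small runnable sketch. The sample titles, the JSON file format, and the path argument on save_manifest and load_manifest are assumptions for illustration; the post does not show how the manifest is stored.

```python
import json
import re
import tempfile
from pathlib import Path

PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")


def pr_numbers_from_titles(titles: list[str]) -> list[int]:
    """Keep the PR number from titles that end in "(#N)"."""
    return [int(m.group(1)) for t in titles if (m := PR_NUMBER_PATTERN.search(t))]


def save_manifest(pr_numbers: list[int], path: Path) -> None:
    """Persist the deduplicated, sorted PR list (assumed JSON format)."""
    path.write_text(json.dumps(sorted(set(pr_numbers))))


def load_manifest(path: Path) -> list[int]:
    return json.loads(path.read_text())


titles = [
    "Fix upload retry (#101)",
    "Bump version to 1.5.0.dev0",   # direct commit, no PR suffix: skipped
    "Mention #55 in docs (#102)",   # only the trailing reference counts
]
manifest_path = Path(tempfile.mkdtemp()) / "manifest.json"
save_manifest(pr_numbers_from_titles(titles), manifest_path)
print(load_manifest(manifest_path))  # [101, 102]
```

Anchoring the pattern at the end of the title is what keeps an incidental "#55" in the middle of a title out of the manifest.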

The model then drafts the notes from them. Once it is done, we check its output against the initial list of PRs:

    expected = set(load_manifest())          # what should be there
    found    = extract_pr_refs(notes_md)     # what the model wrote (#1234 -> 1234)

    missing = expected - found               # silently dropped
    extra   = found - expected               # belongs to a different release
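The helper extract_pr_refs is not shown in the post. A minimal sketch, assuming the notes reference PRs in the form #1234 (the regex, the diff_against_manifest wrapper, and the sample notes are all assumptions for illustration):

```python
import re

# Assumed reference style: "#1234" not directly preceded by a word character,
# so fragments such as "abc#12" are ignored.
PR_REF_PATTERN = re.compile(r"(?<!\w)#(\d+)\b")


def extract_pr_refs(notes_md: str) -> set[int]:
    """Collect every PR number mentioned in the release notes."""
    return {int(number) for number in PR_REF_PATTERN.findall(notes_md)}


def diff_against_manifest(expected: set[int], notes_md: str) -> tuple[set[int], set[int]]:
    """Return (missing, extra) PR numbers relative to the manifest."""
    found = extract_pr_refs(notes_md)
    return expected - found, found - expected


notes = "## Fixes\n- Fix upload retry (#101)\n- Faster downloads (#102, #999)\n"
missing, extra = diff_against_manifest({101, 102, 103}, notes)
print(sorted(missing), sorted(extra))  # [103] [999]
```

Because both sides are plain sets of integers, the check gives the same answer on every run, which is the property the model itself cannot offer.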

If anything is missing or extra, we don't fail the run, and we don't ship a wrong file. We hand the discrepancy back to the agent and ask it to fix exactly those PRs:

    for _ in range(MAX_ITERATIONS):
        missing, extra = validate(notes)
        if not missing and not extra:
            break  # matches the manifest exactly
        run_agent_fix(missing_prs=missing, extra_prs=extra)

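This is the pattern that makes the whole thing trustworthy: a non-deterministic model wrapped in deterministic guardrails. The model is great at writing prose and unreliable at being exhaustive. So we let it write and let code enforce the consistency.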
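To make the control flow concrete, here is a self-contained sketch of the loop with a stub standing in for the agent. The stub fake_agent_fix, the iteration cap of 3, and the decision to raise when the cap is reached are assumptions; the post does not say what happens if the notes never converge.

```python
import re
from typing import Callable

MAX_ITERATIONS = 3  # assumed cap; the real value is not given in the post


def pr_refs(text: str) -> set[int]:
    """PR numbers written as "#N" anywhere in the text."""
    return {int(n) for n in re.findall(r"#(\d+)", text)}


def validate(notes: str, expected: set[int]) -> tuple[set[int], set[int]]:
    """Compare PR references in the notes against the manifest."""
    found = pr_refs(notes)
    return expected - found, found - expected


def fake_agent_fix(notes: str, missing: set[int], extra: set[int]) -> str:
    """Stub for the agent call: drop lines citing extra PRs, append missing ones."""
    kept = [line for line in notes.splitlines() if not (pr_refs(line) & extra)]
    kept += [f"- TODO describe change (#{pr})" for pr in sorted(missing)]
    return "\n".join(kept)


def draft_until_consistent(
    notes: str,
    expected: set[int],
    fix: Callable[[str, set[int], set[int]], str],
) -> str:
    """Re-prompt until the notes match the manifest, or give up loudly."""
    for _ in range(MAX_ITERATIONS):
        missing, extra = validate(notes, expected)
        if not missing and not extra:
            return notes  # matches the manifest exactly
        notes = fix(notes, missing, extra)
    missing, extra = validate(notes, expected)
    if missing or extra:
        raise RuntimeError(f"notes still inconsistent: missing={missing}, extra={extra}")
    return notes


draft = "- Fix upload retry (#101)\n- Unrelated change (#999)"
final = draft_until_consistent(draft, {101, 102}, fake_agent_fix)
print(final)
```

Passing the fixer in as a callable keeps the guardrail independent of the agent runtime: the same loop works whether the draft comes from a model, a template, or a person.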

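Grounding the model so it doesn't make things up

Completeness is one half. Accuracy is the other. A model summarizing a PR from its title alone will cheerfully invent a code example that doesn't match the real API.

To prevent that, when we fetch PR metadata we also pull the actual documentation diffs from each PR: the unified diff of any .md file under docs/ that the PR touched.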

    def fetch_doc_diffs(pr):
        return [
            {"filename": f.filename, "status": f.status, "patch": f.patch}
            for f in pr.get_files()
            if f.filename.startswith("docs/") and f.filename.endswith(".md") and f.patch
        ]

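That diff goes into the model's context, so when it writes "here's the new CLI command," it's quoting the example the PR author actually wrote in the docs. That's the same logic as before: give the model real source material and a narrow job.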
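How the diffs reach the model is not detailed in the post. One plausible shape, sketched here with a hypothetical build_pr_context helper, an arbitrary character budget, and made-up sample data, is to append each doc patch to the PR's entry in the prompt and truncate oversized patches:

```python
MAX_PATCH_CHARS = 4000  # arbitrary budget, assumed for illustration


def build_pr_context(number: int, title: str, doc_diffs: list[dict]) -> str:
    """Render one PR and its documentation diffs as plain text for the prompt.

    `doc_diffs` uses the same keys as fetch_doc_diffs: filename, status, patch.
    """
    parts = [f"PR #{number}: {title}"]
    for diff in doc_diffs:
        patch = diff["patch"]
        if len(patch) > MAX_PATCH_CHARS:
            patch = patch[:MAX_PATCH_CHARS] + "\n[patch truncated]"
        parts.append(f"--- {diff['filename']} ({diff['status']})\n{patch}")
    if not doc_diffs:
        # Tell the model explicitly that there is nothing to quote from.
        parts.append("(no documentation changes; do not invent code examples)")
    return "\n".join(parts)


example = build_pr_context(
    1234,
    "Add a new CLI command",
    [{"filename": "docs/example.md", "status": "modified",
      "patch": "@@ -1,2 +1,3 @@\n+New usage example"}],
)
print(example)
```

Stating the absence of doc changes in the prompt is a design choice in this sketch: it gives the model a reason to stay descriptive instead of filling the gap with an invented snippet.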

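The prompts themselves live as Skills: small Markdown files (SKILL.md plus reference templates) checked into the repo. The release-notes skill spells out how to pick highlights, how to structure sections, when to add a doc link, and so on. It reads like onboarding instructions, which is exactly the right mental model.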

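The human checkpoint

After the RC is published, the draft GitHub release sits there with the AI's first pass in it. This is where the human comes in:

A reviewer reads the draft, edits for tone and emphasis, and fixes anything the model over- or under-weighted.

Only then do they trigger the minor-release run, which promotes the RC to final.

The reviewer's time goes into polishing, which turns a half-day of writing into a fifteen-minute editing session.

We also keep a paper trail so we can improve over time. We archive two files side by side in a Hugging Face Bucket: the raw AI draft, uploaded at RC time before anyone touches it, and the human-edited version, uploaded when the final release is cut.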

    # at RC time: straight from the model, untouched
    hf cp release_notes_raw.txt    "hf://buckets/huggingface/releases/huggingface_hub/${V}/release_notes_raw.txt"

    # at release time: after the human review
    hf cp release_notes_edited.txt "hf://buckets/huggingface/releases/huggingface_hub/${V}/release_notes_edited.txt"

Collecting both every week gives us a growing dataset of "what the model wrote" versus "what we wished it wrote", which we can then reuse to update the agent's skill.
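One way to start mining that archive, sketched with Python's standard difflib module, is to measure how much of each raw draft survived review and to list what the reviewer changed. The similarity metric and the sample strings are assumptions for illustration, not the team's actual analysis.

```python
import difflib


def review_delta(raw: str, edited: str) -> tuple[float, list[str]]:
    """Return a line-level similarity ratio in [0, 1] and a unified diff."""
    raw_lines = raw.splitlines()
    edited_lines = edited.splitlines()
    ratio = difflib.SequenceMatcher(None, raw_lines, edited_lines).ratio()
    diff = list(
        difflib.unified_diff(
            raw_lines,
            edited_lines,
            fromfile="release_notes_raw.txt",
            tofile="release_notes_edited.txt",
            lineterm="",
        )
    )
    return ratio, diff


raw = "## Highlights\n- Amazing new feature (#101)\n- Fix retry bug (#102)"
edited = "## Highlights\n- New feature (#101)\n- Fix retry bug (#102)"
ratio, diff = review_delta(raw, edited)
print(f"similarity: {ratio:.2f}")
print("\n".join(diff))
```

A ratio that drifts toward 1.0 over the weeks would suggest the skill is absorbing the reviewers' preferences; recurring diff lines point at the instructions worth adding to it.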

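Open and secure plumbing

Revamping the release process was a good opportunity to tighten security, especially against supply-chain attacks.

No PyPI token. Publishing uses Trusted Publishing: PyPI verifies a short-lived OIDC token minted by GitHub for this exact workflow, and issues PEP 740 attestations / Sigstore provenance for every artifact. There's no long-lived secret to leak or rotate.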

    permissions:
      id-token: write       # mint the OIDC token for PyPI
      attestations: write   # generate Sigstore provenance
    # ...
    - uses: pypa/gh-action-pypi-publish@v1.14.0
      with:
        attestations: true  # no password, no API token, just OIDC

The agent runtime is pinned and verified. We don't curl | bash the latest OpenCode and hope. We pin a version and check its SHA256 before running it:

    curl -fsSL https://opencode.ai/install | bash -s -- --version "${OPENCODE_VERSION}"
    echo "${OPENCODE_SHA256}  $(which opencode)" | sha256sum -c -
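The same check can be expressed in Python with hashlib, for example when the install step runs inside a script rather than a shell. The function name verify_sha256 and the temporary file standing in for the binary are assumptions for this sketch.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str) -> None:
    """Raise if the file's SHA256 digest differs from the pinned value."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large binaries are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise RuntimeError(f"checksum mismatch for {path}: got {actual}")


# Demo with a temporary file standing in for the downloaded binary.
with tempfile.TemporaryDirectory() as tmp:
    binary = Path(tmp) / "opencode"
    binary.write_bytes(b"pretend binary")
    pinned = hashlib.sha256(b"pretend binary").hexdigest()
    verify_sha256(binary, pinned)  # passes silently
    try:
        verify_sha256(binary, "0" * 64)
    except RuntimeError as err:
        print(err)
```

Failing with an exception rather than returning a boolean mirrors sha256sum -c: a mismatch stops the job before the unverified runtime is ever executed.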

Open tooling doesn't mean careless tooling.

So, what did it cost?

Almost nothing. A full release (notes plus the Slack announcement, across 20-40 PRs and a few rounds of prompting) costs about $0.25 on Inference Providers. With open weights billed pay-as-you-go, the only real question each week is "is there something worth shipping?", and there always is.

What changed in practice

The cadence went from one release every 4 to 6 weeks to once a week. The secondary effects were the interesting ones:

Notes got better, not worse. A first draft always exists, so review time goes to polishing. Grouping is more consistent and we omit fewer things.

Breakages surface earlier. Downstream test branches on every RC catch integration issues during the candidate window.

Contributor loops shortened. The automatic "shipped in vX.Y.Z" comment turned out to matter more than we expected. When someone reports an issue on a closed PR, everyone can immediately see which release the fix is in. That used to be a manual tag hunt.

Make it yours

This is the part we cared about most. The workflow is shaped around huggingface_hub but the structure is generic.

Reusable almost as-is:

The trigger and version-bump logic (minor-prerelease then minor-release then patch-release).

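The trust-but-verify loop: deterministic manifest, model draft, validate, re-prompt. This is the transferable idea, independent of what you're generating.

OIDC Trusted Publishing, a pinned and checksum-verified runtime, Slack threading.

The skill-based prompts: swap the templates, keep the structure.

Specific to us:

The downstream repo list and their dependency-pin formats.

The exact section taxonomy and tone in the skills.

The Slack and bucket destinations.

To adapt it: fork the workflow file and scripts, point them at your package, rewrite the skill Markdown for your project's voice, set two repo variables (the model ID and your OpenCode version), set up Trusted Publishing on PyPI, and delete the downstream-testing job if you don't have downstreams. The trust-but-verify loop is the part worth reusing as-is. It's what makes a generated artifact safe to ship.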

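What's next

Auto-triaging downstream failures. Today the workflow opens test branches and a human reads the CI. An obvious next step is to have the workflow read the failing logs and report them in the internal Slack message.

Extending the pattern. Most of this is generic. We expect to reuse large parts across other Python libraries in our ecosystem.

Takeaway

The parts of a release that used to need a half-day of focused human work (writing notes, drafting announcements, coordinating downstream checks) are the parts a model is good at drafting. Everything else is mechanical and fits in a YAML file. The trick was never just "let the AI do it". It's to let the model draft, let deterministic code verify, and let a human decide. It's built entirely from open tools and open weights, so the cost rounds to zero and anyone can run it.

The full workflow file is public. If you maintain a Python library, fork it, adapt it, and let us know how it goes!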

Hugging Face Blog·2026年6月23日 09:00·約16分で読める

AI、オープンツール、人間をループに組み込みながら huggingface_hub を毎週リリース

#huggingface_hub #Open Source #Human-in-the-loop #DevOps #Software Maintenance

TL;DR

Hugging Face は、AI ツールの活用と人間のフィードバックをループに組み込むことで、huggingface_hub の毎週の更新プロセスを実現した。

AI深層分析2026年6月23日 17:04

注目/ 5段階

深度40%

キーポイント

人間と AI の協働による自動化

オープンツールの積極的活用

開発プロセスにおいてオープンソースのツールやリソースを最大限に活用し、効率的かつ透明性の高い更新サイクルを構築している。

継続的な品質保証と迅速な展開

人間の監視（Human-in-the-loop）を導入することで、自動化された高速な更新においても品質と信頼性を維持する仕組みを確立した。

影響分析・編集コメントを表示

影響分析

編集コメント

記事一覧に戻る

この記事を読み終える頃には、あなた自身で同様のものを構築するために必要なすべてが揃っているはずです。

私たちが始めた場所

以前のプロセスは部分的に自動化されていましたが、主に手作業でした。

すでに CI に組み込まれていたもの:

タグがプッシュされたら PyPI への公開。
リリース候補を固定した下流ライブラリでのテストブランチの作成。

毎回必ず手動で行っていたもの:

リリースブランチの作成、__init__.py 内のバージョン番号の更新、コミット、タグ付け、プッシュ。
下流 CI の実行状況の確認と失敗箇所のトリアージ。

RC 期間終了後に安定版リリースをカットする。

社内 Slack での発表文やソーシャルメディア投稿のドラフトを作成する。

リリース後の PR を開いて、main ブランチを次の dev0 バージョンに bump する。

2 つの種類の作業

そこで私たちは全体を合理化することにしました。そのリストを見ると、作業は 2 つに分類できます。

デザイン原則：オープンな部分、誰でも再利用可能

ここがスタック全体です：

Part

What it does

GitHub Actions

Orchestrates the whole release

OpenCode

Agent runtime that drives the model

An open-weights model (currently GLM-5.2 from Z.ai)

Drafts the release notes and Slack announcement

HF Inference Providers

Serves the model

PyPI Trusted Publishing

Publishes the package

パイプラインのツアー

完全なワークフローは単一のファイル .github/workflows/release.yml にあり、Actions UI から手動でトリガーされます。必要な入力はただ一つです：

on:

workflow_dispatch:

inputs:

release_type:

type: choice

options:

minor-prerelease # メインブランチから RC を作成する
minor-release # RC を最終版に昇格させる
patch-release # 既存のリリースブランチ上のバグ修正を行う

そこから、ジョブは概ね以下の順序で実行されます:

準備。次のバージョンを計算し、リリースブランチを作成または再利用し、__version__ を更新し、コミットし、タグ付けし、プッシュします。

PyPI への公開。huggingface_hub をビルドしてアップロードします。並行して、hf CLI を独立した PyPI パッケージとしてビルドしてアップロードします。

リリースノート。直近のタグ以降のコミット範囲を差分取り、GitHub API から PR メタデータを取得し、モデルに構造化された変更履歴（最近のものはこちら）を作成させます。ドラフト GitHub リリースとして保存されます。

下流テストブランチ。RC の場合、transformers、datasets、diffusers、sentence-transformers に RC を固定したブランチを開き、それぞれの CI で何か壊れていないか迅速に確認できるようにします。

Slack での告知。ノートを読み取り、チームのトーンで内部告知を作成します。

ノートのアーカイブ。生の AI ドラフト版と人間が編集した版を並べて Hugging Face Bucket にアップロードします。

リリース後のバージョン更新。安定リリース後、メインブランチに次の dev0 へのバージョン更新 PR を作成します。

公開された PR へのコメント。リリースに含まれるすべての PR に「vX.Y.Z で公開されました」というコメントを残します。

CLI ドキュメントの同期。再生成された hf CLI スキルドキュメントを skills リポジトリにプッシュする PR を作成します。

Slack に報告。各ステップのステータスはスレッド返信として投稿され、最終ジョブは✅または❌でルートメッセージを更新します。

信頼せよ、しかし検証せよ：人間在ループの中核

決定論的：範囲内の squash-merge コミットから PR 番号を抽出。

PR_NUMBER_PATTERN = re.compile(r"$#(\d+)$$")

pr_numbers = [

int(m.group(1))

for commit in commits_since_last_tag

if (m := PR_NUMBER_PATTERN.search(commit.title))

]

save_manifest(pr_numbers) # 真のソース

その後、モデルがこれらからノートドラフトを作成します。完了後、その出力を初期の PR リストと比較して検証します：

expected = set(load_manifest()) # あるべきもの

found = extract_pr_refs(notes_md) # モデルが記述したもの (#1234 -> 1234)

missing = expected - found # シームレスにドロップされる

extra = found - expected # 異なるリリースに属する

for _ in range(MAX_ITERATIONS):

missing, extra = validate(notes)

if not missing and not extra:

break # マニフェストと完全に一致

run_agent_fix(missing_prs=missing, extra_prs=extra)

モデルが作り話をしないように接地する

def fetch_doc_diffs(pr):

return [

{"filename": f.filename, "status": f.status, "patch": f.patch}

for f in pr.get_files()

if f.filename.startswith("docs/") and f.filename.endswith(".md") and f.patch

]

人間のチェックポイント

RC が公開された後、ドラフト版の GitHub リリースには AI による最初の通し稿が格納されます。ここで人間が介入します：

レビューアがドラフトを読み、トーンや強調点を調整し、モデルが過大評価または過小評価した箇所を修正します。

その後初めて、マイナーリリース実行トリガーが発動され、RC が最終版へ昇格します。

レビューアの時間は、執筆に半日かかる作業を 15 分程度の編集セッションにまとめるための研磨に充てられます。

RC 時点：モデルから直接、未修正のまま

hf cp release_notes_raw.txt "hf://buckets/huggingface/releases/huggingface_hub/${V}/release_notes_raw.txt"

リリース時点：人間のレビュー後

hf cp release_notes_edited.txt "hf://buckets/huggingface/releases/huggingface_hub/${V}/release_notes_edited.txt"

オープンで安全な基盤整備

リリースプロセスの見直しは、特にサプライチェーン攻撃に対するセキュリティ強化の好機となりました。

permissions:

id-token: write # PyPI 用の OIDC トークンをミントする

attestations: write # Sigstore 由来情報を生成する

...

uses: pypa/gh-action-pypi-publish@v1.14.0

with:

attestations: true # パスワードなし、API トークンなし、OIDC のみ

curl -fsSL https://opencode.ai/install | bash -s -- --version "${OPENCODE_VERSION}"

echo "${OPENCODE_SHA256} $(which opencode)" | sha256sum -c -

Open tooling doesn't mean careless tooling.

So, what did it cost?

What changed in practice

The cadence went from one release every 4 to 6 weeks to once a week. The secondary effects were the interesting ones:

Notes got better, not worse. A first draft always exists, so review time goes to polishing. Grouping is more consistent and we omit fewer things.

Breakages surface earlier. Downstream test branches on every RC catch integration issues during the candidate window.

Contributor loops shortened. The automatic "shipped in vX.Y.Z" comment turned out to matter more than we expected. When someone reports an issue on a closed PR, everyone can immediately see which release the fix is in. That used to be a manual tag hunt.

Make it yours

This is the part we cared about most. The workflow is shaped around huggingface_hub but the structure is generic.

Reusable almost as-is:

The trigger and version-bump logic (minor-prerelease then minor-release then patch-release).

信頼して検証するループ：決定論的なマニフェスト、モデル草案の作成、検証、再プロンプト。これは生成物に関係なく適用可能なアイデアです。

OIDC による信頼できる公開（OIDC Trusted Publishing）、固定化されチェックサム検証済みのランタイム、Slack のスレッド機能。

スキルベースのプロンプト：テンプレートを入れ替え、構造は維持する。

私たちにとっての固有要素:

下流リポジトリ一覧と、それらの依存関係固定フォーマット。

スキルにおける正確なセクション分類体系とトーン。

Slack およびバケットの宛先。

次のステップ

下流の失敗を自動仕分けすること。現在はワークフローがテストブランチを開き、人間が CI を確認しています。明らかな次のステップは、 failing logs（失敗ログ）を確認し、内部 Slack メッセージで報告することです。

パターンの拡張。これの大部分は汎用的であり、当社のエコシステム内の他の Python ライブラリ間でも大規模な部分を再利用できることを期待しています。

結論

原文を表示

Back to Articles

By the end of this post, you'll have everything you need to build your own.

Where we started

The old process was partly automated, mostly manual.

Already in CI:

Publishing to PyPI once a tag was pushed.

Opening test branches in downstream libraries with the release candidate pinned.

Still manual, every single time:

Creating the release branch, bumping the version in __init__.py, committing, tagging, pushing.

Watching the downstream CI runs and triaging failures.

Reading through every PR merged since the last release and writing release notes by hand: grouped by theme, with context, in a voice that didn't read like a git log dump.

Cutting the stable release after the RC period.

Drafting an internal Slack announcement and social posts.

Opening the post-release PR to bump main to the next dev0.

Two kinds of work

So we decided to streamline the whole thing. Looking at that list, the work splits in two.

The design principle: open parts, reusable by anyone

Here's the entire stack:

Part

What it does

GitHub Actions

Orchestrates the whole release

OpenCode

Agent runtime that drives the model

An open-weights model (currently GLM-5.2 from Z.ai)

Drafts the release notes and Slack announcement

HF Inference Providers

Serves the model

PyPI Trusted Publishing

Publishes the package

A tour of the pipeline

The full workflow is a single file, .github/workflows/release.yml, triggered by hand from the Actions UI. It takes exactly one input:

code

on:
  workflow_dispatch:
    inputs:
      release_type:
        type: choice
        options:
          - minor-prerelease   # cut an RC from main
          - minor-release      # promote the RC to final
          - patch-release      # bugfix on an existing release branch

From there, the jobs run roughly in this order:

Prepare. Compute the next version, create or reuse the release branch, bump __version__, commit, tag, push.

Publish to PyPI. Build and upload huggingface_hub. In parallel, build and upload the hf CLI as its own PyPI package.

Release notes. Diff the commit range since the last tag, pull PR metadata from the GitHub API, and have the model draft a structured changelog (here's a recent one). Saved as a draft GitHub release.

Downstream test branches. For RCs, open a branch in transformers, datasets, diffusers, and sentence-transformers with the RC pinned, so their CI tells us fast if we broke something.

Slack announcement. Read the notes and produce an internal announcement in our team voice.

Archive notes. Upload both the raw AI draft and the human-edited version to a Hugging Face Bucket, side by side.

Post-release bump. After a stable release, open a PR on main bumping to the next dev0.

Comment on shipped PRs. Leave a "this shipped in vX.Y.Z" comment on every PR in the release.

Sync CLI docs. Open a PR to our skills repo with the regenerated hf CLI skill docs.

Report to Slack. Every step posts its status as a thread reply; a final job updates the root message with ✅ or ❌.

The remaining manual steps are reviewing and publishing the draft release notes, and reviewing and posting an internal Slack message. Those two steps are where we want a human in the loop.
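The version arithmetic in the Prepare step can be sketched in a few lines. This is a minimal illustration, not the workflow's actual code: the version format (X.Y.Z.dev0 on main, X.Y.Z.rcN for candidates), the function name next_version, and the transition rules are all assumptions chosen to match the three release_type options above.

```python
import re

# Assumed version shapes: "1.5.0", "1.5.0.dev0", "1.5.0.rc1".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.?(dev|rc)(\d+))?$")


def next_version(current: str, release_type: str) -> str:
    """Return the version to tag, given the current one and the release type."""
    m = VERSION_RE.match(current)
    if m is None:
        raise ValueError(f"unrecognized version: {current!r}")
    major, minor, patch = (int(m.group(i)) for i in (1, 2, 3))
    stage, n = m.group(4), m.group(5)
    base = f"{major}.{minor}.{patch}"

    if release_type == "minor-prerelease":
        if stage == "dev":   # main is on dev0: cut the first RC
            return f"{base}.rc0"
        if stage == "rc":    # an RC already exists: cut the next one
            return f"{base}.rc{int(n) + 1}"
    elif release_type == "minor-release":
        if stage == "rc":    # promote the RC to final
            return base
    elif release_type == "patch-release":
        if stage is None:    # bugfix on top of a stable release
            return f"{major}.{minor}.{patch + 1}"

    # Anything else is an invalid transition in this sketch.
    raise ValueError(f"cannot apply {release_type!r} to {current!r}")
```

Rejecting invalid transitions (for example, a patch release on top of an RC) keeps a mistaken dropdown choice from producing a tag at all.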

Trust but verify: the human-in-the-loop core

import re

# Deterministic: extract PR numbers from squash-merge commits in the range.
# Squash-merge titles end with "(#1234)", so the pattern is anchored at the end.
PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")

pr_numbers = [
    int(m.group(1))
    for commit in commits_since_last_tag
    if (m := PR_NUMBER_PATTERN.search(commit.title))
]
save_manifest(pr_numbers)  # the source of truth
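To make the anchoring concrete, here is a small self-contained run of the same pattern. The commit titles are made up for illustration; only the regular expression comes from the snippet above.

```python
import re

# Same pattern as above: a PR reference only counts at the very end of the title.
PR_NUMBER_PATTERN = re.compile(r"\(#(\d+)\)$")

# Hypothetical commit titles, for illustration only.
titles = [
    "Add retry on 429 responses (#101)",
    "Fix typo in docs (#102)",
    "Revert (#99) because of regression",  # reference is not at the end: ignored
    "Bump version to 1.0.0.rc0",           # release commit without a PR: ignored
]

pr_numbers = [
    int(m.group(1))
    for title in titles
    if (m := PR_NUMBER_PATTERN.search(title))
]
print(pr_numbers)  # prints [101, 102]
```

Because only a trailing reference matches, a PR number mentioned in the middle of a title does not leak into the manifest.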

The model then drafts the notes from that manifest. Once it is done, we check its output against the initial list of PRs:

expected = set(load_manifest())          # what should be there
found    = extract_pr_refs(notes_md)     # what the model wrote (#1234 -> 1234)

missing = expected - found               # silently dropped
extra   = found - expected               # belongs to a different release
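The helper extract_pr_refs is not shown in the post. One plausible shape, given the "#1234 -> 1234" comment, is a regex scan over the Markdown. The sketch below is an assumption along those lines, and its validate function takes the expected set as an explicit argument purely to stay self-contained.

```python
import re


def extract_pr_refs(notes_md: str) -> set[int]:
    """Collect every '#1234'-style reference in the notes as an integer."""
    return {int(n) for n in re.findall(r"#(\d+)\b", notes_md)}


def validate(notes_md: str, expected: set[int]) -> tuple[set[int], set[int]]:
    """Return (missing, extra) relative to the manifest."""
    found = extract_pr_refs(notes_md)
    return expected - found, found - expected


# Hypothetical draft: it drops #102 and cites #250, which is not in the manifest.
notes = """
## New features
- Retry on 429 responses (#101)

## Fixes
- Handle empty repos (#250)
"""

missing, extra = validate(notes, expected={101, 102})
print(missing, extra)  # prints {102} {250}
```

A scan this loose would also pick up strings such as "#1 priority", so a real implementation may want to match only the exact link format the notes template uses.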

If anything is missing or extra, we don't fail and we don't ship a wrong file. We hand the discrepancy back to the agent and ask it to fix exactly those PRs:

for _ in range(MAX_ITERATIONS):
    missing, extra = validate(notes)
    if not missing and not extra:
        break  # matches the manifest exactly
    run_agent_fix(missing_prs=missing, extra_prs=extra)
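The loop above leaves open what happens when the iteration budget runs out. The sketch below is one way to make that explicit: it re-validates after the last fix and reports whether the notes converged. The bound of 3, the converge name, and the toy validator and fixer are assumptions for illustration; the real agent call is replaced by a stub that repairs one discrepancy per round.

```python
MAX_ITERATIONS = 3  # assumed bound; the post does not give the real value


def converge(notes, validate, fix, max_iterations=MAX_ITERATIONS):
    """Alternate validate and fix; return (notes, ok) where ok means an exact match."""
    for _ in range(max_iterations):
        missing, extra = validate(notes)
        if not missing and not extra:
            return notes, True
        notes = fix(notes, missing, extra)
    # Budget exhausted: check once more so the caller knows the final state.
    missing, extra = validate(notes)
    return notes, not missing and not extra


# Toy setup: the "notes" are just the set of PR numbers they mention.
EXPECTED = {101, 102, 103}


def toy_validate(notes):
    return EXPECTED - notes, notes - EXPECTED


def toy_fix(notes, missing, extra):
    # Stand-in for the agent: repairs a single discrepancy per call.
    if missing:
        return notes | {min(missing)}
    return notes - {min(extra)}


final, ok = converge({101, 250}, toy_validate, toy_fix)
print(sorted(final), ok)  # prints [101, 102, 103] True
```

Returning the flag instead of raising lets the caller decide what to do with a draft that never converged, for example flagging it for the human reviewer.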

Grounding the model so it doesn't make things up

Completeness is one half. Accuracy is the other. A model summarizing a PR from its title alone will cheerfully invent a code example that doesn't match the real API.

To prevent that, when we fetch PR metadata we also pull the actual documentation diffs from each PR: the unified diff of any .md file under docs/ that the PR touched.

def fetch_doc_diffs(pr):
    return [
        {"filename": f.filename, "status": f.status, "patch": f.patch}
        for f in pr.get_files()
        if f.filename.startswith("docs/") and f.filename.endswith(".md") and f.patch
    ]
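To see which files the filter keeps, here is a small self-contained run against stand-in objects. The file paths and patches are invented for illustration. The stand-ins only mimic the filename, status, and patch attributes that the function reads, and the case of a file with no patch is an assumption about what the API may return.

```python
from types import SimpleNamespace


def fetch_doc_diffs(pr):
    # Keep only Markdown files under docs/ that carry a textual patch.
    return [
        {"filename": f.filename, "status": f.status, "patch": f.patch}
        for f in pr.get_files()
        if f.filename.startswith("docs/") and f.filename.endswith(".md") and f.patch
    ]


def fake_file(filename, status, patch):
    """Hypothetical stand-in for one changed file of a pull request."""
    return SimpleNamespace(filename=filename, status=status, patch=patch)


fake_pr = SimpleNamespace(
    get_files=lambda: [
        fake_file("docs/source/en/guides/upload.md", "modified", "@@ -1 +1 @@\n-old\n+new"),
        fake_file("src/huggingface_hub/hf_api.py", "modified", "@@ -1 +1 @@"),  # not docs
        fake_file("docs/source/en/_toctree.yml", "modified", "@@ -1 +1 @@"),    # not .md
        fake_file("docs/source/en/guides/big.md", "added", None),               # no patch
    ]
)

kept = [d["filename"] for d in fetch_doc_diffs(fake_pr)]
print(kept)  # prints ['docs/source/en/guides/upload.md']
```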

The human checkpoint

After the RC is published, the draft GitHub release sits there with the AI's first pass in it. This is where the human comes in:

A reviewer reads the draft, edits for tone and emphasis, fixes anything the model over- or under-weighted.

Only then do they trigger the minor-release run, which promotes the RC to final.

The reviewer's time goes into polishing, turning a half-day of writing into a fifteen-minute editing session.

# at RC time: straight from the model, untouched
hf cp release_notes_raw.txt    "hf://buckets/huggingface/releases/huggingface_hub/${V}/release_notes_raw.txt"

# at release time: after the human review
hf cp release_notes_edited.txt "hf://buckets/huggingface/releases/huggingface_hub/${V}/release_notes_edited.txt"

Collecting both every week gives us a growing dataset of "what the model wrote" versus "what we wished it wrote". We can then reuse that dataset to update the agent's skill.
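One simple way to start mining such pairs is a line-level diff plus a similarity score. The sketch below uses only Python's standard difflib; the edit_report name and the sample notes are made up, and nothing here is claimed to be how the team actually analyzes the archive.

```python
import difflib


def edit_report(raw: str, edited: str) -> dict:
    """Summarize how far the human-edited notes moved away from the raw draft."""
    raw_lines = raw.splitlines()
    edited_lines = edited.splitlines()
    diff = list(
        difflib.unified_diff(
            raw_lines, edited_lines, fromfile="raw", tofile="edited", lineterm=""
        )
    )
    # 1.0 means the reviewer changed nothing; lower means heavier editing.
    similarity = difflib.SequenceMatcher(None, raw_lines, edited_lines).ratio()
    return {"similarity": similarity, "diff": diff}


# Hypothetical pair of notes for one release.
raw = "## Fixes\n- Fix bug (#1)\n- Improve things (#2)\n"
edited = "## Fixes\n- Fix bug (#1)\n- Speed up uploads (#2)\n"

report = edit_report(raw, edited)
print(round(report["similarity"], 2))
print("\n".join(report["diff"]))
```

Tracked week over week, a score like this would give a rough signal of whether changes to the skill reduce the amount of human editing.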

Open and secure plumbing

Revamping the release process was a good opportunity to tighten security, especially against supply-chain attacks.

permissions:
  id-token: write       # mint the OIDC token for PyPI
  attestations: write   # generate Sigstore provenance
# ...
- uses: pypa/gh-action-pypi-publish@v1.14.0
  with:
    attestations: true  # no password, no API token, just OIDC

The agent runtime is pinned and verified. We don't curl | bash the latest OpenCode and hope. We pin a version and check its SHA256 before running it:

curl -fsSL https://opencode.ai/install | bash -s -- --version "${OPENCODE_VERSION}"
echo "${OPENCODE_SHA256}  $(which opencode)" | sha256sum -c -
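The same check can be written in Python when sha256sum is not available on the runner. This is a sketch of the general technique (hash the file, compare against a pinned digest, refuse to continue on mismatch). The function names are illustrative and not part of the workflow.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: str, expected_sha256: str) -> None:
    """Raise if the file's digest differs from the pinned one."""
    actual = sha256_of(path)
    expected = expected_sha256.strip().lower()
    if not hmac.compare_digest(actual, expected):
        raise RuntimeError(
            f"checksum mismatch for {path}: expected {expected}, got {actual}"
        )
```

Calling verify_pinned with the installed binary's path and the pinned digest before the agent's first invocation stops the job if the downloaded file is not the one that was reviewed.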

Open tooling doesn't mean careless tooling.

So, what did it cost?

What changed in practice

The cadence went from one release every 4 to 6 weeks to once a week. The secondary effects were the interesting ones:

Notes got better, not worse. A first draft always exists, so review time goes to polishing. Grouping is more consistent and we omit fewer things.

Breakages surface earlier. Downstream test branches on every RC catch integration issues during the candidate window.

Contributor loops shortened. The automatic "shipped in vX.Y.Z" comment turned out to matter more than we expected. When someone reports an issue on a closed PR, everyone can immediately see which release the fix is in. That used to be a manual tag hunt.
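For readers who want to reproduce the "shipped in" comment, the sketch below builds the request without sending it. It relies on the GitHub REST endpoint for issue comments, which also accepts pull request numbers. The function name, the comment wording, the PR number, and the token handling are illustrative and not taken from the workflow.

```python
import json
import urllib.request


def build_shipped_comment(repo: str, pr_number: int, version: str, token: str):
    """Build the POST request that leaves a 'shipped in' comment on a PR."""
    payload = {"body": f"This shipped in v{version}."}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Hypothetical PR number and version; the token would come from the CI secrets.
req = build_shipped_comment("huggingface/huggingface_hub", 1234, "1.2.3", "TOKEN")
print(req.get_method(), req.full_url)
```

Looping this over the PR numbers in the release manifest and passing each request to urllib.request.urlopen would post one comment per shipped PR.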

Make it yours

This is the part we cared about most. The workflow is shaped around huggingface_hub but the structure is generic.

Reusable almost as-is:

The trigger and version-bump logic (minor-prerelease then minor-release then patch-release).

The trust-but-verify loop: deterministic manifest, model draft, validate, re-prompt. This is the transferable idea, independent of what you're generating.

OIDC Trusted Publishing, pinned and checksum-verified runtime, Slack threading.

The skill-based prompts: swap the templates, keep the structure.

Specific to us:

The downstream repo list and their dependency-pin formats.

The exact section taxonomy and tone in the skills.

The Slack and bucket destinations.

What's next

Auto-triaging downstream failures. Today the workflow opens test branches and a human reads the CI. An obvious next step is to have the workflow read the failing logs and summarize them in the internal Slack message.

Extending the pattern. Most of this is generic. We expect to reuse large parts across other Python libraries in our ecosystem.

Takeaway

The full workflow file is public. If you maintain a Python library, fork it, adapt it, and let us know how it goes!
