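InfoQ · April 17, 2026, 02:05 · approx. 4 min read

Google Releases Gemma 4 with Multimodal and Agentic Capabilities under Apache 2.0

#OpenSourceAI #LLM #Multimodal #AgenticAI #Google #Apache 2.0

TL;DR

Google has released Gemma 4, a series of open-weight AI models with up to 31B parameters, under the Apache 2.0 license. The series features multimodal processing, agentic capabilities, and a context window of up to 256K tokens.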

AI in-depth analysis · April 17, 2026, 03:44

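Key points

Released under an open-source license

The Gemma 4 model series is released under the Apache 2.0 license, which permits broad use, including commercial use.

A range of model sizes

Variants with 2B, 4B, 26B, and 31B parameters are available, covering different compute budgets and use cases.

Stronger multimodal capabilities

Video and image processing are enhanced, and the smaller models accept audio input, improving the family's ability to handle multiple modalities.

Extended context window

The models can process long contexts of up to 256K tokens, which should help with long documents and complex dialogues.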

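Impact analysis

This release signals Google's strategy of strengthening its presence in the open-source AI model market. Because the Apache 2.0 license makes commercial use straightforward, adoption by enterprises and developers may accelerate. The multimodal capabilities and long-context support in particular are important advances that contribute directly to building practical applications.

Editorial comment

Releasing under Apache 2.0 substantially lowers the barrier to commercial use and is an important move that will intensify competition among open-source LLMs. The combination of multimodal capabilities and long context is highly practical, and the developer community's response will be worth watching.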

Google has recently released Gemma 4, a family of open-weight models spanning effective 2B and 4B edge variants, a 26B Mixture-of-Experts model, and a 31B dense model, all distributed under an Apache 2.0 license. The release introduces native video and image processing across the lineup, audio input on the smaller models, context windows up to 256K tokens, and benchmark results that place the 31B dense variant in a bracket typically occupied by models three to five times its size.

The agent-oriented framing is reflected in concrete capabilities. Google reports that the 31B variant scores 84.3% on GPQA Diamond and 80.0% on LiveCodeBench v6. The GPQA Diamond result nearly doubles the 42.4% achieved by the prior Gemma 3 IT 27B; together, the two scores point to substantial gains in science reasoning and code generation. For tool use, the models add native support for function calling, structured JSON output, and system instructions, a combination intended to let developers build autonomous agents that interact with external tools and APIs and execute multi-step workflows reliably.
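To make the tool-use description more concrete, here is a minimal sketch of how an application might consume a structured JSON function call. The call shape (`{"name": ..., "arguments": {...}}`), the tool names, and the `dispatch_tool_call` helper are all assumptions chosen for illustration; the article does not specify the format Gemma 4 actually emits.

```python
import json

# Hypothetical tools; names and signatures are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def add(a: float, b: float) -> float:
    return a + b

TOOLS = {"get_weather": get_weather, "add": add}

def dispatch_tool_call(model_output: str):
    """Parse a JSON tool call and run the matching registered function.

    Assumes the model emits {"name": ..., "arguments": {...}}; the real
    output format of any given model may differ.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    # Only registered functions are ever called, never arbitrary code.
    return TOOLS[name](**args)

if __name__ == "__main__":
    print(dispatch_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

Validating the tool name against an explicit registry before calling anything is the design choice that matters here: in a multi-step agent loop, the result would typically be fed back to the model as the next turn.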

Architecturally, the lineup spans both dense and sparse designs. The 26B MoE model activates only 3.8 billion parameters during inference to deliver fast tokens-per-second, while the 31B dense variant targets workloads where consistent per-token cost matters more than peak parameter count. The edge models, sized for mobile and IoT devices where memory and power budgets are tight, offer a 128K context window; the larger models extend to 256K tokens, large enough to ingest sizable code repositories or long-form documents in a single prompt. All four variants natively process video and images at variable resolutions, and the E2B and E4B edge models add native audio input for speech recognition and understanding. The family is trained on more than 140 languages.
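The dense-versus-sparse trade-off can be illustrated with some back-of-the-envelope arithmetic. The sketch below uses the parameter counts quoted above and assumes the common rule of thumb that forward-pass compute per token is roughly 2 FLOPs per active parameter. That approximation ignores attention cost and memory bandwidth, and the function names are illustrative.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_params_b / total_params_b

def approx_flops_per_token(active_params_b: float) -> float:
    """Rough forward-pass FLOPs per token: about 2 per active parameter.

    This is a rule-of-thumb estimate, not a measured figure.
    """
    return 2.0 * active_params_b * 1e9

# Figures quoted in the article (in billions of parameters).
MOE_TOTAL_B, MOE_ACTIVE_B = 26.0, 3.8
DENSE_TOTAL_B = 31.0

moe_share = active_fraction(MOE_ACTIVE_B, MOE_TOTAL_B)
compute_ratio = approx_flops_per_token(DENSE_TOTAL_B) / approx_flops_per_token(MOE_ACTIVE_B)

if __name__ == "__main__":
    print(f"MoE active share per token: {moe_share:.1%}")
    print(f"Dense 31B vs MoE per-token compute: about {compute_ratio:.1f}x")
```

Under these assumptions the MoE model touches roughly 15% of its weights per token, and the dense 31B model does about eight times as much per-token compute. All 26B MoE parameters still have to be held in memory, so the saving is in compute rather than in footprint.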

On benchmarks, Google reports that the 31B dense model achieved an estimated LLMArena score (text only) of 1452, reaching a performance bracket usually reserved for significantly larger models with triple to quintuple the parameter count.

Source: Google blog

Reactions in the open-model community have focused less on raw scores and more on usability and the new licensing. Sam Witteveen applauded the Apache 2.0 license:

This is an actual real Apache 2 license, which means for the first time, you can take Google's best open model, modify it, fine tune it, deploy it commercially, do whatever you want with it. No strings attached

Nathan Lambert argues that Gemma 4’s value lies in its frictionless integration, noting:

Gemma 4’s success is going to be entirely determined by ease of use, to a point where a 5-10% swing on benchmarks wouldn’t matter at all. It’s strong enough, small enough, with the right license, and from the U.S., so many companies are going to slot it in.

Day-zero distribution is notably broad: weights are available on Hugging Face and Kaggle, with reference paths through vLLM, llama.cpp, Ollama, MLX, LM Studio, Unsloth, SGLang, and NVIDIA NIM, plus an NVFP4-quantized 31B checkpoint produced with NVIDIA Model Optimizer. Kaggle is running the Gemma 4 Good Challenge, inviting developers to build products that create meaningful positive change using the new models.

About the Author

Hien Luu

Hien Luu is a Sr. Engineering Manager at Zoox, leading the Machine Learning Platform team. He is particularly passionate about building scalable AI/ML infrastructure to power real-world applications. He is the author of MLOps with Ray and Beginning Apache Spark 3. He has given presentations at various conferences, including MLOps World, QCon (SF, NY, London), GHC 2022, Data+AI Summit, XAI 21 Summit, YOW Data!, and appy().
