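Google's Gemma 4 Is Available Under the Apache 2.0 License for the First Time
Google has released Gemma 4, its most capable open model family to date. The models run on a wide range of devices, from smartphones to workstations, and ship under a fully open Apache 2.0 license for the first time.
Key points
Gemma 4 release
Google has released Gemma 4, its most capable open model family to date.
Adoption of the Apache 2.0 license
For the first time, Gemma models are offered under the fully open Apache 2.0 license.
Broad device support
The four new models run on a wide range of devices, from smartphones to workstations.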
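Impact analysis
This release further advances the open-sourcing of high-performance AI models and substantially lowers the barriers to customization and commercial use for developers and companies. The adoption of the Apache 2.0 license is likely to accelerate both the growth of the open-source AI ecosystem and the deployment of AI across a diverse range of devices.
Editorial comment
The Apache 2.0 license permits flexible use, including commercial use, and can be seen as an important milestone in bringing open-source AI models into practical use.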

This article, "Google's Gemma 4 Is Available Under the Apache 2.0 License for the First Time," was first published on The Decoder.
Google is releasing Gemma 4, its most capable open model family yet. The four new models run on everything from smartphones to workstations and ship under a fully open Apache 2.0 license for the first time.
The models are based on the same technology as Google's proprietary Gemini 3 and are published under the commercially permissive Apache 2.0 license, giving developers full control over their data, infrastructure, and models. Earlier Gemma versions shipped under a more restrictive Google proprietary license.
All Gemma 4 models bring significant improvements in multi-step reasoning and math tasks, according to Google. For agentic workflows, they natively support function calling, structured JSON output, and system instructions, letting autonomous agents tap into various tools and APIs.
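To make the function-calling idea concrete, here is a minimal sketch of how an agent runtime might consume a structured JSON call emitted by a model and route it to a tool. Everything in it is an illustrative assumption: the `get_weather` tool, the `TOOLS` registry, the `dispatch` helper, and the `{"name": ..., "arguments": ...}` call shape are hypothetical and do not come from Google's documentation. The actual call format depends on the runtime serving the model.

```python
import json

# Hypothetical tool: a real agent would call an external API here.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}

# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> dict:
    """Parse a JSON function call and run the matching registered tool."""
    call = json.loads(model_output)      # structured JSON emitted by the model
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)

# Assumed shape of a model-emitted call (not an official format).
example_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
result = dispatch(example_output)
print(result)
```

In a full agent loop, the tool result would be serialized and sent back to the model as a follow-up message so it can compose its final answer.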
Four model sizes cover everything from edge devices to workstations
Gemma 4 comes in four sizes: Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture-of-Experts (MoE) model, and a 31B Dense model. All four go beyond simple chat and handle complex logic and agentic workflows.
| | E2B | E4B | 26B MoE | 31B Dense |
|---|---|---|---|---|
| Active parameters | "effective" 2 billion | "effective" 4 billion | 3.8 billion active | - |
| Architecture | - | - | MoE | Dense |
| Context window | 128K tokens | 128K tokens | up to 256K tokens | up to 256K tokens |
| Target hardware | Smartphones, Raspberry Pi, Jetson Orin Nano | Smartphones, Raspberry Pi, Jetson Orin Nano | Personal computers, consumer GPUs (quantized), workstations, accelerators | Personal computers, consumer GPUs (quantized), workstations, accelerators |
| Offline operation | ✅ | ✅ | ✅ | ✅ |
| Vision (images/video) | ✅ | ✅ | ✅ | ✅ |
| Audio input | ✅ | ✅ | - | - |
| Quantized on consumer GPU | - | - | ✅ | ✅ |
| Arena AI ranking (open) | - | - | #6 | #3 |
| Special feature | Compute and memory efficiency on edge devices | Compute and memory efficiency on edge devices | Optimized for latency, 3.8 billion active parameters, fast token generation | Maximum quality, base for fine-tuning |
The 31B model currently sits at 3rd place among all open models worldwide on the Arena AI Text Leaderboard, while the 26B MoE model ranks 6th. Google says Gemma 4 outperforms models 20 times its size. For developers, that translates to high-performance results with significantly lower hardware requirements.
| Benchmark | Gemma 4 31B IT Thinking | Gemma 4 26B A4B IT Thinking | Gemma 4 E4B IT Thinking | Gemma 4 E2B IT Thinking | Gemma 3 27B IT |
|---|---|---|---|---|---|
| Arena AI (text) (As of 4/2/26) | 1452 | 1441 | - | - | 1365 |
| MMLU (Multilingual Q&A), no tools | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% |
| MMMU Pro (Multimodal reasoning) | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% |
| AIME 2026 (Mathematics), no tools | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% |
| LiveCodeBench v6 (Competitive coding problems) | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% |
| GPQA Diamond (Scientific knowledge), no tools | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% |
| τ2-bench (Agentic tool use), Retail | 86.4% | 85.5% | 57.5% | 29.4% | 6.6% |
The two larger models target workstations and servers. The unquantized bfloat16 weights of the 31B model fit on a single 80 GB NVIDIA H100 GPU, and quantized versions should run on consumer graphics cards too.
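The single-H100 claim can be sanity-checked with simple arithmetic. The sketch below estimates weight memory only, assuming exactly 31 billion parameters and 2 bytes per parameter for bfloat16. The int8 and int4 byte sizes are idealized assumptions that ignore quantization overhead, and the estimate leaves out the KV cache and activations, which also need GPU memory at inference time.

```python
# Approximate storage cost per parameter (idealized; ignores quantization metadata).
BYTES_PER_PARAM = {"bfloat16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

PARAMS_31B = 31e9      # assumed exact parameter count for the 31B dense model
H100_MEMORY_GB = 80    # memory size quoted in the article

for dtype in BYTES_PER_PARAM:
    size = weight_memory_gb(PARAMS_31B, dtype)
    fits = "fits" if size < H100_MEMORY_GB else "does not fit"
    print(f"{dtype}: {size:.1f} GB of weights, {fits} in {H100_MEMORY_GB} GB")
```

Under these assumptions the bfloat16 weights come to about 62 GB, which leaves some headroom on an 80 GB card, while a 4-bit version would need roughly a quarter of that.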
The 26B MoE model only activates 3.8 billion of its parameters during inference, which should make for especially fast token generation. The 31B dense model aims for maximum quality instead and is meant to serve as a foundation for fine-tuning.
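A rough calculation shows why the lower active-parameter count should translate into faster generation. The sketch below assumes the model names reflect total parameter counts (26 billion and 31 billion) and uses the common rule of thumb of about 2 floating-point operations per active parameter per generated token. That rule ignores attention and memory-bandwidth effects, so the ratio is an order-of-magnitude estimate rather than a measured speedup.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_params / total_params

def flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params

MOE_TOTAL = 26e9     # assumed total parameters of the 26B MoE model
MOE_ACTIVE = 3.8e9   # active parameters per token, as reported in the article
DENSE_TOTAL = 31e9   # a dense model uses all of its parameters for every token

fraction = active_fraction(MOE_ACTIVE, MOE_TOTAL)
ratio = flops_per_token(DENSE_TOTAL) / flops_per_token(MOE_ACTIVE)

print(f"MoE active share: {fraction:.1%}")
print(f"Dense vs. MoE compute per token: about {ratio:.1f}x")
```

Note that the MoE model still has to keep all of its weights in memory, so the savings apply to per-token compute, not to memory footprint.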
Google's Gemma 4 models score above 1,440 Elo on the Arena AI Leaderboard despite having just 26B and 31B parameters, far smaller than many competitors with hundreds of billions of parameters. | Image: Google
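For readers unfamiliar with Elo-style scores, a rating gap can be converted into an expected head-to-head win rate. The sketch below uses the classic Elo logistic formula with a scale of 400 and ignores ties. Treating the Arena AI scores as standard Elo ratings is an assumption made here for illustration, since the leaderboard's exact rating method is not described in the article.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Classic Elo expectation that A beats B (ties ignored, scale of 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Scores from the benchmark table in this article.
GEMMA4_31B = 1452
GEMMA4_26B_MOE = 1441
GEMMA3_27B = 1365

p_31b = expected_win_rate(GEMMA4_31B, GEMMA3_27B)
p_26b = expected_win_rate(GEMMA4_26B_MOE, GEMMA3_27B)

print(f"Gemma 4 31B vs. Gemma 3 27B: {p_31b:.1%} expected win rate")
print(f"Gemma 4 26B MoE vs. Gemma 3 27B: {p_26b:.1%} expected win rate")
```

Under this assumption, the 87-point gap between Gemma 4 31B and Gemma 3 27B corresponds to the newer model being preferred in roughly 62 percent of direct comparisons.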
The smaller E2B and E4B models are purpose-built for mobile devices and IoT hardware. They only activate two and four billion parameters, respectively, during inference to save memory and battery life. Both edge models natively process images, video, and audio input for speech recognition. Their context window covers 128,000 tokens, while the larger models can handle up to 256,000 tokens.
Independent benchmarks from Artificial Analysis back up the numbers for the larger Gemma 4 models. On the GPQA Diamond benchmark for scientific reasoning, Gemma 4 31B scores 85.7 percent in reasoning mode. According to Artificial Analysis, that's the second-best result among all open models with fewer than 40 billion parameters, just behind Qwen3.5 27B at 85.8 percent. At around 1.2 million output tokens, Gemma 4 31B likely also needs less compute than Qwen3.5 27B (1.5 million) and Qwen3.5 35B A3B (1.6 million).
On the GPQA Diamond benchmark, the Gemma 4 models land in the top performance tier with 26B and 31B parameters, outperforming significantly larger models like gpt-oss-120B. | Image: Artificial Analysis
The 26B MoE model scores 79.2 percent on the same benchmark, putting it ahead of OpenAI's gpt-oss-120B at 76.2 percent but behind Qwen3.5 9B at 80.6 percent. Artificial Analysis notes that both evaluated models run on a single H100 GPU. The full evaluation of all four Gemma 4 models in the Artificial Analysis Intelligence Index is still pending. As always, benchmark numbers only go so far when it comes to predicting real-world performance.
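The output-token figures quoted above can be put into relative terms with a short calculation. The token counts are the approximate values reported in the article. Treating output tokens as a stand-in for compute is a simplification, because the cost per token also depends on each model's size and architecture.

```python
# Approximate output tokens (in millions) as reported in the article.
OUTPUT_TOKENS_M = {
    "Gemma 4 31B": 1.2,
    "Qwen3.5 27B": 1.5,
    "Qwen3.5 35B A3B": 1.6,
}

def token_savings(model: str, baseline: str) -> float:
    """Fraction of output tokens saved by `model` relative to `baseline`."""
    return 1.0 - OUTPUT_TOKENS_M[model] / OUTPUT_TOKENS_M[baseline]

for baseline in ("Qwen3.5 27B", "Qwen3.5 35B A3B"):
    saved = token_savings("Gemma 4 31B", baseline)
    print(f"Gemma 4 31B uses about {saved:.0%} fewer output tokens than {baseline}")
```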
Where to get Gemma 4 and what platforms it supports
Gemma 4 is available now on Hugging Face, Kaggle, and Ollama. Google AI Studio supports the 31B and 26B models, while Google AI Edge Gallery handles the E4B and E2B variants.
At launch, the models work with a wide range of frameworks and platforms, including Hugging Face Transformers, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM and NeMo, LM Studio, Unsloth, SGLang, Keras, and others. Fine-tuning works through Google Colab, Vertex AI, or local gaming GPUs. For production deployments, the models scale to Google Cloud via Vertex AI, Cloud Run, and GKE.
On the hardware side, Google says Gemma 4 supports NVIDIA hardware from the Jetson Orin Nano all the way up to Blackwell GPUs, AMD GPUs through the ROCm stack, and Google's own Trillium and Ironwood TPUs.