KDnuggets·2026年5月14日 21:00·約12分

エージェント型ツール呼び出しのための小型言語モデル 5 つ

#Small Language Models #Agentic AI #Tool Calling #Open Source LLMs #Hugging Face

TL;DR

KDnuggets は、コストとレイテンシの課題を解決し、エージェンティックなツール呼び出しに特化した 5 つの小型言語モデル（SLM）を紹介し、特に Hugging Face の SmolLM3-3B が長文脈と多言語対応で注目されていると報じている。

AI深層分析2026年7月5日 03:07

重要/ 5段階

深度40%

キーポイント

大型モデルの代替としての SLM の台頭

ChatGPT や Claude などの大規模モデルはコストやハードウェア要件が高く、多くの実世界展開には不向きだが、小型言語モデル（SLM）がそのギャップを埋めつつある。

SmolLM3-3B の詳細な技術仕様

Hugging Face が開発した 3B パラメータの SmolLM3 は、64K（拡張で 128K）のコンテキスト長、6 か国語対応、および JSON/XML/Python 形式でのネイティブなツール呼び出し機能を備えている。

エージェンティック AI の実用化への貢献

これらのモデルは、適切な関数の選択や引数のフォーマット、マルチステップワークフローへの統合を信頼性高く行えるよう設計されており、データセンター不要でローカル実行が可能である。

オープンウェイトとライセンスの利点

紹介されたモデルは Hugging Face で公開されるオープンウェイトであり、Apache 2.0 ライセンスなどの寛容な利用条件により、開発者が自由にデプロイ・カスタマイズできる。

Qwen3-4B-Instruct の長文コンテキストと多言語対応

このモデルはネイティブで26万トークンのコンテキスト長をサポートしており、100以上の言語に対応しています。

効率的なアーキテクチャとライセンス

GQA（Grouped Query Attention）を採用し、Apache 2.0ライセンスで商用利用可能なオープンソースモデルです。

Phi-3-mini の基本仕様とトレーニング手法

Microsoft が開発した 3.8B パラメータのモデルで、合成データとフィルタリングされたウェブデータを組み合わせてトレーニングされ、SFT と DPO を経て最適化されています。

影響分析・編集コメントを表示

影響分析

この記事は、大規模言語モデルに依存する従来のアプローチから、コスト効率と実行速度を重視した小型モデルへのシフトを促す重要な指針を示しています。特に SmolLM3-3B のような高性能な SLM が登場することで、リソース制約のある環境でも高度なエージェンティック AI を構築・展開できる道が開け、業界全体のデプロイ戦略に大きな影響を与える可能性があります。

編集コメント

エージェンティック AI の実用化において、コストとパフォーマンスのバランスが問われる中、小型モデルの進化は非常に重要な転換点です。特に SmolLM3-3B のようなモデルが「思考モード」や「長文脈」を備えている点は、開発者がローカル環境で高度なエージェントを構築する際の強力な選択肢となるでしょう。

image**

# イントロダクション

エージェント型 AI システムは、モデルがツールを確実に呼び出し、適切な関数を選択し、引数を正しくフォーマットし、結果を多段階のワークフローに統合する能力に依存しています。ChatGPT、Claude、Gemini などの大規模な最先端モデルはこの点で優れたパフォーマンスを発揮しますが、コスト、レイテンシ、ハードウェア要件におけるトレードオフがあり、多くの実世界での展開には現実的ではありません。一方、小型言語モデルはこれらのギャップを埋めることに成功しており、現在ではいくつかのコンパクトでオープンウェイトの選択肢が、データセンターでの実行を必要とせずに、ファーストクラスのツール呼び出しサポートを提供しています。

そして、特に順序はありませんが、エージェント型ツール呼び出しのための小型言語モデル 5 つをご紹介します。便宜性と一貫性のため、すべてのモデルリンクは Hugging Face でホストされているモデルを指します。

# 1. SmolLM3-3B

リリース日：2025 年 7 月 8 日
開発元：Hugging Face
ロケーション：HuggingFaceTB/SmolLM3-3B

技術的側面

詳細

パラメータ数

アーキテクチャ

デコーダー専用トランスフォーマー（GQA + NoPE、3:1 レシオ）

コンテキスト長

ネイティブで 64K；YaRN 外挿を使用すると最大 128K

トレーニングトークン数

11.2T

多言語サポート

6 カ国語（EN, FR, ES, DE, IT, PT）

推論モード

デュアルモード（思考/非思考トグル）

ツール呼び出し

あり：JSON/XML (xml_tools) および Python (python_tools)

ライセンス

Apache 2.0

SmolLM3**は、小型モデルの限界を押し広げるために設計された 30 億パラメータの言語モデルであり、デュアルモード推論、6 か国語対応、および長いコンテキスト長をサポートしています。これは、グループ化クエリアテンション（GQA）と位置埋め込みなし（NoPE）（比率 3:1）を採用したデコーダー専用トランスフォーマーで、ウェブ、コード、数学、推論データの段階的なカリキュラムに基づき 1.12 トリリンのトークンで事前学習されています。ポストトレーニングには、1400 億の推論トークンによる中間トレーニングフェーズが含まれ、その後、HuggingFace のオフポリシー好意アライメントへのアプローチであるアンカー付き選好最適化（APO）を用いた教師あり微調整とアライメントが行われました。このモデルは、xml_tools を介した JSON/XML ブロッブと、python_tools を介した Python スタイルの関数呼び出しという 2 つの異なるツール呼び出しインターフェースをサポートしており、エージェントパイプラインや RAG システムにおいて非常に柔軟です。重み、データセット、トレーニングコードを含む完全なオープンリリースとして、SmolLM3 はエッジデバイスや低 VRAM マシンなどの制約のあるハードウェア上のチャットボット、RAG システム、およびコードアシスタントに理想的です。

# 2. Qwen3-4B-Instruct-2507

リリース日：2025 年 8 月 6 日
開発元：アリババ（Qwen チーム）
ロケーション：Qwen/Qwen3-4B-Instruct-2507

技術的側面

詳細

パラメータ数

40 億（埋め込みを除く 36 億）

アーキテクチャ

因果 LM、36 レイヤー、GQA（クエリヘッド 32 / KV ヘッド 8）

コンテキスト長

262,144 トークン（ネイティブ）

推論モード

非思考専用（<thinking> ブロックなし）

多言語対応

100 以上の言語

ツール呼び出し機能

あり：ネイティブ対応、Qwen-Agent / MCP を経由

ライセンス

Apache 2.0

Qwen3-4B-Instruct-2507 は、思考モードをオフにした Qwen3-4B の更新版であり、指示の遵守、論理的推論、テキスト理解、数学、科学、コーディング、およびツール使用における一般能力に大幅な改善が加えられています。また、多言語にわたるロングテール知識のカバレッジにおいても大きな進歩を遂げています。Instruct 版と Thinking 版の両方とも、36 層のトランスフォーマーレイヤー（埋め込みベクトルを除く 3.6B）に構築された合計 40 億のパラメータを持ち、GQA（Grouped Query Attention）を採用し、クエリヘッドが 32、キー/バリューヘッドが 8 つとなっています。これにより、非常に長いコンテキストに対する効率的なメモリ管理が可能になります。この特定の非思考モード版は、明示的な思考連鎖のトレースなしに簡潔な回答を提供するなど、直接かつ高速な応答を必要とするユースケース向けに最適化されており、低レイテンシが重要なチャットボット、カスタマーサポート、およびツール呼び出しエージェントに適しています。Qwen3 はツール呼び出し能力において卓越しており、Alibaba は Qwen-Agent フレームワークの使用を推奨しています。このフレームワークは内部でツール呼び出しテンプレートとパーサーをカプセル化し、コーディングの複雑さを軽減するとともに、MCP サーバー設定ファイルへのサポートも提供しています。

# 3. Phi-3-mini-4k-instruct

リリース日：2024 年 4 月
開発元：Microsoft
ロケーション：microsoft/Phi-3-mini-4k-instruct

技術的側面

詳細

パラメータ数

38 億（3.8B）

アーキテクチャ

Decoder-only Transformer（デコーダー専用トランスフォーマー）

コンテキスト長

4,000 トークン

語彙サイズ

32,064 トークン

トレーニングデータ

合成データ＋フィルタリングされた公開ウェブデータ

ポストトレーニング

SFT（Supervised Fine-Tuning：教師あり微調整）＋DPO（Direct Preference Optimization：直接選好最適化）

ツール呼び出し機能

あり：チャットテンプレートを経由（HF の transformers ≥ 4.41.2 が必要）

ライセンス

MIT

Phi-3-Mini-4K-Instruct は、合成データとフィルタリングされた公開ウェブデータを組み合わせた Phi-3 データセットを用いて訓練された、38 億パラメータの軽量かつ最先端のオープンモデルです。高品質性と推論に特化した性質を重視しています。このモデルは、指示の遵守と安全性向上のために、教師あり微調整（SFT）と直接選好最適化 (DPO) の両方を組み込んだポストトレーニングプロセスを経ています。マイクロソフトの旗艦である「小さくても賢い」モデルである Phi-3-mini は、発売時にはスマートフォンを含むオンデバイスでの実行が可能でありながら、GPT-3.5 と同等の能力ベンチマークを達成した点で注目されました。このモデルは主にメモリと計算リソースが制約された環境、レイテンシに敏感なシナリオ、特に数学や論理を要する強力な推論が必要なタスク向けに設計されています。リスト内の他のモデルに比べて古く、4K のコンテキストウィンドウに限られていますが、MIT ライセンスにより利用可能なライセンス条件が最も寛容な選択肢の一つとなっており、その優れた一般推論能力は商業アプリケーションにおける微調整のベースとして人気を集めています。

# 4. Gemma-4-E2B-it

リリース日：2026 年 4 月 2 日
開発元：Google DeepMind
ロケーション：google/gemma-4-E2B-it

技術的側面

詳細

有効パラメータ数

23 億（埋め込みを含むと合計 51 億）

アーキテクチャ

密結合、ハイブリッドアテンション（スライディングウィンドウ＋グローバル）+ PLE

レイヤー数

スライディングウィンドウ

512 トークン

コンテキスト長

128K トークン

語彙サイズ

262K

モダリティ

テキスト、画像、音声（30 秒以下）、動画（フレームとして）

多言語対応

母国語 35 以上、140 以上の言語でトレーニング済み

ツール呼び出し

あり：ネイティブ関数呼び出し対応

ライセンス

Apache 2.0

Gemma-4-E2B は、Google DeepMind の Gemma 4 ファミリーの一部であり、ハイブリッドアテンション機構（hybrid attention mechanism）と、完全なグローバルアテンションを備えたローカルスライディングウィンドウアテンション（local sliding window attention with full global attention）を特徴としています。この設計により、軽量モデル特有の処理速度と低メモリフットプリントを実現しつつ、複雑で長いコンテキストを持つタスクに必要な深い認識能力も損なうことなく提供します。

E2B の「E」は「有効（effective）」パラメータを意味し、これは Per-Layer Embeddings（PLE：層別埋め込み）と呼ばれるアーキテクチャ上の重要な革新によって実現されています。この PLE は、各デコーダー層に専用のコンディショニングベクトルを追加する仕組みです。これが E2B モデルが量子化（quantization）を適用しても 1.5 GB 未満のメモリで動作し、かつ価値ある出力を生み出すことを可能にするメカニズムとなっています。

このモデルはネイティブ関数呼び出しをサポートしており、エージェントワークフローの実現を可能にします。また、モバイルや IoT デバイス上でのオンデバイス展開（on-device deployment）に最適化されており、テキスト、画像、音声、動画の入力を処理する能力を備えています。Apache 2.0 ライセンスの下でリリースされています（以前の Gemma シリーズが採用していたより制限の厳しい独自ライセンスからの変更）。エッジ上で完全に動作するマルチモーダルエージェントアプリケーションを開発する開発者にとって、Gemma 4 E2B は魅力的な選択肢です。

# 5. Mistral-7B-Instruct-v0.3

リリース日：2024 年 5 月 27 日
開発元：Mistral AI
ロケーション：Mistral-7B-Instruct-v0.3

技術的側面

詳細

パラメータ数

72.5 億（7.25B）

アーキテクチャ

Transformer、GQA + SWA

コンテキスト長

32,768 トークン

語彙サイズ

32,768 トークン（v0.2 から拡張）

トークナイザー

v3 Mistral トークナイザー

関数呼び出し

あり：TOOL_CALLS / AVAILABLE_TOOLS / TOOL_RESULTS トークンを介して (こちらを参照)

ライセンス

Apache 2.0

Mistral-7B-Instruct-v0.3 は、Mistral-7B-v0.3 の指示微調整版であり、v0.2 に対して 3 つの重要な変更が導入されています。すなわち、語彙サイズを 32,768 トークンに拡張したこと、v3 トークナイザーへの対応、および関数呼び出し機能への対応です。本モデルは、推論速度向上のためにグループ化クエリアテンション（Grouped-Query Attention: GQA）を採用し、スライディングウィンドウアテンション（Sliding Window Attention: SWA）を効率的に使用して長文シーケンスを処理します。また、TOOL_CALLS、AVAILABLE_TOOLS、TOOL_RESULTS 用の専用トークンを含む語彙の拡張により、関数呼び出し機能が可能になっています。このまとめ記事で取り上げられているモデルの中で最もパラメータ数が大きい 70 億（7B）のパラメータを有する Mistral-7B-Instruct-v0.3 は、グループ全体において最高の汎用指示従順性能を提供し、業界標準のワークホースとして確立されています。Ollama、vLLM、およびほとんどの推論プラットフォームで広く利用可能です。

# まとめ

ここで取り上げた 5 つのモデル、すなわち SmolLM3-3B、Qwen3-4B-Instruct-2507、Phi-3-mini-4k-instruct、Gemma-4-E2B-it、Mistral-7B-Instruct-v0.3 は、アーキテクチャやパラメータ数、コンテキストウィンドウ、リリース日など多岐にわたる特徴を有していますが、一つの重要な共通点を持っています。それはすべてが、コンパクトでオープンウェイトのパッケージにおいて構造化されたツール呼び出しをサポートしているという点です。

Hugging Face の完全な透明性を誇る SmolLM3 から Google DeepMind のマルチモーダルかつエッジ最適化された Gemma 4 E2B に至るまで、この選定リストは、有能なエージェントモデルをデプロイするために大規模なインフラや最先端のモデルがもはや不要であることを示しています。優先事項がオンデバイス推論、長文コンテキストの処理、多言語対応、あるいは可能な限り寛容なライセンスにあるかにかかわらず、このリストには探索する価値のあるモデルが存在します。

これらのモデルだけがツール呼び出し機能を備えた小型言語モデルであるわけではありません。しかしながら、これらは私が直接経験を持ち、私の結果に基づいて自信を持って紹介できる、そのようなモデルを代表するものとしてよく機能しています。

Matthew Mayo** (@mattmayo13) は、コンピュータサイエンスの修士号とデータマイニングの大学院ディプロマを保有しています。KDnuggets と Statology の編集長、および Machine Learning Mastery の寄稿編集者として、Matthew は複雑なデータサイエンスの概念を誰もが理解できるようにすることを目的としています。彼の専門的な関心には、自然言語処理（Natural Language Processing）、言語モデル、機械学習アルゴリズム、そして新興 AI の探求が含まれます。彼はデータサイエンスコミュニティにおける知識の民主化という使命に駆り立てられています。Matthew は 6 歳の頃からプログラミングを続けています。

原文を表示

5 Small Language Models for Agentic Tool Calling

# Introduction

Agentic AI systems depend on a model's ability to reliably call tools, selecting the right function, formatting arguments correctly, and integrating results into multi-step workflows. Large frontier models such as ChatGPT, Claude, and Gemini handle this well, but they come with tradeoffs in cost, latency, and hardware requirements that make them impractical for many real-world deployments. Small language models have done well to close that gap, and several compact, open-weight options now offer first-class tool-calling support without the need for a data center to run them.

And now, in no particular order, here are 5 small language models for agentic tool calling. Note that, for convenience and consistency, all model links point to Hugging Face-hosted models.

# 1. SmolLM3-3B

Release Date: July 8, 2025
Developer: Hugging Face
Location: HuggingFaceTB/SmolLM3-3B

Technical Aspect

Details

Parameters

Architecture

Decoder-only transformer (GQA + NoPE, 3:1 ratio)

Context Length

64K native; up to 128K with YaRN extrapolation

Training Tokens

11.2T

Multilingual Support

6 languages (EN, FR, ES, DE, IT, PT)

Reasoning Mode

Dual-mode (thinking / no-think toggle)

Tool Calling

Yes: JSON/XML (xml_tools) and Python (python_tools)

License

Apache 2.0

SmolLM3** is a 3B parameter language model designed to push the boundaries of small models, supporting dual-mode reasoning, 6 languages, and long context. It is a decoder-only transformer using Grouped Query Attention (GQA) and No Positional Embeddings (NoPE) (with a 3:1 ratio), pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included a mid-training phase on 140 billion reasoning tokens, followed by supervised fine-tuning and alignment via Anchored Preference Optimization (APO), HuggingFace's off-policy approach to preference alignment. The model supports two distinct tool-calling interfaces, JSON/XML blobs via xml_tools and Python-style function calls via python_tools, making it highly flexible for agentic pipelines and RAG systems. As a fully open release, including weights, datasets, and training code, SmolLM3 is ideal for chatbots, RAG systems, and code assistants on constrained hardware such as edge devices or low-VRAM machines.

# 2. Qwen3-4B-Instruct-2507

Release Date: August 6, 2025
Developer: Alibaba (Qwen Team)
Location: Qwen/Qwen3-4B-Instruct-2507

Technical Aspect

Details

Parameters

4.0B (3.6B non-embedding)

Architecture

Causal LM, 36 layers, GQA (32 Q heads / 8 KV heads)

Context Length

262,144 tokens (native)

Reasoning Mode

Non-thinking only (no <think> blocks)

Multilingual

100+ languages

Tool Calling

Yes: native, via Qwen-Agent / MCP

License

Apache 2.0

Qwen3-4B-Instruct-2507 is an updated version of the Qwen3-4B non-thinking mode, featuring significant improvements in general capabilities including: instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It also possesses substantial gains in long-tail knowledge coverage across multiple languages. Both the Instruct and Thinking variants share 4 billion total parameters (3.6B excluding embeddings) built across 36 transformer layers, using GQA with 32 query heads and 8 key/value heads, enabling efficient memory management for very long contexts. This specific non-thinking variant is optimized for direct, fast-response use cases, such as delivering concise answers without explicit chain-of-thought traces, making it well-suited for chatbots, customer support, and tool-calling agents where low latency matters. Qwen3 excels in tool-calling capabilities, and Alibaba recommends using the Qwen-Agent framework, which encapsulates tool-calling templates and parsers internally, reducing coding complexity, with support for MCP server configuration files.

# 3. Phi-3-mini-4k-instruct

Release Date: April 2024
Developer: Microsoft
Location: microsoft/Phi-3-mini-4k-instruct

Technical Aspect

Details

Parameters

3.8B

Architecture

Decoder-only transformer

Context Length

4K tokens

Vocabulary Size

32,064 tokens

Training Data

Synthetic + filtered public web data

Post-training

SFT + DPO

Tool Calling

Yes: via chat template (requiring HF's transformers ≥ 4.41.2)

License

MIT

Phi-3-Mini-4K-Instruct is a 3.8B parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets that include both synthetic data and filtered publicly available web data, with a focus on high-quality and reasoning-dense properties. The model underwent a post-training process incorporating both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) for instruction following and safety. Microsoft's flagship "small but smart" model, Phi-3-mini was notable at launch for its ability to run on-device, including smartphones, while rivaling GPT-3.5 in capability benchmarks. The model is primarily intended for memory- and compute-constrained environments, latency-bound scenarios, and tasks requiring strong reasoning, especially math and logic. While older than the other models in this list and limited to a 4K context window, the MIT license makes it one of the most permissively licensed options available, and its strong general reasoning has made it a popular base for fine-tuning in commercial applications.

# 4. Gemma-4-E2B-it

Release Date: April 2, 2026
Developer: Google DeepMind
Location: google/gemma-4-E2B-it

Technical Aspect

Details

Effective Parameters

2.3B (5.1B total with embeddings)

Architecture

Dense, hybrid attention (sliding window + global) + PLE

Layers

Sliding Window

512 tokens

Context Length

128K tokens

Vocabulary Size

262K

Modalities

Text, Image, Audio (≤30 sec), Video (as frames)

Multilingual

35+ native, trained on 140+ languages

Tool Calling

Yes: native function calling

License

Apache 2.0

Gemma-4-E2B is part of Google DeepMind's Gemma 4 family, which features a hybrid attention mechanism, local sliding window attention with full global attention. This design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. The "E" in E2B stands for "effective" parameters, enabled by a key architectural innovation called Per-Layer Embeddings (PLE), which adds a dedicated conditioning vector at every decoder layer. This is the mechanism which allows the E2B to run in under 1.5 GB of memory with quantization and still produce valuable outputs. The model supports native function calling, enabling agentic workflows, and is optimized for on-device deployment on mobile and IoT devices, capable of handling text, image, audio, and video inputs. Released under Apache 2.0 (a change from earlier Gemma generations' more restrictive custom license), Gemma 4 E2B is an attractive option for developers building multimodal agentic applications running entirely at the edge.

# 5. Mistral-7B-Instruct-v0.3

Release Date: May 27, 2024
Developer: Mistral AI
Location: Mistral-7B-Instruct-v0.3

Technical Aspect

Details

Parameters

7.25B

Architecture

Transformer, GQA + SWA

Context Length

32,768 tokens

Vocabulary Size

32,768 tokens (extended from v0.2)

Tokenizer

v3 Mistral tokenizer

Function Calling

Yes: via TOOL_CALLS / AVAILABLE_TOOLS / TOOL_RESULTS tokens (see here)

License

Apache 2.0

Mistral-7B-Instruct-v0.3 is an instruct fine-tuned version of Mistral-7B-v0.3, which introduced three key changes over v0.2: an extended vocabulary to 32,768 tokens, support for the v3 tokenizer, and support for function calling. The model employs grouped-query attention for faster inference and Sliding Window Attention (SWA) to handle long sequences efficiently, and function calling support is made possible through the extended vocabulary including dedicated tokens for TOOL_CALLS, AVAILABLE_TOOLS, and TOOL_RESULTS. As the largest model in this roundup at 7B parameters, Mistral-7B-Instruct-v0.3 offers the best general instruction-following performance of the group and has become an industry-standard workhorse, widely available through Ollama, vLLM, and most inference platforms.

# Wrapping Up

The five models covered here — SmolLM3-3B, Qwen3-4B-Instruct-2507, Phi-3-mini-4k-instruct, Gemma-4-E2B-it, and Mistral-7B-Instruct-v0.3 — span a range of architectures, parameter counts, context windows, and release dates, but share one important trait: they all support structured tool calling in a compact, open-weight package.

From Hugging Face's fully transparent SmolLM3 to Google DeepMind's multimodal edge-optimized Gemma 4 E2B, the selection demonstrates that capable agentic models no longer require massive infrastructure and frontier models to deploy. Whether your priority is on-device inference, long-context handling, multilingual coverage, or the most permissive license possible, there is a model in this list worth exploring.

Keep in mind that these aren't the only small language models with tool-calling capabilities. They do, however, do a good job representing those with which I have direct experience, and which I feel comfortable including based on my results.

Matthew Mayo** (@mattmayo13) holds a master's degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.

この記事をシェア

MarkTechPost重要度42026年7月5日 11:31

Qwen の元リーダーが「ハイブリッド思考」の誤りと、なぜ今「エージェント」を支持するのか

MarkTechPost重要度42026年7月5日 01:04

NVIDIA HORIZON：Git ワークツリーを自律的に進化させるハンズフリーエージェントが RTL ベンチマークで完全達成

TLDR AI重要度42026年7月3日 09:00

Devin Security Swarm の紹介（3 分読み）

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

KDnuggets·2026年5月14日 21:00·約12分

エージェント型ツール呼び出しのための小型言語モデル 5 つ

#Small Language Models #Agentic AI #Tool Calling #Open Source LLMs #Hugging Face

TL;DR

AI深層分析2026年7月5日 03:07

重要/ 5段階

深度40%

キーポイント

大型モデルの代替としての SLM の台頭

SmolLM3-3B の詳細な技術仕様

エージェンティック AI の実用化への貢献

オープンウェイトとライセンスの利点

Qwen3-4B-Instruct の長文コンテキストと多言語対応

このモデルはネイティブで26万トークンのコンテキスト長をサポートしており、100以上の言語に対応しています。

効率的なアーキテクチャとライセンス

GQA（Grouped Query Attention）を採用し、Apache 2.0ライセンスで商用利用可能なオープンソースモデルです。

Phi-3-mini の基本仕様とトレーニング手法

影響分析・編集コメントを表示

影響分析

編集コメント

image**

# イントロダクション

# 1. SmolLM3-3B

リリース日：2025 年 7 月 8 日
開発元：Hugging Face
ロケーション：HuggingFaceTB/SmolLM3-3B

技術的側面

詳細

パラメータ数

アーキテクチャ

デコーダー専用トランスフォーマー（GQA + NoPE、3:1 レシオ）

コンテキスト長

ネイティブで 64K；YaRN 外挿を使用すると最大 128K

トレーニングトークン数

11.2T

多言語サポート

6 カ国語（EN, FR, ES, DE, IT, PT）

推論モード

デュアルモード（思考/非思考トグル）

ツール呼び出し

あり：JSON/XML (xml_tools) および Python (python_tools)

ライセンス

Apache 2.0

# 2. Qwen3-4B-Instruct-2507

リリース日：2025 年 8 月 6 日
開発元：アリババ（Qwen チーム）
ロケーション：Qwen/Qwen3-4B-Instruct-2507

技術的側面

詳細

パラメータ数

40 億（埋め込みを除く 36 億）

アーキテクチャ

因果 LM、36 レイヤー、GQA（クエリヘッド 32 / KV ヘッド 8）

コンテキスト長

262,144 トークン（ネイティブ）

推論モード

非思考専用（<thinking> ブロックなし）

多言語対応

100 以上の言語

ツール呼び出し機能

あり：ネイティブ対応、Qwen-Agent / MCP を経由

ライセンス

Apache 2.0

# 3. Phi-3-mini-4k-instruct

リリース日：2024 年 4 月
開発元：Microsoft
ロケーション：microsoft/Phi-3-mini-4k-instruct

技術的側面

詳細

パラメータ数

38 億（3.8B）

アーキテクチャ

Decoder-only Transformer（デコーダー専用トランスフォーマー）

コンテキスト長

4,000 トークン

語彙サイズ

32,064 トークン

トレーニングデータ

合成データ＋フィルタリングされた公開ウェブデータ

ポストトレーニング

SFT（Supervised Fine-Tuning：教師あり微調整）＋DPO（Direct Preference Optimization：直接選好最適化）

ツール呼び出し機能

あり：チャットテンプレートを経由（HF の transformers ≥ 4.41.2 が必要）

ライセンス

MIT

# 4. Gemma-4-E2B-it

リリース日：2026 年 4 月 2 日
開発元：Google DeepMind
ロケーション：google/gemma-4-E2B-it

技術的側面

詳細

有効パラメータ数

23 億（埋め込みを含むと合計 51 億）

アーキテクチャ

密結合、ハイブリッドアテンション（スライディングウィンドウ＋グローバル）+ PLE

レイヤー数

スライディングウィンドウ

512 トークン

コンテキスト長

128K トークン

語彙サイズ

262K

モダリティ

テキスト、画像、音声（30 秒以下）、動画（フレームとして）

多言語対応

母国語 35 以上、140 以上の言語でトレーニング済み

ツール呼び出し

あり：ネイティブ関数呼び出し対応

ライセンス

Apache 2.0

# 5. Mistral-7B-Instruct-v0.3

リリース日：2024 年 5 月 27 日
開発元：Mistral AI
ロケーション：Mistral-7B-Instruct-v0.3

技術的側面

詳細

パラメータ数

72.5 億（7.25B）

アーキテクチャ

Transformer、GQA + SWA

コンテキスト長

32,768 トークン

語彙サイズ

32,768 トークン（v0.2 から拡張）

トークナイザー

v3 Mistral トークナイザー

関数呼び出し

あり：TOOL_CALLS / AVAILABLE_TOOLS / TOOL_RESULTS トークンを介して (こちらを参照)

ライセンス

Apache 2.0

# まとめ

原文を表示

# Introduction

And now, in no particular order, here are 5 small language models for agentic tool calling. Note that, for convenience and consistency, all model links point to Hugging Face-hosted models.

# 1. SmolLM3-3B

Release Date: July 8, 2025
Developer: Hugging Face
Location: HuggingFaceTB/SmolLM3-3B

Technical Aspect

Details

Parameters

Architecture

Decoder-only transformer (GQA + NoPE, 3:1 ratio)

Context Length

64K native; up to 128K with YaRN extrapolation

Training Tokens

11.2T

Multilingual Support

6 languages (EN, FR, ES, DE, IT, PT)

Reasoning Mode

Dual-mode (thinking / no-think toggle)

Tool Calling

Yes: JSON/XML (xml_tools) and Python (python_tools)

License

Apache 2.0

# 2. Qwen3-4B-Instruct-2507

Release Date: August 6, 2025
Developer: Alibaba (Qwen Team)
Location: Qwen/Qwen3-4B-Instruct-2507

Technical Aspect

Details

Parameters

4.0B (3.6B non-embedding)

Architecture

Causal LM, 36 layers, GQA (32 Q heads / 8 KV heads)

Context Length

262,144 tokens (native)

Reasoning Mode

Non-thinking only (no <think> blocks)

Multilingual

100+ languages

Tool Calling

Yes: native, via Qwen-Agent / MCP

License

Apache 2.0

# 3. Phi-3-mini-4k-instruct

Release Date: April 2024
Developer: Microsoft
Location: microsoft/Phi-3-mini-4k-instruct

Technical Aspect

Details

Parameters

3.8B

Architecture

Decoder-only transformer

Context Length

4K tokens

Vocabulary Size

32,064 tokens

Training Data

Synthetic + filtered public web data

Post-training

SFT + DPO

Tool Calling

Yes: via chat template (requiring HF's transformers ≥ 4.41.2)

License

MIT

# 4. Gemma-4-E2B-it

Release Date: April 2, 2026
Developer: Google DeepMind
Location: google/gemma-4-E2B-it

Technical Aspect

Details

Effective Parameters

2.3B (5.1B total with embeddings)

Architecture

Dense, hybrid attention (sliding window + global) + PLE

Layers

Sliding Window

512 tokens

Context Length

128K tokens

Vocabulary Size

262K

Modalities

Text, Image, Audio (≤30 sec), Video (as frames)

Multilingual

35+ native, trained on 140+ languages

Tool Calling

Yes: native function calling

License

Apache 2.0

# 5. Mistral-7B-Instruct-v0.3

Release Date: May 27, 2024
Developer: Mistral AI
Location: Mistral-7B-Instruct-v0.3

Technical Aspect

Details

Parameters

7.25B

Architecture

Transformer, GQA + SWA

Context Length

32,768 tokens

Vocabulary Size

32,768 tokens (extended from v0.2)

Tokenizer

v3 Mistral tokenizer

Function Calling

Yes: via TOOL_CALLS / AVAILABLE_TOOLS / TOOL_RESULTS tokens (see here)

License

Apache 2.0

# Wrapping Up

この記事をシェア

MarkTechPost重要度42026年7月5日 11:31

Qwen の元リーダーが「ハイブリッド思考」の誤りと、なぜ今「エージェント」を支持するのか

MarkTechPost重要度42026年7月5日 01:04

NVIDIA HORIZON：Git ワークツリーを自律的に進化させるハンズフリーエージェントが RTL ベンチマークで完全達成

TLDR AI重要度42026年7月3日 09:00

Devin Security Swarm の紹介（3 分読み）

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

エージェント型ツール呼び出しのための小型言語モデル 5 つ

キーポイント

影響分析

編集コメント

# イントロダクション

# 1. SmolLM3-3B

# 2. Qwen3-4B-Instruct-2507

# 3. Phi-3-mini-4k-instruct

# 4. Gemma-4-E2B-it

# 5. Mistral-7B-Instruct-v0.3

# まとめ

# Introduction

# 1. SmolLM3-3B

# 2. Qwen3-4B-Instruct-2507

# 3. Phi-3-mini-4k-instruct

# 4. Gemma-4-E2B-it

# 5. Mistral-7B-Instruct-v0.3

# Wrapping Up

関連記事

エージェント型ツール呼び出しのための小型言語モデル 5 つ

キーポイント

影響分析

編集コメント

# イントロダクション

# 1. SmolLM3-3B

# 2. Qwen3-4B-Instruct-2507

# 3. Phi-3-mini-4k-instruct

# 4. Gemma-4-E2B-it

# 5. Mistral-7B-Instruct-v0.3

# まとめ

# Introduction

# 1. SmolLM3-3B

# 2. Qwen3-4B-Instruct-2507

# 3. Phi-3-mini-4k-instruct

# 4. Gemma-4-E2B-it

# 5. Mistral-7B-Instruct-v0.3

# Wrapping Up

関連記事