The Decoder·2026年4月9日 20:05·約1分で読める

Zhipu AIのGLM-5.1、数百回の反復で自らのコーディング戦略を再考可能

#LLM #コード生成 #自己改善 #オープンソース #Zhipu AI #開発者ツール

TL;DR

Zhipu AIは、MITライセンスで新しいGLM-5.1モデルをリリースし、このモデルはコーディングタスクに取り組む際に数百回の反復を通じて自身のアプローチを洗練できると報告されている。

AI深層分析2026年4月9日 21:41

注目/ 5段階

深度40%

キーポイント

GLM-5.1のリリースとライセンス

Zhipu AIが新しいGLM-5.1モデルをMITライセンスの下で公開した。

自己反復によるコーディング戦略の改善

このモデルは、コーディングタスクにおいて、数百回の反復を通じて自身のアプローチを再考・洗練できる能力を持つとされる。

オープンソースでの提供

MITライセンスでの公開は、広範な利用と開発コミュニティへのアクセスを促進する。

影響分析・編集コメントを表示

影響分析

この発表は、AIモデルが自律的にコード生成戦略を改善する能力を示し、開発者支援ツールの進化に寄与する可能性がある。MITライセンスでの公開は、研究と実用化の加速を促すが、具体的な性能詳細や実証データが限定的なため、現時点での実用影響は慎重に見極める必要がある。

編集コメント

自己反復によるコーディング改善は興味深いが、記事が短く具体性に欠けるため、実際の性能や適用範囲の評価にはさらなる情報が必要。オープンソース化はコミュニティ発展の好材料。

Zhipu AIは、新たなモデル「GLM-5.1」をMITライセンスの下で公開しました。報告によれば、このモデルはコーディングタスクに取り組む際、数百回の反復を経て自身のアプローチを洗練させることが可能です。

この記事「Zhipu AIのGLM-5.1、コーディング戦略を数百回の反復で再考可能に」は、The Decoderで最初に公開されました。

原文を表示

Zhipu AI has released its new GLM-5.1 model under an MIT license. The model can reportedly refine its own approach over hundreds of iterations when tackling coding tasks.

Zhipu AI has introduced GLM-5.1, a new open-weight model designed for long-running, agent-based programming tasks. The core argument: existing models, including Zhipu's own predecessor GLM-5, run out of ideas too quickly on complex problems. They apply familiar strategies, make early progress, and then hit a wall. Throwing more compute at the problem doesn't help.

GLM-5.1 is supposed to fix this by repeatedly reviewing its own strategy, recognizing dead ends, and trying new approaches. Zhipu AI describes optimization across "hundreds of rounds and thousands of tool calls."

The company demonstrates this with three scenarios, though all of them were conducted internally. Independent evaluations don't exist yet.

GLM-5.1 switches strategies on its own mid-task

In the first scenario, GLM-5.1 had to optimize a vector database - a system that searches large datasets and finds similar entries. The goal: answer as many search queries per second as possible without losing accuracy. In a standard test run with 50 rounds, Claude Opus 4.6 held the previous best score of 3,547 queries per second, according to Zhipu AI.

Instead, Zhipu AI gave GLM-5.1 unlimited attempts. The model decided on its own when to submit a new version and what to try next. After more than 600 iterations and over 6,000 tool calls, it reached 21,500 queries per second - roughly six times the previous best, the company says.

According to Zhipu, the model fundamentally changed its strategy multiple times during the run. Around iteration 90, it switched from exhaustively searching all data to a more efficient clustering approach. Around iteration 240, it introduced a two-stage pipeline that does rough pre-sorting before precise filtering. The company identifies six such structural shifts over the entire run, each initiated by the model itself.

GPU optimization shows progress but doesn't reach the top

In the second scenario, the model had to rewrite existing machine learning code to run faster on GPUs. GLM-5.1 achieved a 3.6x speedup over the baseline implementation and continued making progress even in later phases, according to Zhipu AI. GLM-5, by contrast, plateaued much earlier.

On the KernelBench Level 3 GPU optimization task, GLM-5.1 sustains progress far longer than its predecessor GLM-5 but still trails Claude Opus 4.6. | Image: Zhipu AI

Claude Opus 4.6 remains clearly ahead in this test with a 4.2x speedup and still shows room for improvement at the end. GLM-5.1 extends the productive horizon compared to its predecessor but doesn't close the gap to the strongest competitor.

A Linux desktop from a single prompt

The third scenario is the most unusual. GLM-5.1 was asked to build a complete Linux desktop environment as a web application - no starter code, no intermediate instructions. Most models deliver a basic shell with a taskbar and a few placeholder windows, then call the job done, according to Zhipu AI.

GLM-5.1 was placed in a loop where it reviewed its own output after each round and decided what was still missing or needed improvement. After eight hours, the result was a functional desktop environment with a file browser, terminal, text editor, system monitor, calculator, and games, the company says.

Strong at coding, weaker at reasoning

Beyond the three demos, Zhipu AI published a benchmark table that paints a more nuanced picture. In coding, GLM-5.1 leads or matches the competition in several tests. On SWE-Bench Pro, a software engineering benchmark, it scores 58.4 percent - the highest among all tested freely available models, according to Zhipu AI, just ahead of GPT-5.4 at 57.7 percent and Claude Opus 4.6 at 57.3 percent. On CyberGym, a cybersecurity benchmark, it posts the top score of 68.7. Zhipu AI acknowledges, however, that Gemini 3.1 Pro and GPT-5.4 refused to execute some tasks for safety reasons, which likely dragged down their scores.

On Humanity's Last Exam, a knowledge test, the model scores 31 percent - behind Gemini 3.1 Pro at 45 and GPT-5.4 at 39.8. On scientific questions (GPQA-Diamond), it also trails with 86.2 compared to Gemini 3.1 Pro at 94.3 and GPT-5.4 at 92.

Results on agent-based tasks are mixed as well. In Vending Bench 2, where a model has to run a simulated vending machine business, GLM-5.1 ends up with a balance of $5,634. Claude Opus 4.6 reaches $8,018 - significantly more. On repository generation (NL2Repo), Claude Opus 4.6 also leads clearly with 49.8 versus GLM-5.1's 42.7.

On the Artificial Analysis Intelligence Index, the model currently sits just behind Anthropic's Claude 4.6 Sonnet.

Zhipu AI openly names remaining challenges: the model needs to recognize dead ends sooner, maintain coherence across thousands of tool calls, and reliably self-assess on tasks without clear metrics. GLM-5.1 is a "first step" in that direction, the company says.

The model is available under an MIT license on Hugging Face and ModelScope, and can be accessed through the API platforms api.z.ai and BigModel.cn. It integrates with coding agents like Claude Code and OpenClaw. For local deployment, Zhipu AI supports the inference frameworks vLLM and SGLang, with setup guides in the GitHub repository. Access through the Z.ai chat interface is expected to go live in the coming days.

Zhipu AI is rapidly expanding its model lineup

Zhipu AI recently introduced GLM-5V-Turbo, a multimodal coding model that generates code directly from images and video. Before that, the company released GLM-5 in February, an open-weight model with 744 billion parameters designed to compete with leading proprietary models on coding tasks. GLM-5.1 likely builds on both and adds the long-horizon capabilities Zhipu AI hopes will set it apart from Chinese competitors. That competition remains fierce: alongside Zhipu AI, Moonshot AI with Kimi K2.5 and Alibaba with Qwen3.5 are also pushing hard into the autonomous coding agent market.

Zhipu AI isn't the only company betting on long-running AI agents. In early 2026, Cursor had hundreds of GPT-5.2 agents spend a week building a web browser. The resulting three million-plus lines of Rust code turned out to be nearly unmaintainable, landing in the bottom five percent of all evaluated software systems according to an analysis by the Software Improvement Group.

AI News Without the Hype – Curated by Humans

Subscribe to THE DECODER for ad-free reading, a weekly AI newsletter, our exclusive "AI Radar" frontier report six times a year, full archive access, and access to our comment section.

Subscribe now

この記事をシェア

Latent Space★42026年6月5日 15:44

[AINews] 今日は何も大きな出来事はありませんでした

Anthropic が RSI の兆候を示し、OpenAI の ChatGPT が月間アクティブユーザー数で 10 億人を突破。SpaceX AI は IPO について説明しているが、最も重要なのは AIE WF のチケット確保とイベント参加である。

Ars Technica AI★42026年6月4日 04:10

Google の新モデル「Gemma 4 12B」は 16GB RAM のノート PC で動作可能に設計

Google は、メモリ消費を抑えた新しい生成 AI モデル「Gemma 4 12B」を発表した。このモデルは、一般的な消費者向けノートパソコン（RAM 16GB）でも実行できるように最適化されており、ローカルでの AI 利用を促進するものである。

Sebastian Raschka★42026年6月6日 20:16

LLM 研究論文：2026 年 1 月から 5 月のリスト

Sebastian Raschka が、2026 年上半期（1 月〜5 月）に注目すべき大規模言語モデル関連の研究論文を選定し、一覧として公開した。

ニュース一覧に戻る元記事を読む