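Chinese AI model MiniMax M2.7 reportedly contributed to its own development
M2.7, a model released by Chinese AI company MiniMax, reportedly helped improve its own training process through autonomous optimization loops and achieved competitive benchmark results.
Key points
Involvement in its own development
The MiniMax M2.7 model reportedly took an active part in its own development process, improving its training through autonomous optimization loops.
Technical novelty
Having an AI model optimize its own training process is a technically innovative development that points to new possibilities for the AI development paradigm.
Competitive performance
With this self-optimization approach, M2.7 reportedly achieved competitive benchmark results.
Trends among Chinese AI companies
This announcement is one example of Chinese AI companies working on cutting-edge AI development methods.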
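Impact analysis
The article illustrates the concept of autonomous optimization, in which an AI model improves its own development process, and this could be an important step toward more efficient and more advanced AI development. It is also notable as an example of technical innovation by Chinese companies.
Editorial comment
The idea of an AI contributing to its own development is technically very interesting, and this is a noteworthy announcement that could influence the direction of future AI development.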
This article, "Chinese AI model MiniMax M2.7 reportedly contributed to its own development", first appeared on The Decoder.
Chinese AI company MiniMax has released M2.7, a model that reportedly played an active role in its own development. Through autonomous optimization loops, it improved its own training process and posted competitive benchmark results.
During development, M2.7 reportedly updated its own knowledge stores, built dozens of complex capabilities within its agent infrastructure, and improved its reward-based training on its own. It then took those results and used them to refine its own learning process.
MiniMax describes M2.7 as "our first model deeply participating in its own evolution" and lays out a vision where future AI self-evolution will "gradually transition towards full autonomy, coordinating data construction, model training, inference architecture, evaluation, and other stages without human involvement."
MiniMax M2.7 compared with Sonnet 4.6, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 across eight benchmarks. M2.7 scores close to the leading proprietary models in most tests. | Image: MiniMax
MiniMax isn't the only company exploring this approach. OpenAI recently introduced its GPT-5.3-Codex coding model with similar claims about AI-assisted development. According to OpenAI, the Codex team used early versions of the model to find bugs during training, manage deployment, and evaluate test results. The team said they were surprised by how much Codex accelerated its own development process.
Over 100 autonomous optimization rounds show what self-improving AI can do
To push the limits of this self-optimization, MiniMax had an internal version of M2.7 set up a research agent system that works with various project groups inside the company. According to MiniMax, the agent handles tasks like literature research, experiment tracking, debugging, metric analysis, and code fixes as part of the in-house RL team's daily workflow. Human researchers only step in when critical decisions need to be made. The model covers 30 to 50 percent of the entire workflow.
How M2.7 develops itself: researchers set goals and guidelines, then the AI agent takes over large parts of the development process on its own. The example workflow below shows how experiment planning, code changes, and evaluation feed into each other. | Image: MiniMax
In one experiment, M2.7 optimized a model's coding performance in an internal development environment completely on its own over more than 100 rounds. Each round, it analyzed failures, planned changes, tweaked the code, tested the results, and decided whether to keep or toss the changes. According to MiniMax, this led to a 30 percent performance boost on internal evaluation sets.
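The keep-or-discard cycle described above is, in general terms, a greedy accept/reject search. The Python sketch below illustrates that pattern on a toy problem under stated assumptions; it is not MiniMax's code. The names `self_optimize`, `toy_evaluate`, `toy_propose`, and `TARGET`, and the idea of a "program" as a list of digits checked against a hidden target, are hypothetical stand-ins for a real codebase, test suite, and model-generated edits.

```python
import random
from typing import Callable

# Hypothetical toy: a "program" is a list of digits, and the "test suite"
# checks each position against a hidden target. Real systems edit code.
Program = list[int]
Evaluator = Callable[[Program], tuple[float, list[int]]]
Proposer = Callable[[Program, list[int], random.Random], Program]

TARGET = [3, 1, 4, 1, 5, 9, 2, 6]


def self_optimize(
    program: Program,
    evaluate: Evaluator,
    propose: Proposer,
    rounds: int = 100,
    seed: int = 0,
) -> tuple[Program, float, list[float]]:
    """Each round: analyze failures, change, test, then keep or discard."""
    rng = random.Random(seed)
    best_score, failures = evaluate(program)
    history = [best_score]
    for _ in range(rounds):
        if not failures:  # every test passes; nothing left to fix
            break
        candidate = propose(program, failures, rng)  # plan and apply a change
        score, cand_failures = evaluate(candidate)  # test the result
        if score > best_score:  # keep only strict improvements
            program, best_score, failures = candidate, score, cand_failures
        history.append(best_score)  # a discarded change leaves the score as-is
    return program, best_score, history


def toy_evaluate(p: Program) -> tuple[float, list[int]]:
    # Score = fraction of passing positions; also report which ones fail.
    failing = [i for i, (got, want) in enumerate(zip(p, TARGET)) if got != want]
    return 1 - len(failing) / len(TARGET), failing


def toy_propose(p: Program, failures: list[int], rng: random.Random) -> Program:
    # "Analyze failures": target one failing position, then try a random edit.
    i = rng.choice(failures)
    q = list(p)
    q[i] = rng.randint(0, 9)
    return q


if __name__ == "__main__":
    best, score, history = self_optimize([0] * 8, toy_evaluate, toy_propose, rounds=1000)
    print(best, score, len(history) - 1, "rounds used")
```

Because only strict improvements are kept, the recorded best score never decreases from one round to the next; a discarded change simply leaves the previous version in place.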
In 22 machine learning competitions from OpenAI's MLE Bench Lite, M2.7 hit an average medal rate of 66.6 percent across three 24-hour runs. That puts the model behind Opus 4.6 (75.7 percent) and GPT-5.4 (71.2 percent), but on par with Gemini 3.1 Pro, according to the company.
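To make the medal-rate figure concrete, assume (MiniMax does not spell this out) that each run's medal rate is the share of the N = 22 competitions in which the model earned a medal, M_r, and that the reported figure is the mean over the R = 3 runs. Because every run covers the same competitions, the mean of the per-run rates equals the total medal count divided by all attempts:

$$\bar{m} = \frac{1}{R}\sum_{r=1}^{R}\frac{M_r}{N} = \frac{\sum_{r=1}^{R} M_r}{R\,N}, \qquad N = 22,\; R = 3$$

Under that assumption, 66.6 percent corresponds to roughly 0.666 × 66 ≈ 44 medals out of 66 competition attempts, or about 14.7 of the 22 competitions per run.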
That said, benchmark results serve as useful indicators but don't necessarily reflect real-world performance. How a model scores on standardized tests can differ significantly from how it handles everyday tasks, and results depend heavily on testing conditions, prompt formatting, and model optimization. These numbers are best treated as rough reference points rather than definitive measures of capability.
M2.7 keeps pace with top Western models in coding and office tasks
According to MiniMax, M2.7 delivers results on par with leading Western models in software engineering benchmarks. On SWE-Pro, it scored 56.22 percent, comparable to GPT-5.3-Codex. On VIBE-Pro, a benchmark for complete project delivery, it hit 55.6 percent. In real-world scenarios, M2.7 reportedly cut recovery time for production system failures to under three minutes on multiple occasions.
For professional office work, M2.7 achieved an Elo score of 1,495 on the GDPval-AA benchmark, the highest score among open-weight models, according to MiniMax. The model reportedly handles multi-level edits in Word, Excel, and PowerPoint with high accuracy and maintains 97 percent rule fidelity across more than 40 complex instruction sets.
As a practical example, MiniMax describes a financial analysis of TSMC in which M2.7 independently read annual reports, built a sales forecast model, and turned the results into a presentation and research report. Financial experts said the output could already serve as a first draft.
Open-source demo brings AI interaction into a graphical environment
Beyond productivity scenarios, MiniMax also improved the model's character consistency and emotional intelligence. To show this off, the company released OpenRoom, an open-source project that moves AI interaction into a graphical web environment where characters proactively engage with their surroundings. M2.7 is available through MiniMax Agent and the API platform; unlike previous model versions, weights aren't available yet.
Jürgen Schmidhuber laid the theoretical groundwork for self-improving AI back in 2003 with the concept of the "Gödel Machine," which only modifies its own code when there's formal proof of benefit. Projects like Sakana AI's "Darwin-Gödel Machine" and the "Huxley-Gödel Machine" from Schmidhuber's KAUST lab take a more pragmatic approach, having AI agents iteratively modify their own code and pick the best-performing variants through an evolutionary process.
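Unlike the proof-gated Gödel Machine, the evolutionary approach keeps variants based on measured performance. The Python below is a generic illustration of that idea under simplifying assumptions, not the Darwin-Gödel Machine or Huxley-Gödel Machine algorithm: a "variant" is a hypothetical bit tuple standing in for an agent's code, the benchmark is a toy count-of-ones score, and parents are chosen by a simple tournament over an archive of every variant produced so far. The names `Variant`, `fitness`, `mutate`, and `evolve` are illustrative only.

```python
import random

# Hypothetical stand-in for an agent's code: a fixed-length tuple of bits.
Variant = tuple[int, ...]


def fitness(v: Variant) -> int:
    # Toy "benchmark": count of 1-bits (the classic OneMax problem).
    return sum(v)


def mutate(v: Variant, rng: random.Random) -> Variant:
    # A "self-modification" here is flipping one randomly chosen bit.
    i = rng.randrange(len(v))
    return v[:i] + (1 - v[i],) + v[i + 1:]


def evolve(length: int = 16, generations: int = 300, k: int = 3, seed: int = 0):
    rng = random.Random(seed)
    root: Variant = (0,) * length
    archive: dict[Variant, int] = {root: fitness(root)}  # every variant kept, with score
    for _ in range(generations):
        # Tournament selection: sample up to k archived variants, mutate the best.
        contenders = rng.sample(list(archive), min(k, len(archive)))
        parent = max(contenders, key=archive.__getitem__)
        child = mutate(parent, rng)
        if child not in archive:  # score each new variant once
            archive[child] = fitness(child)
    best = max(archive, key=archive.__getitem__)
    return best, archive[best], archive


if __name__ == "__main__":
    best, score, archive = evolve()
    print(score, "of", len(best), "bits set;", len(archive), "variants archived")
```

Keeping the whole archive rather than only the current best lets weaker variants serve as starting points for later improvements, an idea associated with open-ended approaches such as the Darwin-Gödel Machine.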