AIが数学研究の本質をどのように変えているか
AIが数学研究の本質を変えつつあり、機械学習理論家がAIエージェントを用いて証明を生成し、通常数ヶ月かかる50ページの数学論文を3週間で作成した事例が示されている。
キーポイント
AIによる数学証明の自動生成
AIツールが高レベルの証明スケッチのプロンプトから厳密な数学的証明を開発・記述できるようになり、数学研究の方法論を変革している。
研究効率の劇的向上
通常数ヶ月かかる50ページの数学論文を、AIエージェントツールを使用して3週間で作成することに成功した。
AIと研究者の協働関係
証明ベースのAIツールとの作業は、賢く広範な教育を受けたが時々間違いを犯す同僚との協力に似ていると表現されている。
グラフ理論と機械学習の融合
作成された論文は、グラフ理論と機械学習の概念に基づく最適化問題を記述・解決している。
AIによる証明の自動化と研究プロセスの加速
AIは高レベルの証明スケッチから厳密な数学的証明を自動的に生成し、研究の詳細記述と形式化を大幅に高速化している。
AIが研究の質と内容に与える影響
最近の研究では、AIが最終結果に不可欠な重要なアイデアを提供し、AIなしでは生み出せなかった質的に異なる優れた論文が作成されている。
AI導入による研究規範と科学訓練への懸念
AIによる研究自動化は、若手研究者の直感や科学的センスの育成方法、査読制度の限界、科学のインセンティブ構造への課題を提起している。
影響分析・編集コメントを表示
影響分析
この記事は、AIが基礎科学研究の方法論そのものを変革しつつあることを示しており、数学研究の生産性革命をもたらす可能性がある。AIによる証明生成は、複雑な数学的問題の解決を加速し、研究者の創造的思考に集中する余地を拡大する重要な進展である。
編集コメント
AIが数学のような高度な知的作業に本格的に参入したことを示す画期的な事例。研究開発のスピードとスケールが根本的に変わる可能性を秘めている。
1960年代前半に高水準プログラミング言語とコンパイラが広く導入された後、ほとんどのソフトウェアエンジニアは、基礎となるハードウェアに直接指示を与えるが煩雑な機械語やアセンブリ言語を書かなくなりました。しかし、優れたプログラマーは依然として、コンパイラが高水準言語を機械語にどのように翻訳するかを理解しており、正確性と性能について推論できました。私たちは、技術的議論の構築と検証を容易にすることで、すべての研究者がより高い抽象度で作業し、より大きな構想を練れるようになることを望みます。私たちが思い描く研究文化は、問題意識、課題の選択、モデリングのスキルを重視し、技術的な技巧そのものは相対的に軽視するものとなるでしょう。
査読(ピアレビュー)に対する真剣でコミュニティ全体での再検討がなければ、AIは個々の研究者レベルでは科学の進歩を加速する一方で、コミュニティ全体としてはその進歩を停滞させる脅威となります。
査読の破壊と再構築
私たちの見方では、査読の主たる目的は、研究の正確性と質を検証するプロセスというよりも、むしろ希少な資源、すなわち研究コミュニティの注目を適切な場所に集中させることにあります。科学は研究者が互いの研究を土台にして進歩しますが、すでに誰もがすべてに追いつけないほどの研究が存在しています。出版プロセスは、最も興味深く有望な方向性を特定し、それらがより効率的かつ徹底的に発展できるようにすべきです。
AIはこのコミュニティの注目の集中にどのように影響するのでしょうか。AIツールは、洗練されて正しく見える研究をはるかに容易に生み出し、ジャーナルや会議に投稿可能な論文を生成する障壁を劇的に下げます。これらの論文の多くは、興味深くもなければ実際には正しくもありませんが、それを見極めるには査読者からの多大な努力が必要です。
これは、会場ごとに数万件の投稿に苦しむ、すでに過負荷な機械学習(マシンラーニング)出版のエコシステムに負担をかけています。「優れた論文」とは限らない「論文」を生み出すのに必要な時間と労力を削減することが、既存の査読制度を不安定にし始めているのを私たちは目撃しています。最新のAIおよびML会議では、投稿数が何倍にも増加し、AIによって磨かれたが最終的には低品質な論文が相当数あり、驚くほど査読プロセスを通過した後に初めて指摘される事態が起きています。
AIの使用は、より速くより良い研究を生み出す可能性を示してきましたが、科学の未来とそれが広い世界と持つ関係を気にかける人々にとって深刻な疑問も生み出しています。これは研究分野全体の問題であり、部分的にはAI生成論文の市場を生み出しているためです。これは今度は、AI生成論文をAIで検出するという対抗市場を生み出しました。これは、スパムとその検出をめぐるおなじみの技術的軍拡競争によく似ていますが、厄介または詐欺的な電子メールのフィルタリングだけでなく、科学出版の誠実さが危機に瀕している点が異なります。
短期的な対策として、主要な会議ですでに導入が進むツールである、AI駆動の自動正確性チェック(例えば、数学的証明の形式的検証(フォーマルベリフィケーション))は価値があるかもしれません。これを、コードに対するものの代わりに、数学に対する単体テスト(ユニットテスト)の一形態と考えてください。目的は、重大な誤りを含む論文をふるい落とし、人間の査読者の仕事を、彼らが最も適切に評価できる科学の本質的部分、すなわち新しい結果から世界について何が学べるか、それがどれほど有用で興味深いかを判断することに集中させることであり、無数の論文の技術的正確性をチェックする単調な作業に溺れさせないことです。
査読に対する真剣でコミュニティ全体での再検討がなければ、AIは個々の研究者レベルでは科学の進歩を加速する一方で、コミュニティ全体としてはその進歩を停滞させる脅威となります。
展望
私たちは、AIが科学研究の方法論、研究者育成、査読に大変革をもたらしていると考えます。来るべき変化から目を背けることはできません。しかし、積極的に適応し、AI支援研究がその可能性を十分に発揮することを確実にする機会はあります。来年の終わりには研究はどのように見えるでしょうか。その次の年はどうでしょうか。私たちは過去1年間で、過去10年間よりも多くの変化を目撃してきました。ですから、私たちが確信を持って予測できるのは「様変わりする」ということだけです。
私たちの科学機関、査読制度、出版、大学院教育は、人間の認知と努力の制約に合わせて数十年かけて進化してきました。それらの制約は急速に変化しており、私たちの機関もそれに合わせて変化する必要があります。私たちの目標は、AIが人間の創造性と洞察力を増幅し、発見を加速し、研究事業に参加できる人々の範囲を広げる一方で、科学を価値あるものにする喜びと厳密さを保持する世界に向けて舵を切ることであるべきです。
研究分野: 機械学習
タグ: エージェンシックAI, 大規模言語モデル(LLMs), 責任あるAI, 形式的検証
原文を表示
How AI is changing the nature of mathematical research
What machine learning theorists learned using AI agents to generate proofs and what comes next.
Machine learning
Michael Kearns Aaron Roth March 09, 01:55 PM March 09, 01:58 PM Modern AI coding tools have revolutionized software engineering, with developers now using AI assistants to write a substantial fraction of their code across a range of applications. As scientists studying the theory of machine learning, were already seeing a similar transformation in basic scientific methodology, especially for research of a mathematical nature.
More precisely, AI tools are now able to develop and write rigorous mathematical proofs only from prompts providing high-level proof sketches. These proofs are written in longstanding languages for detailing mathematical arguments, in the same way that code is written in formal programming languages like Python. AI seems to have become proficient in both kinds of languages and their underlying logics.
Working with proof-based AI tools is akin to collaborating with a smart, broadly educated but occasionally error-prone colleague.
We came to this realization during a three-week period last summer, when we used agentic AI tools to write a mathematical paper that normally would have taken months. The 50-page paper describes and solves an optimization problem based on concepts from graph theory and machine learning. A typical prompt we would give the AI to set up the general framework for our paper looked like this: Imagine a directed acyclic network of linear least-squares learning agents, each of which shares a common dataset but each of which sees only a different subset of the features.
A typical prompt for a theorem statement and proof went We believe that if the network contains a sufficiently long chain of agents whose features cover the entire dataset, some agent in the chain should rapidly converge to the globally optimal linear model. The proof should use the fact that error monotonically decreases in the chain, which forces long sequences of agents to be multi-accurate with respect to each others features. While incantations like these might be opaque to the casual reader, they all have precise, standard mathematical interpretations that the AI was aware of, due to its training, and it proceeded to translate informal intuitions into precise definitions and statements. This translation was imperfect (as discussed below) but resulted in a great first draft that could then be corrected and smoothed.
To be clear, for this specific paper, we already knew the general outline of the proofs we had in mind. What AI did was to automate and dramatically speed up the process of filling in the missing details and writing them with formal precision. But more recently, weve written papers that we believe are substantially different and better than what we would have produced without AI assistance in which the AI contributed key ideas that were crucial to the final results.
Its important to note that AI tools are advancing quickly, which makes the future difficult to predict. While their use has shown potential to produce faster and better research, it has also generated serious questions for those who care about the future of science and its relationship to the broader world. AI is changing research norms and workflows. This raises concerns about how to train future generations of scientists.
Specifically, how can intuition and good taste in scientific research be developed when AI automates many of the steps that have historically been used to train young researchers? Peer review is another challenge: AI-generated research papers, quickly churned out at scale, highlight the limitations of peer review and modern-day publishing structures and also exacerbate already emerging challenges to incentives for scientific success. Without claiming to have answers or solutions to these concerns, we are personally living through them and will discuss each in turn.
AI tools are now able to develop and write rigorous mathematical proofs only from prompts providing high-level proof sketches.New ways of doing research
One of our major takeaways from our summer research project was that working with proof-based AI tools is akin to collaborating with a smart, broadly educated but occasionally error-prone colleague. One can verbally sketch a mathematical argument to an AI agent as you might to a human collaborator, and the agent can turn that sketch into a formally written lemma or theorem along with its proof.
Increasingly, AI agents can find proofs themselves without a sketch, especially when those proofs are "standard" in some areas of mathematics. This is more useful than it sounds: many kinds of arguments are "standard" in some field, but often one in which you, the human author, are not an expert. An advantage of AI tools is that they are conversant in an enormous number of areas of mathematics and other scientific disciplines.
For example, in our case, along the way to proving one of our main results from the sketch we provided incrementally, the AI spontaneously proved a simple but useful lemma we were not aware of, which meaningfully simplified the argument we had in mind. The implications of this sort of creativity are exciting, especially for lowering the barrier to discovery: scientists without access to a diverse community of collaborators could also participate in cutting-edge research in ways that were previously impossible.
Using these tools still requires caution and expertise, however. The proofs they generate are correct perhaps only three-quarters of the time. But when theyre wrong, if you can identify the errors, it is often possible to iterate to correctness and then continue along a promising path.
A 25% error rate is low enough to make the tools extremely useful to experts but high enough to sometimes devolve into "AI research slop" polished-looking but ultimately flawed or uninteresting work when used without care or discernment. The models, after all, still dont know what is interesting or useful.If the errors remain uncorrected, trying to continue often takes you down a dead end. A 25% error rate is low enough to make the tools extremely useful to experts but high enough to sometimes devolve into "AI research slop" polished-looking but ultimately flawed or uninteresting work when used without care or discernment. The models, after all, still dont know what is interesting or useful.
We also noticed some recurring failure modes or rabbit holes that come from using the AI tools. While writing our paper, we asked the AI to generate a small, self-contained result, which it did perfectly in a matter of minutes, at which point we told it this subproject was completed. Nevertheless, during the coming days, the AI would spontaneously take the initiative to suggest returning to the topic, despite being repeatedly told not to do so unless asked. This was an irritating reminder that generative AI does not have perfect recall but only an incomplete summary or embedding of the context. While working on the code for the experiments to illustrate our theoretical findings, we found that the AI could alternate between writing large amounts of rather complex working code very rapidly and getting lost for hours on something trivial, like simply printing out which iteration of a loop was being executed.
Training the next generation
Historically, people earn expertise in the mathematical sciences through struggle as junior researchers. PhD students spend years working through the details of technical arguments to gain hard-won intuitions about when a proof approach is promising, when they are being led astray by a problem, or what constitutes a novel and interesting research direction.
But these aspects of being a researcher are exactly what AI tools are giving away. If doctoral students can simply ask AI for proofs which is extremely tempting, especially when it is in service of advancing research how do they develop the experience and skill that, for now at least, are required to use AI tools productively in the first place?
We may need to be more intentional about teaching these foundational skills to young researchers, perhaps adopting an advanced version of teaching arithmetic in grade school without the use of calculators. The straightforward recommendation is to require junior researchers to write papers the old-fashioned way, even when their work could be sped up by AI.
Perhaps in a separate track, students would be trained to understand and work with emerging AI tools. This is an area of increasing importance that will likely require creative solutions. While we are strong believers that AI tools will do astounding things for science, it may be important to deliberately moderate their use in order to build researchers up to the point at which they can use them wisely and tastefully, not simply as short cuts to second-rate (or worse) research.
These next-generation training challenges arent unique to scientists using AI. We see them across myriad fields, including engineering, customer service, law, writing, and design really, any industry in which entry-level tasks, previously used to introduce young workers to a field, are now done using AI. To find creative solutions to this skills-training challenge, or to just better anticipate the changes at hand, it might be helpful to look at analogies across fields or over time.
After high-level programming languages and compilers were widely introduced in the early 1960s, most software engineers no longer wrote machine code or assembly language, which provided direct instructions to the underlying hardware but were tedious to program. But the best programmers still understood enough about how compilers translated high-level languages into machine code to reason about correctness and performance. We hope that making it easier to construct and check technical arguments will let all researchers operate at a higher level of abstraction and think bigger thoughts. The culture we envision would emphasize taste, problem selection, and modeling skills and devalue technical wizardry for its own sake.
Without a serious, community-wide re-evaluation of peer review, AI threatens to arrest scientific progress at the community level even as it accelerates it at the level of individual researchers.
Breaking and remaking peer review
From our perspective, peer review is not only, or even primarily, a process to verify the correctness and quality of research. Rather, its purpose is to focus a scarce resource the attention of the research community in the right places. Science progresses as researchers build on each others work, but there is already too much work out there for anyone to keep up with. The publication process should help identify the most interesting and promising directions, so they can be more efficiently and thoroughly developed.
How does AI influence this focusing of communal attention? AI tools make it much easier to produce work that looks polished and correct, dramatically lowering the barrier to generating papers that can be submitted to journals and conferences. Many of these papers are neither interesting nor actually correct but discovering this requires significant effort from reviewers.
This is straining an already overburdened machine learning publishing ecosystem struggling with tens of thousands of submissions per venue. We have seen that reducing the time and effort needed to produce "a paper" not necessarily a good paper is beginning to destabilize our existing institutions for peer review. The most recent iterations of AI and ML conferences have seen the number of submissions growing by large multiples, with a significant number of papers polished by AI, but ultimately of low quality, making it surprisingly far through the review process before being noticed and called out.
While the use of AI has shown potential to produce faster and better research, it has also generated serious questions for those who care about the future of science and its relationship to the broader world.This is a problem across research fields, partially because its creating a market for AI-generated papers. This has in turn engendered a countermarket for AI-assisted detection of AI-generated papers much like the familiar technological arms races around things like spam and its detection, but with the integrity of scientific publication at stake, not just the filtration of annoying or fraudulent e-mails.
As a short-term fix, AI-driven automated correctness checks (e.g., formal verification of mathematical proofs), tools for which are already being deployed in major conferences, could be valuable. Think of this as a form of unit testing for math instead of code. The aim is to filter out papers that have nontrivial errors, while focusing the job of the human reviewer on the important parts of science that they are best suited to evaluate: determining what we learn about the world from a new result, and how useful and interesting it is, rather than being drowned in the monotony of checking countless papers for technical correctness.
Without a serious, community-wide re-evaluation of peer review, AI threatens to arrest scientific progress at the community level even as it accelerates it at the level of individual researchers.
Looking ahead
We think AI is bringing a sea change in scientific research methodology, training, and peer review; there is no hiding from what is coming. But there are opportunities to proactively adapt and ensure that AI-assisted research fulfills its promise. What will research look like at the end of next year? The year after that? Weve seen more change in the past year than in the previous decade, so all we can confidently predict is "different".
Our scientific institutions peer review, publishing, graduate education evolved over decades to match the constraints of human cognition and effort. Those constraints are shifting rapidly, and our institutions will need to shift with them. Our goal should be to steer toward a world where AI amplifies human creativity and insight, accelerates discovery, and expands who can participate in the research enterprise while preserving the joy and rigor that make science worthwhile.
Research areas: Machine learning
Tags: Agentic AI, Large language models (LLMs), Responsible AI , Formal verification
関連記事
今日のまとめ
AI日報で今日の重要ニュースをまとめ読み