MIT ML News · December 18, 2025, 04:00 · about 8 min read

A “scientific sandbox” lets researchers explore the evolution of vision systems

#Vision Systems #Evolutionary AI #Robotics Sensors #MIT Media Lab

TL;DR

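MIT researchers have built a “scientific sandbox” that simulates the evolution of vision in AI agents, showing how compound eyes or camera-type eyes tend to emerge depending on the task the agents must perform.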

AI deep analysis · May 2, 2026, 19:02

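Key points

A simulated environment for evolution

A new MIT framework serves as a “scientific sandbox”: it uses AI agents to recreate and test how vision systems evolve, without any need to go back in time.

How the task relates to eye morphology

Navigation tasks often led to insect-like compound eyes, while object discrimination tasks more often led to camera-type eyes with irises and retinas, indicating that the task strongly shapes the visual organ that evolves.

Potential applications in robotics

The work could guide the design of new sensors and cameras for drones and wearable devices, balancing performance against real-world constraints such as energy efficiency and manufacturing cost.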

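Impact analysis

This research is a significant advance at the boundary of evolutionary biology and machine learning. By making experimental testing possible, rather than observation alone, it extends the scientific method itself. In particular, it could give practical guidance to biomimetic approaches in robotics and sensor design, and could accelerate the development of next-generation devices that balance energy efficiency against performance.

Editor's comment

Using AI to uncover the “why” of evolution differs from conventional data-driven learning, and it matters for understanding how biological constraints relate to objective functions. For robot vision design, the notable point for industrial applications is that real-world requirements such as energy efficiency and manufacturability can be derived from evolutionary principles, rather than by simply raising accuracy.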

Source paper: Science Advances, https://doi.org/10.1126/sciadv.ady2888

Why did humans evolve the eyes we have today?

While scientists can’t go back in time to study the environmental pressures that shaped the evolution of the diverse vision systems that exist in nature, a new computational framework developed by MIT researchers allows them to explore this evolution in artificial intelligence agents.

The framework they developed, in which embodied AI agents evolve eyes and learn to see over many generations, is like a “scientific sandbox” that allows researchers to recreate different evolutionary trees. The user does this by changing the structure of the world and the tasks AI agents complete, such as finding food or telling objects apart.

This allows them to study why one animal may have evolved simple, light-sensitive patches as eyes, while another has complex, camera-type eyes.

The researchers’ experiments with this framework showcase how tasks drove eye evolution in the agents. For instance, they found that navigation tasks often led to the evolution of compound eyes with many individual units, like the eyes of insects and crustaceans.

On the other hand, if agents focused on object discrimination, they were more likely to evolve camera-type eyes with irises and retinas.

This framework could enable scientists to probe “what-if” questions about vision systems that are difficult to study experimentally. It could also guide the design of novel sensors and cameras for robots, drones, and wearable devices that balance performance with real-world constraints like energy efficiency and manufacturability.

“While we can never go back and figure out every detail of how evolution took place, in this work we’ve created an environment where we can, in a sense, recreate evolution and probe the environment in all these different ways. This method of doing science opens the door to a lot of possibilities,” says Kushagra Tiwary, a graduate student at the MIT Media Lab and co-lead author of a paper on this research.

He is joined on the paper by co-lead author and fellow graduate student Aaron Young; graduate student Tzofi Klinghoffer; former postdoc Akshat Dave, who is now an assistant professor at Stony Brook University; Tomaso Poggio, the Eugene McDermott Professor in the Department of Brain and Cognitive Sciences, an investigator in the McGovern Institute, and co-director of the Center for Brains, Minds, and Machines; co-senior authors Brian Cheung, a postdoc in the Center for Brains, Minds, and Machines and an incoming assistant professor at the University of California San Francisco; and Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT; as well as others at Rice University and Lund University. The research appears today in Science Advances.

Building a scientific sandbox

The paper began as a conversation among the researchers about discovering new vision systems that could be useful in different fields, like robotics. To test their “what-if” questions, the researchers decided to use AI to explore the many evolutionary possibilities.

“What-if questions inspired me when I was growing up to study science. With AI, we have a unique opportunity to create these embodied agents that allow us to ask the kinds of questions that would usually be impossible to answer,” Tiwary says.

To build this evolutionary sandbox, the researchers took all the elements of a camera, like the sensors, lenses, apertures, and processors, and converted them into parameters that an embodied AI agent could learn.

They used those building blocks as the starting point for an algorithmic learning mechanism an agent would use as it evolved eyes over time.

“We couldn’t simulate the entire universe atom-by-atom. It was challenging to determine which ingredients we needed, which ingredients we didn’t need, and how to allocate resources over those different elements,” Cheung says.

In their framework, this evolutionary algorithm can choose which elements to evolve based on the constraints of the environment and the task of the agent.

Each environment has a single task, such as navigation, food identification, or prey tracking, designed to mimic real visual tasks animals must overcome to survive. The agents start with a single photoreceptor that looks out at the world and an associated neural network model that processes visual information.

Then, over each agent’s lifetime, it is trained using reinforcement learning, a trial-and-error technique where the agent is rewarded for accomplishing the goal of its task. The environment also incorporates constraints, like a certain number of pixels for an agent’s visual sensors.

“These constraints drive the design process, the same way we have physical constraints in our world, like the physics of light, that have driven the design of our own eyes,” Tiwary says.

Over many generations, agents evolve different elements of vision systems that maximize rewards.

Their framework uses a genetic encoding mechanism to computationally mimic evolution, where individual genes mutate to control an agent’s development.

For instance, morphological genes capture how the agent views the environment and control eye placement; optical genes determine how the eye interacts with light and dictate the number of photoreceptors; and neural genes control the learning capacity of the agents.
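
The three gene groups can be pictured as a small genome that mutates between generations. The sketch below is a minimal illustration, not the authors' implementation: the field names (`eye_placement_deg`, `num_photoreceptors`, `hidden_units`), the mutation rule, the `PIXEL_BUDGET` value, and the stand-in `toy_fitness` function are all assumptions made for this example. In the framework described here, fitness would instead come from the reward an agent earns after being trained with reinforcement learning.

```python
import random
from dataclasses import dataclass, replace

PIXEL_BUDGET = 64  # assumed cap on photoreceptors (an environment constraint)


@dataclass(frozen=True)
class Genome:
    eye_placement_deg: float  # morphological gene: where the eye points
    num_photoreceptors: int   # optical gene: how many receptors the eye has
    hidden_units: int         # neural gene: capacity of the agent's network


def mutate(genome: Genome, rng: random.Random) -> Genome:
    """Return a copy of the genome with one randomly chosen gene perturbed."""
    gene = rng.choice(["eye_placement_deg", "num_photoreceptors", "hidden_units"])
    if gene == "eye_placement_deg":
        new_angle = (genome.eye_placement_deg + rng.gauss(0.0, 10.0)) % 360.0
        return replace(genome, eye_placement_deg=new_angle)
    if gene == "num_photoreceptors":
        new_count = genome.num_photoreceptors + rng.choice([-1, 1, 2])
        # Clamp to the pixel budget imposed by the environment.
        return replace(genome, num_photoreceptors=max(1, min(PIXEL_BUDGET, new_count)))
    new_units = genome.hidden_units + rng.choice([-4, 4, 8])
    return replace(genome, hidden_units=max(1, new_units))


def toy_fitness(genome: Genome) -> float:
    """Hypothetical stand-in for lifetime reward; a real run would train with RL."""
    sensing = min(genome.num_photoreceptors, genome.hidden_units)
    cost = 0.01 * (genome.num_photoreceptors + genome.hidden_units)
    return sensing - cost


def evolve(generations: int, population_size: int, seed: int = 0) -> Genome:
    """Run a simple select-and-mutate loop and return the fittest genome."""
    rng = random.Random(seed)
    # Every agent starts from a single photoreceptor, as in the article.
    population = [Genome(0.0, 1, 4) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness, reverse=True)
        parents = ranked[: max(1, population_size // 2)]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=toy_fitness)
```

Calling `evolve(generations=30, population_size=16)` returns the fittest genome found. Because the selected parents are carried over unchanged, the best toy fitness never decreases from one generation to the next.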

Testing hypotheses

When the researchers set up experiments in this framework, they found that tasks had a major influence on the vision systems the agents evolved.

For instance, agents that were focused on navigation tasks developed eyes designed to maximize spatial awareness through low-resolution sensing, while agents tasked with detecting objects developed eyes focused more on frontal acuity, rather than peripheral vision.

Another experiment indicated that a bigger brain isn’t always better when it comes to processing visual information. Only so much visual information can go into the system at a time, based on physical constraints like the number of photoreceptors in the eyes.

“At some point a bigger brain doesn’t help the agents at all, and in nature that would be a waste of resources,” Cheung says.
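
One way to picture this bottleneck is a toy model in which the usable information is capped by whichever is smaller: what the eye delivers or what the brain can process. This is an assumption for illustration only, not the model used in the paper; the function `usable_information` and its unit-free numbers are hypothetical.

```python
def usable_information(num_photoreceptors: int, brain_units: int) -> int:
    """Toy bottleneck: the brain cannot use more than the eye delivers."""
    return min(num_photoreceptors, brain_units)


# With the eye fixed at 16 photoreceptors, growing the brain stops helping
# once brain capacity exceeds what the eye can supply.
for brain_units in (4, 8, 16, 32, 64):
    print(brain_units, usable_information(16, brain_units))
```

In this toy model, any brain capacity beyond 16 units adds cost without adding usable information, which is the sense in which a bigger brain would be a waste of resources.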

In the future, the researchers want to use this simulator to explore the best vision systems for specific applications, which could help scientists develop task-specific sensors and cameras. They also want to integrate LLMs into their framework to make it easier for users to ask “what-if” questions and study additional possibilities.

“There’s a real benefit that comes from asking questions in a more imaginative way. I hope this inspires others to create larger frameworks, where instead of focusing on narrow questions that cover a specific area, they are looking to answer questions with a much wider scope,” Cheung says.

This work was supported, in part, by the Center for Brains, Minds, and Machines and the Defense Advanced Research Projects Agency (DARPA) Mathematics for the Discovery of Algorithms and Architectures (DIAL) program.
