MIT ML News·June 3, 2026 13:00·about 8 min read

MIT researchers teach AI models to interpret charts

#Vision-Language Models #ChartNet #Open Source AI #Data Generation #MIT-IBM

TL;DR

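A joint MIT and IBM research team developed ChartNet, a large-scale dataset that markedly improves chart interpretation, and showed that small open-source models trained on it can outperform major commercial models.

AI in-depth analysis·June 3, 2026 13:51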

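Key points

Building the ChartNet dataset

The team released ChartNet, a new dataset of more than 1 million varied charts that jointly encodes visual, numerical, and linguistic information, giving VLMs a foundation for training.

Small models outperform commercial models

Relatively small open-source VLMs trained on ChartNet outperformed commercial models that are orders of magnitude larger on data extraction and summarization tasks.

Lower-cost use of AI

Small models that do not need large amounts of compute can reach state-of-the-art performance, which could make advanced AI more accessible to budget-constrained small businesses and research institutions.

A joint MIT and IBM effort

Researchers from MIT, the MIT-IBM Computing Research Lab, and IBM Research collaborated to build a one-stop resource for chart understanding; the work is slated for presentation at IEEE CVPR 2026.

Key quotes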

We developed ChartNet to be a one-stop shop for chart understanding, covering basically anything that an AI model and a practitioner who is training that model might need.

Many of these smaller models significantly outperformed orders of magnitude larger, commercial models on tasks like data extraction and chart summarization.

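Impact analysis

This research highlights that, for a complex task like chart analysis, the quality and composition of the dataset are key factors in model performance. In particular, strong results from open-source models that do not depend on large-scale compute are a significant step toward democratizing AI and bringing it to small and midsize businesses. Going forward, standard benchmarks and training methods for vision-language models may be redefined around ChartNet.

Editor's comment

The result is a telling reminder that model size is not everything, and it is likely to serve as an important guide at a time when dataset design shapes AI performance.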

To accelerate and refine decision-making in a fast-paced, global marketplace, enterprises may deploy generative artificial intelligence models to help summarize and interpret the charts that often fill market summaries and financial reports.

But even the latest vision-language models sometimes struggle with this task, since it requires a model to integrate visual, numerical, and linguistic understanding. A company that invests in a state-of-the-art model might still receive inaccurate or incomplete information.

To fill this performance gap, researchers from MIT and the MIT-IBM Computing Research Lab developed a multifaceted resource for AI users that is specifically designed to teach vision-language models (VLMs) how to effectively interpret charts.

They used a novel data generation method to build a state-of-the-art dataset that includes more than a million varied charts. The dataset also encodes many visual, linguistic, and numerical components of each chart image, which enable models to robustly reason about the information in a chart.

The researchers used this dataset, called ChartNet, to train a series of open-source VLMs. Many of these smaller models significantly outperformed commercial models that are orders of magnitude larger on tasks like data extraction and chart summarization.

By enabling open-source models to outperform their commercial counterparts, ChartNet could allow small firms with limited budgets to more readily utilize AI. The open-source dataset can be used to improve the capabilities of AI models for tasks like business trend analysis and scientific figure interpretation.

“We developed ChartNet to be a one-stop shop for chart understanding, covering basically anything that an AI model and a practitioner who is training that model might need. We hope our work motivates researchers to achieve state-of-the-art performance with smaller models that don’t require infinite amounts of computation,” says Jovana Kondic, an MIT electrical engineering and computer science (EECS) graduate student and lead author of a paper on ChartNet.

She is joined on the paper by many co-authors from MIT, the MIT-IBM Computing Research Lab, and IBM Research, including Pengyuan Li, a research staff member at IBM Research; Dhiraj Joshi, a senior scientist at IBM Research; Isaac Sanchez, a software engineer at IBM Research; Aude Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing, MIT director of the MIT-IBM Computing Research Lab, and a senior research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Rogerio Feris, a principal scientist and manager at the MIT-IBM Computing Research Lab. The research will be presented at the IEEE Computer Vision and Pattern Recognition Conference.

A dataset bottleneck

Researchers have made great strides developing generative AI models that excel at natural language processing and reasoning about natural images. But less work has focused on interpreting complex multimodal data contained within charts, Kondic says.

Yet for large and small businesses in nearly every industry, chart understanding is a critical task.

“The finance industry thrives on charts. If vision-language models can extract information out of charts, like descriptions of trends, that facilitates a lot of workflows that happen downstream,” Joshi says.

The lack of high-quality training data is a major bottleneck holding back the development of VLMs that can accurately interpret charts. Many datasets contain limited chart images pulled from the internet and often lack the necessary scale and additional information to help a model interpret the underlying data.

“A vision-language model, unlike our brains, may need to see thousands of examples during training to reliably recognize something as a line chart,” Kondic says.

The researchers sought to overcome those shortcomings by generating synthetic data. Synthetic data are artificially generated by algorithms to mimic the statistical properties of actual data.

The ChartNet dataset holds more than a million high-quality chart images, along with the corresponding code used to generate each chart, a textual description, and a table that contains its numerical information. In addition, each datapoint includes question-and-answer pairs to teach the model how to correctly answer questions about the chart image.

“These additional modes of data guide the model to connect and align the different pieces of information that the chart image encodes,” Kondic says.
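As a rough illustration of the datapoint described above, the aligned modalities could be modeled along these lines. This is a minimal sketch: `ChartRecord`, `QAPair`, `supervision_targets`, and every field name are hypothetical and are not taken from the released dataset's schema.

```python
from dataclasses import dataclass, field


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class ChartRecord:
    """One datapoint; all field names are illustrative assumptions."""
    image_path: str    # rendered chart image
    plot_code: str     # code used to generate the chart
    description: str   # textual description of the chart
    table: list        # rows of the underlying numerical data
    qa_pairs: list = field(default_factory=list)  # QAPair items


def supervision_targets(record: ChartRecord) -> dict:
    """Map each modality to the text a model could be trained to produce."""
    return {
        "code": record.plot_code,
        "description": record.description,
        "table": "\n".join(
            ",".join(str(v) for v in row.values()) for row in record.table
        ),
        "qa": [(p.question, p.answer) for p in record.qa_pairs],
    }


record = ChartRecord(
    image_path="charts/000001.png",
    plot_code="plt.plot([2023, 2024], [1.5, 2.0])",
    description="Revenue rose from 1.5 to 2.0 between 2023 and 2024.",
    table=[{"year": 2023, "revenue": 1.5}, {"year": 2024, "revenue": 2.0}],
    qa_pairs=[QAPair("What was revenue in 2024?", "2.0")],
)
targets = supervision_targets(record)
```

Keeping all modalities on one record means the same image can supervise several tasks at once, which is the alignment effect Kondic describes.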

Data generation

To build ChartNet, the researchers created a two-step, synthetic data generation pipeline.

First, their automated system translates any pre-existing set of chart images into code. Then the system iteratively augments that code to change different aspects of each chart, such as chart type, data values, topic, and colors.

“We can start from a single chart that we use as a seed and come up with hundreds of augmentations of it. This is how we were able to build a dataset with more than a million diverse images,” Kondic explains.
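The seed-and-augment idea can be sketched in a few lines. The article does not say how the system rewrites chart code, so this toy version makes a loud assumption: a plain dictionary stands in for the chart code, and a random mutation stands in for whatever augmentation the real pipeline applies. The names `CHOICES`, `mutate`, and `augment` are hypothetical.

```python
import random

# Hypothetical option pools for the aspects that can be swapped.
CHOICES = {
    "chart_type": ["line", "bar", "scatter", "area"],
    "palette": ["tab10", "viridis", "pastel"],
}


def mutate(spec: dict, rng: random.Random) -> dict:
    """Return a copy of spec with one aspect changed."""
    new_spec = dict(spec)
    aspect = rng.choice(["chart_type", "palette", "values"])
    if aspect == "values":
        # Jitter each data value by up to +/-20 percent.
        new_spec["values"] = [
            round(v * rng.uniform(0.8, 1.2), 2) for v in spec["values"]
        ]
    else:
        options = [o for o in CHOICES[aspect] if o != spec[aspect]]
        new_spec[aspect] = rng.choice(options)
    return new_spec


def augment(seed_spec: dict, n_variants: int, rng: random.Random) -> list:
    """Iteratively derive variants, each built on the previous one."""
    variants = []
    current = seed_spec
    for _ in range(n_variants):
        current = mutate(current, rng)
        variants.append(current)
    return variants


seed = {"chart_type": "line", "palette": "tab10", "values": [10.0, 12.5, 9.0]}
variants = augment(seed, n_variants=5, rng=random.Random(0))
```

Because each variant is derived from the one before it, a single seed can drift progressively further from the original chart, which is one plausible reading of "iteratively augments."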

They also incorporated an automated quality check process to ensure the synthetic data are high quality. This process verifies that the code is executable and that the rendered chart images are accurate and clean.

“We don’t want to just be generating diverse samples. We also want the information to be presented in a meaningful way,” she says.
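The "code is executable" half of that quality check can be illustrated as follows. This is a minimal sketch, not the researchers' implementation: `is_executable` and `filter_valid` are hypothetical names, the check for image accuracy is not shown, and running generated code in-process as done here is only reasonable for trusted toy inputs (a real pipeline would presumably sandbox it).

```python
def is_executable(code: str) -> bool:
    """Return True if the code compiles and runs without raising."""
    try:
        compiled = compile(code, "<chart_code>", "exec")
        # Run in a fresh namespace so samples cannot see each other's names.
        exec(compiled, {"__name__": "__chart_check__"})
    except Exception:
        # Covers syntax errors as well as runtime failures.
        return False
    return True


def filter_valid(samples: list) -> list:
    """Keep only the code samples that pass the executability check."""
    return [code for code in samples if is_executable(code)]


samples = [
    "values = [1, 2, 3]\ntotal = sum(values)",  # fine
    "values = [1, 2,",                          # syntax error
    "total = undefined_name + 1",               # fails at runtime
]
valid = filter_valid(samples)
```

Only the first sample survives the filter; the other two fail at compile time and at run time respectively.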

ChartNet also includes a selection of chart datapoints annotated by human experts. This provides access to additional types of charts and supporting data that carry validity guarantees.

A practitioner could use the annotated data to fine-tune an existing VLM, further boosting performance for a specific application, Joshi adds.

The researchers tested ChartNet by training IBM’s Granite Vision series of models as well as several other open-source models of various sizes and evaluating them on various chart interpretation tasks. The dataset improved the accuracy of all models in chart reconstruction, chart data extraction, chart summarization, and chart question answering.

With ChartNet, small open-source models consistently outperformed much larger commercial models.

“A lot of prior training datasets only focused on answering simple questions about a chart. We tried to go beyond that with ChartNet by generating data that support all aspects of robust chart understanding,” Kondic says.

In the future, the researchers plan to continue expanding ChartNet by incorporating data with added levels of complexity. They also want to draw on feedback from the research community.

This research was funded, in part, by the MIT-IBM Computing Research Lab.
