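Amazon Nova Forge: an "open training" paradigm that lets anyone build frontier AI
Amazon has announced Amazon Nova Forge, a new service built on an "open training" paradigm in which foundation models are customized with an organization's own data. The service aims to close the mismatch between general-purpose benchmarks and production environments.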
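Key points
Solving the distribution mismatch
The article identifies a fundamental problem: because the distribution of public benchmark data does not match that of an organization's own operational data, models that score highly can fail in production. It stresses the need to close this gap.
Retaining knowledge through staged training
Mixing the data used to train Amazon Nova with the customer's proprietary data at each stage of development enables deep domain understanding while preventing "catastrophic forgetting."
Anyone can build a specialized model
Nova Forge is offered as a new service that lets organizations use their own data and tools to build their own expert foundation models on top of Amazon Nova.
The Novella concept
Amazon Nova Forge provides the tools and recipes for creating a model variant called a "Novella," optimized for an organization's domain knowledge and workflows.
Limits of existing approaches
Fine-tuning closed-weights LLMs through LoRA-based APIs cannot give a model a deep understanding of proprietary domain knowledge or complex workflows.
Limits of open-weights models and the remedy
Because existing open-weights models come without their training data or recipes, optimizing them for a particular use tends to cause catastrophic forgetting, while building a model from scratch requires massive resources.
The new Nova Forge paradigm
Nova Forge delivers a new form of "open training" built on two pillars: access to checkpoints from each major stage of development (pretraining, mid-training, and post-training) and the ability to mix proprietary data with Amazon Nova's training data.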
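Impact analysis
This announcement presents AWS's concrete answer to a long-standing problem in the adoption of large language models: the gap between benchmarks and the real world. In particular, the technical approach of incorporating proprietary data into training while preventing catastrophic forgetting could lower the customization barrier the whole industry faces and accelerate enterprise use of AI.
Editorial comment
Rather than simply improving model performance, the approach tackles the root problem of models being unusable in practice by opening up the training process itself, which is highly suggestive for enterprise AI strategy.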
Amazon Nova Forge: "Open training" paradigm that empowers everyone to build their own frontier AI
New service lets customers mix their own data with the data used to train Amazon Nova at each major stage of model development, enabling deep domain understanding while preventing "catastrophic forgetting".
Conversational AI
Rohit Prasad
December 08, 02:41 PM
As foundation models (FMs) such as Transformer-based large language models (LLMs) have grown in popularity, there is a pattern we have seen repeatedly: a new model launches with stunning benchmark scores, teams get excited and start testing, and then they hit production reality. The model that aced the public benchmarks struggles with specific use cases organizations want to enable.
This is because the public benchmarks are in the probability distribution of the data used to train the models, whereas the use cases the organizations are interested in are "out of distribution." This distribution mismatch happens for two main reasons:
The application depends on data, knowledge, and tools secured within an organization; these assets are not part of the public datasets used to train LLMs.
Customer behavior and the application context keep evolving, so the new model is obsolete on the day it is deployed.
A few months back, we asked how we could meet these fundamental challenges. Our front-row seats to diverse, large-scale application-development efforts within Amazon helped us invent a whole new service called Amazon Nova Forge that empowers organizations to build their own expert foundation models using Amazon Nova.
Rohit Prasad, senior vice president of Artificial General Intelligence, Amazon
In essence, Nova Forge gives you the training tools and recipes to make your differentiated use cases become in distribution, so your application can meet the highest standards of accuracy, reliability, cost effectiveness, and control. The result is a model that knows your organization and use cases as an expert in your domain. We call this model a Novella, a variant of Nova that is optimized for your organization.
Before Amazon Nova Forge
Historically, organizations have had three suboptimal choices for mitigating the challenges I described above.
First, they could fine-tune closed-weights LLMs using APIs that are typically based on low-rank adapters (LoRA). But such limited adaptation cannot give the customized model a deep understanding of proprietary domain knowledge and complex workflows.
Second, they could continue pretraining a base open-weights model or continue post-training one that is already instruction tuned for a set of use cases. But open-weights models do not come with the data used to train them or with their exact training recipes (e.g., how many training epochs, on which datasets, and at what learning rates). Consequently, it is extremely difficult to steer them to particular use cases without regressing on the core properties of the base model, a phenomenon known as catastrophic forgetting.
Third, they could build a frontier-scale model from scratch, but that requires massive computational resources, expert developers, and time.
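To make the limitation of the first option concrete, here is a minimal sketch of how few parameters a low-rank adapter actually trains. LoRA freezes a weight matrix W and learns an update of the form B·A, where B and A are narrow matrices of a chosen rank. The layer size and rank below are illustrative assumptions, not figures for any particular model or fine-tuning API.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of entries in a dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Entries in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative (assumed) sizes: one 4096 x 4096 projection adapted at rank 8.
d_out = d_in = 4096
rank = 8

lora = lora_trainable_params(d_out, d_in, rank)  # 65536
full = full_params(d_out, d_in)                  # 16777216
fraction = lora / full                           # 0.00390625, i.e. 1/256

print(f"LoRA trains {lora} of {full} parameters ({fraction:.4%})")
```

Under these assumed sizes the adapter touches well under 1% of the layer's parameters, which gives a sense of how constrained this kind of adaptation is.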
Nova Forge, by contrast, is built on an entirely new paradigm of open training and has two main pillars: access to checkpoints from each major stage of model development and the ability to mix proprietary data with the data curated for training Amazon Nova.
Access to checkpoints from each major stage of model development
Most state-of-the-art foundation models are trained in three stages. First is pretraining, where the model is trained to predict the next token (i.e., unit of the LLM's vocabulary, such as a word or a word part) in a sequence of tokens using large quantities of unlabeled data.
Second is mid-training, where real-world and synthetic user-system interactions (traces) help improve the model's performance on a prioritized set of applications and tasks while increasing (or at least preserving) generalizability to previously unseen tasks. Mid-training is like pretraining, except that the data is specific to a set of tasks that the model provider wants the model to excel at, and the learning rate (i.e., how much a given training example modifies the model) is different.
Third is post-training, including supervised fine-tuning (SFT), where the model learns to complete tasks from curated demonstrations and instructions (e.g., from software engineering), and reinforcement learning (RL), which helps improve accuracy on these tasks and align the model's outputs to specific policies.
Depending on the complexity of the target application and the relative importance of historical data and ongoing usage, organizations need the ability to infuse their data and knowledge into one or all of these stages. This is why Nova Forge provides three model checkpoints (pretrained, mid-trained, and post-trained) and the recipes and code to continue training from any of them.
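The next-token objective used in pretraining can be sketched in a few lines. This is the standard cross-entropy loss over a softmax of the model's scores (logits) for each vocabulary entry; the four-token vocabulary and the logit values are toy assumptions for illustration and do not come from Nova's training code.

```python
import math


def next_token_loss(logits, target_index):
    """Cross-entropy of the true next token under softmax(logits).

    Uses the log-sum-exp trick (subtracting the max logit) for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


# Toy vocabulary of four tokens; the true next token is at index 2.
uncertain = next_token_loss([0.0, 0.0, 0.0, 0.0], target_index=2)
confident = next_token_loss([0.0, 0.0, 5.0, 0.0], target_index=2)

print(f"uniform guess: {uncertain:.3f}, confident and correct: {confident:.3f}")
```

The loss falls as the model puts more probability on the token that actually comes next, and training adjusts the weights to push it down across the whole dataset.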
If you are working in a novel domain that is not represented in the pretraining data at all (e.g., geospatial or radiology images) and have many trillions of tokens, you can continue pretraining from the pretrained checkpoint. If you have a few billion to a few trillion tokens of historical data or can synthesize interactions, you can continue from mid-trained checkpoints. You can also perform SFT and RL on the mid-trained checkpoints. Lastly, the most common use case is to continually update the model using RL from real-world feedback or synthetic data.
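The guidance above can be read as a simple decision rule. The sketch below is hypothetical: the function name, the numeric cutoffs standing in for "a few billion" and "many trillions," and the fallback to the post-trained checkpoint for the RL-only case are my assumptions, not part of the Nova Forge API or documentation.

```python
from enum import Enum


class Checkpoint(Enum):
    PRETRAINED = "pretrained"
    MID_TRAINED = "mid-trained"
    POST_TRAINED = "post-trained"


# Assumed cutoffs: the text only says "a few billion to a few trillion"
# and "many trillions", so these exact numbers are placeholders.
MID_TRAINING_MIN_TOKENS = 1_000_000_000
PRETRAINING_MIN_TOKENS = 5_000_000_000_000


def suggest_checkpoint(num_tokens, novel_domain, can_synthesize_interactions=False):
    """Suggest a starting checkpoint from data volume and domain novelty."""
    # Novel domain with a very large corpus: continue pretraining.
    if novel_domain and num_tokens >= PRETRAINING_MIN_TOKENS:
        return Checkpoint.PRETRAINED
    # Substantial historical data, or the ability to synthesize interactions.
    if num_tokens >= MID_TRAINING_MIN_TOKENS or can_synthesize_interactions:
        return Checkpoint.MID_TRAINED
    # Otherwise assume ongoing RL-style updates on the post-trained model.
    return Checkpoint.POST_TRAINED


print(suggest_checkpoint(10**13, novel_domain=True))
print(suggest_checkpoint(50_000_000_000, novel_domain=False))
print(suggest_checkpoint(1_000_000, novel_domain=False))
```

In practice the choice also depends on budget and on how far the target domain sits from the base model's training data, so a rule like this is only a starting point.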
Mixing proprietary and frontier data
Foundation models with frontier capabilities come from frontier-scale data. While techniques such as regularization and carefully crafted learning rates can help mitigate the challenges of catastrophic forgetting, the best way to infuse new knowledge into a model without losing existing capabilities is to mix frontier-scale data with your own proprietary data.
This is why, for all stages of training, Nova Forge provides API-based mixing of the high-quality curated data used to train our frontier models with your proprietary data. To the best of our knowledge, no proprietary FM provider or even open-weights-model developer has provided the ability to mix frontier-scale data with proprietary data during pretraining, mid-training, and post-training.
When organizations blend their proprietary data with high-quality curated data at early stages, they achieve something fundamentally different from customization choices that were available before Nova Forge: they build models where expertise in their domain is the core capability of the model, not an afterthought. The model learns to reason about domain-specific concepts as fluently as it reasons about the general knowledge available in public sources.
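As a rough illustration of what data mixing means mechanically, the sketch below draws each training example from either a proprietary pool or a curated pool according to a fixed ratio. It is not the Nova Forge mixing API: the function name, the `proprietary_fraction` parameter, and the sample data are all hypothetical, and a real pipeline would stream and shuffle far larger datasets.

```python
import random
from typing import List, Sequence


def mix_examples(
    proprietary: Sequence[str],
    curated: Sequence[str],
    proprietary_fraction: float,
    num_examples: int,
    seed: int = 0,
) -> List[str]:
    """Sample a training stream where each example comes from the proprietary
    pool with probability `proprietary_fraction`, otherwise from the curated pool."""
    rng = random.Random(seed)  # seeded so the mix is reproducible
    mixed = []
    for _ in range(num_examples):
        source = proprietary if rng.random() < proprietary_fraction else curated
        mixed.append(rng.choice(source))
    return mixed


# Hypothetical data: domain documents blended with general curated text.
domain_docs = ["assay report A", "assay report B"]
general_docs = ["encyclopedia passage", "code snippet", "news article"]

batch = mix_examples(domain_docs, general_docs, proprietary_fraction=0.3, num_examples=10)
print(batch)
```

Keeping general data in every batch is what lets the model keep rehearsing its existing capabilities while it absorbs the new domain, which is the intuition behind mixing as a guard against catastrophic forgetting.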
Consider the experience of Nimbus Therapeutics, a clinical-stage drug discovery company, when building an AI system to accelerate molecular design. Drug discovery requires finding the right balance of many properties within a single molecule. It is an exponentially complex task that cannot be solved by manual exploration of candidate combinations. The goal was to build a model that could generate molecular designs, reason through complex problems, and predict which molecules are worth testing in the lab, where each experiment can cost thousands of dollars.
Off-the-shelf LLMs lacked the deep understanding of chemistry required for such specialized work. While Nimbus had already built a suite of specialized machine learning models to address this gap, these models still lacked true chemical-reasoning capabilities, and maintaining a collection of separate models had become increasingly complex and resource intensive.
The team began by testing Nova 2 Lite on pharmaceutical-patent analysis, where it achieved 95% accuracy without any customization. This impressive result gave them confidence to use Nova Forge for a more ambitious goal: creating one unified molecular-intelligence system. For instance, a model needs to understand not just how to connect atoms to make a realistic molecule but how specific structural features in each molecule map to physico-chemical properties, biological activities, and toxicophores. A grasp of these complex relationships is difficult to bolt on after a model's knowledge of structures has solidified.
Nova Forge enabled the team to bring in its own proprietary chemistry datasets and drive performance improvement using supervised fine-tuning and reinforcement learning. Early results show that the custom model built using Nova Forge already outperforms other leading LLMs on molecular-property prediction tasks by significant margins, with the promise of expanding into molecular generation, a cutting-edge technology that will help bring better medicines to patients more quickly than ever before.
The next frontier
We released Amazon Nova Forge as the first service that enables organizations to build their own frontier models with Nova, through this open training approach.
The capabilities we recently launched with Nova 2 Lite and three other Nova 1 models address the two challenges I outlined earlier. We are now working to meet an emerging challenge: reducing the time and effort required to transfer knowledge from an existing, customized Nova model to a newly released Nova model.
To that end, we are offering Forge customers early access to a more capable model, Nova 2 Pro, at the same time that we are providing it to our internal teams. Forge customers can use Nova 2 Pro in Amazon Bedrock right away to build their applications. In a few weeks, we will provide recipes for training from multiple checkpoints of Nova 2 Pro. Such early access to even more powerful models in Forge makes it easy for organizations to plan ahead for the transfer of knowledge to newer, more capable Nova models.
Our open-training approach also makes it easy for the broader research community to explore fundamental research questions, which is another reason I am excited by the potential of Nova Forge. Just as open-source software enabled the modern Internet, open training may enable a future where every organization can build its own frontier AI.
The so what
I gave Nova 2 Lite a description of Nova Forge and asked for a one-sentence summary for our customers. Nova 2 Lite came back with "Nova Forge: Your AI, your rules, built faster, smarter, and on your terms." I could not have done a better job of summarizing the spirit of what we are trying to accomplish here, helping organizations of all sizes and expertise excel in their domains and deliver value with AI.
Research areas: Conversational AI
Tags: Amazon Nova, Large language models (LLMs)