
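NVIDIA Developer Blog · April 10, 2026 00:00 · approx. 1 min read

How to Accelerate Protein Structure Prediction at Proteome Scale

#StructuralBiologyAI #ProteinStructurePrediction #ProteomeAnalysis #HighPerformanceComputing #GPUComputing #ComputationalDrugDiscovery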

TL;DR

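The NVIDIA Developer Blog explains how to accelerate protein structure prediction at proteome scale, starting from the premise that proteins function as complexes rather than in isolation.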

AI Deep Analysis · April 10, 2026 01:42


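Depth: 40%

Key Points

The importance of protein complex prediction

The article points out that many biological processes are governed not by single proteins but by complexes of interacting proteins, and argues that predicting the structure of these complexes is therefore important.

An approach to acceleration at proteome scale

The core of the article is the technical approach to accelerating structure prediction for these complexes at the scale of the proteome (the full set of proteins an organism has), not just for individual proteins.

Application of the NVIDIA technology platform

Given that the source is the NVIDIA Developer Blog, the acceleration is expected to rely on GPUs and related AI and high-performance computing technologies, against the background of providing a practical compute platform.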

Key Quotes

Proteins rarely function in isolation as individual monomers.

Most biological processes are governed by proteins interacting with other proteins, forming...


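Impact Analysis

This article suggests that AI-driven structural biology is moving on from the breakthrough success of single-protein prediction (AlphaFold2 and others) to the next stage: understanding and predicting more realistic biological systems (complexes at proteome scale). The sharp increase in computational demand creates significant business and research opportunities at the intersection of high-performance computing and AI.

Editorial Comment

A solid technical explainer that shows concretely, from the perspective of compute infrastructure, where structural biology AI is heading after AlphaFold2. It conveys a strong message that the next breakthrough lies in scalability.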


Original text

Proteins rarely function in isolation as individual monomers. Most biological processes are governed by proteins interacting with other proteins, forming protein complexes whose structures are described in the hierarchy of protein structure as the quaternary representation.

This is one level of complexity above the tertiary representation, the 3D structure of a monomer, which has become widely familiar through the Protein Data Bank and, more recently, the emergence of AlphaFold2.

Structural information for the vast majority of complexes remains unavailable. While the AlphaFold Protein Structure Database (AFDB), jointly developed by Google DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI), transformed access to monomeric protein structures, interaction-aware structural biology at the proteome scale has remained a bottleneck with unique challenges:

Massive combinatorial interaction space

High computational cost for multiple sequence alignment (MSA) generation and protein folding

Inference scaling across millions of complexes

Confidence calibration and benchmarking

Dataset consistency and biological interpretability

In recent work, we extended the AFDB with large-scale predictions of homomeric protein complexes generated by a high-throughput pipeline based on AlphaFold-Multimer—made possible by NVIDIA accelerated computing. Additionally, we predicted heteromeric complexes to compare the accuracy of different complex prediction modalities.

In particular, for the predictions of these datasets, we leveraged kernel-level accelerations from MMseqs2-GPU for MSA generation, and NVIDIA TensorRT and NVIDIA cuEquivariance for deep-learning-based protein folding. We then mapped the workload to HPC-scale inference by maximizing the utilization of all available GPUs, including scale-out to multiple clusters.

This blog describes the major principles we adopted to increase protein folding throughput, from adopting libraries and SDKs to optimizations to reduce the computational complexity of the workload. These principles can help you set up a similar pipeline yourself by borrowing from the techniques we used to create this new dataset.

So, if you are a:

Computational biologist scaling structure prediction pipelines

AI researcher training generative protein models

HPC engineer optimizing GPU workloads

Bioinformatics team building structural resources

You will learn how to:

Design a proteome-scale complex prediction strategy

Separate MSA generation from structure inference for efficiency

Scale AlphaFold-Multimer workflows across GPU clusters

Prerequisites

Technical knowledge

Python and shell scripting

SLURM as HPC workload scheduler

Basic structural biology

Familiarity with AlphaFold/ColabFold/OpenFold or similar pipelines

Infrastructure

We describe scaling on a multi-GPU, multi-node NVIDIA DGX H100 SuperPOD cluster

This cluster includes high-speed storage to store MSAs and intermediate outputs

Software

Access to MMseqs2-GPU

Familiarity with TensorRT

If not using a model with integrated cuEquivariance, knowledge about triangular attention and multiplication operations

Procedure/Steps

Define the dataset you’d like to compute

Begin by defining the scope of prediction. Because predicting protein complexes can become a combinatorial problem, it’s useful to understand what may be most interesting. In some cases, if your proteomes are small enough, an all-against-all (dimeric) complex prediction might be tractable; however, this could change if you want to predict large datasets of proteomes. Here’s how we decided to go about it:

Homomeric complexes: We selected all proteomes represented in the AFDB and sorted them by perceived importance (e.g., proteomes of human concern or commonly accessed). This allowed us to rank proteomes for computation in a particular order, making execution more manageable.

Heteromeric complexes: This is where things can get complicated, fast. For our heteromeric runs, we decided to focus on complexes originating from several reference proteomes and proteomes included in the WHO list of important proteomes. As there’s an intractable number of combinations of complexes that can be derived from these proteomes, for our runs, we focused on dimers (complexes of two proteins), within the same proteome (no inter-proteome complexes) that had “physical” interaction evidence in STRING. As we sought coverage, we decided to consider all interactions reported in STRING for these proteomes, rather than further filtering. Evidence in the literature suggests that filtering for STRING scores >700 can further reduce the number of inputs while increasing the likelihood of well-predicted complexes.
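The selection rule described above (dimers within one proteome that have STRING evidence, optionally thresholded on score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual code: it assumes whitespace-separated input lines of the form `protein1 protein2 combined_score`, with identifiers prefixed by a taxon ID such as `9606.`, and the function names, example identifiers, and proteome size of 20,000 are hypothetical. The first function shows why an all-against-all screen grows quickly with proteome size.

```python
from math import comb


def count_dimer_candidates(n_proteins, include_homodimers=True):
    """Number of unordered dimers in an all-against-all screen of one proteome."""
    pairs = comb(n_proteins, 2)  # unordered pairs of distinct proteins
    return pairs + n_proteins if include_homodimers else pairs


def select_string_dimers(lines, min_score=0):
    """Return sorted unique intra-proteome pairs whose score exceeds min_score.

    Assumes data lines such as '9606.ENSP0001 9606.ENSP0002 912' (taxon-prefixed
    IDs, integer score in the last column). A header line starting with
    'protein1' and lines with fewer than three fields are skipped.
    """
    selected = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] == "protein1":
            continue
        a, b, score = fields[0], fields[1], int(fields[-1])
        same_proteome = a.split(".", 1)[0] == b.split(".", 1)[0]
        if a != b and same_proteome and score > min_score:
            selected.add(tuple(sorted((a, b))))  # A-B and B-A count once
    return sorted(selected)


example = [
    "protein1 protein2 combined_score",
    "9606.ENSP0001 9606.ENSP0002 912",
    "9606.ENSP0002 9606.ENSP0001 912",
    "9606.ENSP0001 9606.ENSP0003 450",
]
print(count_dimer_candidates(20_000))                 # 200010000
print(select_string_dimers(example, min_score=700))   # one pair survives
```

With `min_score=0` every reported interaction is kept, which corresponds to the coverage-oriented choice described above; raising it to 700 applies the stricter filter suggested in the literature.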

Decoupling MSA generation from structure prediction

MSA generation and structure inference are both compute-intensive but scale differently, as we recently presented in a white paper. We thus approached these computations as separate steps and implemented separate SLURM pipelines. The following subsections describe how we set up MSA generation and structure prediction for optimal use of a node.

MSA generation

We generated MSAs using colabfold_search with the MMseqs2-GPU backend. While MMseqs2-GPU scales across GPUs on a node natively, we chose to spawn one MMseqs2-GPU server process per GPU on a node for easier process management. In colabfold_search, the GPUs are used only for the ungappedfilter stages and not for the subsequent alignment stages (which are multithreaded CPU processes). We can therefore stack colabfold_search calls: by monitoring the colabfold_search output, we start the next call as soon as the previous one no longer uses the GPU, which reduces GPU idle time. Although this approach oversubscribes CPU resources, in practice we found that on a DGX H100 node, three staggered colabfold_search processes increase overall throughput by up to 25%, at the expense of slower processing of individual input chunks.
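To build intuition for why staggering helps, the toy model below estimates the wall time for a series of chunks when each chunk first needs the GPU exclusively and then runs a CPU-only phase. It is an idealized sketch, not a model of colabfold_search itself: the phase durations are hypothetical, and CPU phases are assumed not to slow each other down, so it ignores the CPU oversubscription mentioned above and gives an optimistic bound rather than a prediction.

```python
import heapq


def staggered_makespan(n_chunks, gpu_time, cpu_time, max_inflight):
    """Wall time to finish n_chunks when at most max_inflight searches overlap.

    Toy model: each chunk holds the GPU exclusively for gpu_time, then runs a
    CPU-only phase for cpu_time. CPU phases are assumed to be independent.
    """
    gpu_free_at = 0.0
    finish_times = []  # min-heap of finish times of chunks occupying a slot
    makespan = 0.0
    for _ in range(n_chunks):
        start = gpu_free_at
        if len(finish_times) >= max_inflight:
            # All slots are taken: wait for the earliest chunk to finish.
            start = max(start, heapq.heappop(finish_times))
        gpu_free_at = start + gpu_time
        finish = gpu_free_at + cpu_time
        heapq.heappush(finish_times, finish)
        makespan = max(makespan, finish)
    return makespan


# Hypothetical durations: 1 time unit on the GPU, 2 units of CPU alignment.
print(staggered_makespan(3, gpu_time=1, cpu_time=2, max_inflight=1))  # 9.0
print(staggered_makespan(3, gpu_time=1, cpu_time=2, max_inflight=3))  # 5.0
```

In a real deployment the trigger for launching the next process would be the point in the colabfold_search output where the GPU stage ends, and the achievable gain has to be measured on the target node.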

When determining reasonable input chunk sizes, there are two factors to consider. Smaller chunk sizes result in more chunks, which means more per-process overhead, such as database loading, which can take a couple of minutes per process even on fast storage. (Pre-staging the databases on the fastest storage available, such as the on-node SSD, also helps throughput.) On the other hand, larger chunks take longer to finish, and on a SLURM cluster with a job time limit this results in more unfinished chunks. The sweet spot will depend on the cluster configuration, but for our DGX H100 node with a 4-hour wall time limit, a chunk size of 300 sequences worked well with the staggered colabfold_search approach.
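Splitting the input into fixed-size chunks can be sketched as below. This is a minimal example, assuming plain FASTA input; the function names are illustrative, and the default of 300 is simply the value reported above for one specific cluster configuration.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, parts = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records, chunk_size=300):
    """Yield successive lists of at most chunk_size records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


fasta = [">p1", "MKT", "AYI", ">p2", "MSD", ">p3", "MAA"]
for index, chunk in enumerate(chunk_records(read_fasta(fasta), chunk_size=2)):
    print(index, [header for header, _ in chunk])
```

Each chunk would then be written to its own FASTA file and handed to a separate search call, so that an unfinished chunk can be resubmitted without repeating the others.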

Structure prediction

To increase structure prediction throughput, we leveraged optimizations in data handling for JAX-based folding through ColabFold, as well as accelerated tooling developed at NVIDIA, including TensorRT and cuEquivariance, for OpenFold-based folding.

Deep learning inference parameters

First, we selected inference parameters that struck a good balance between accuracy and speed. The inference setup for all deep learning pipelines (ColabFold and OpenFold) therefore used:

Weights: 1x weights from AlphaFold Multimer (model_1_multimer_v3)

Four recycles (with early stopping)

No relaxation

MSAs: frozen MSAs generated through ColabFold-search (using MMseqs2-GPU), as described above

Accuracy validation

Homodimer PDB set (125 proteins)

| Model | High (DockQ >0.8) | Medium (DockQ >0.6) | Acceptable (DockQ >0.3) | Incorrect (DockQ >0) | Usable | Mean DockQ |
|---|---|---|---|---|---|---|
| ColabFold | 52 | 37 | 12 | 21 | 89 (72.95%) | 0.637 |
| OpenFold with TensorRT and cuEquivariance | 53 | 39 | 10 | 20 | 92 (75.41%) | 0.647 |

Table 1. A comparison of interface accuracy between ColabFold and OpenFold (accelerated by TensorRT and cuEquivariance) across a benchmark set of 125 homodimer proteins.

As we used different inference pipelines, we performed accuracy validation using a curated benchmark set of 125 X-ray resolved PDB homodimers released after AlphaFold2 was introduced, thus minimizing the potential for information leakage. Predicted complexes for each deep learning implementation were compared against experimental reference structures using DockQ, which evaluates interface accuracy via the fraction of native contacts (Fnat), fraction of non-native contacts (Fnonnat), interface RMSD (iRMS), and ligand RMSD after receptor alignment (LRMS), and assigns standard CAPRI classifications of high, medium, acceptable, or incorrect.

Across the PDB homodimer benchmark, OpenFold accelerated through TensorRT and cuEquivariance reproduces ColabFold interface accuracy, achieving a similar fraction of “high” scoring predictions and comparable mean DockQ scores. This indicates that the accelerated implementations preserve interface-level structural accuracy relative to the ColabFold baseline.
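The "Usable" figures in Table 1 can be reproduced from the class counts with a few lines of arithmetic. The sketch below assumes that "usable" means high plus medium quality, which is not stated explicitly but is the definition that matches the reported numbers. Note that the four class counts sum to 122 for each model, and the reported percentages correspond to that denominator.

```python
# Class counts as listed in Table 1: (high, medium, acceptable, incorrect).
TABLE_1 = {
    "ColabFold": (52, 37, 12, 21),
    "OpenFold with TensorRT and cuEquivariance": (53, 39, 10, 20),
}


def usable_summary(counts):
    """Return (usable, total, percent), taking 'usable' as high + medium.

    That definition is an assumption; it reproduces the 'Usable' column.
    """
    high, medium, _acceptable, _incorrect = counts
    usable = high + medium
    total = sum(counts)
    return usable, total, round(100 * usable / total, 2)


for model, counts in TABLE_1.items():
    print(model, usable_summary(counts))
# ColabFold (89, 122, 72.95)
# OpenFold with TensorRT and cuEquivariance (92, 122, 75.41)
```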

MSA preparation and sequence packing

For ColabFold-based homodimer inferences, higher throughput can be achieved by packing homodimers of equal length into a batch for processing, sorted by their MSA depth in descending order. This reduces the number of JAX recompilations, thereby increasing end-to-end throughput. This trick, however, does not work when processing heterodimers, because the lengths of the individual chains differ.
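The packing step described above can be sketched as a grouping and sorting pass. JAX recompiles a jitted function whenever input shapes change, so keeping chain length constant within a batch keeps shapes stable. The tuple layout and function name here are illustrative assumptions, not ColabFold's own data structures.

```python
from collections import defaultdict


def pack_homodimers(entries):
    """Group (name, chain_length, msa_depth) entries into same-length batches.

    Within a batch, entries are sorted by MSA depth in descending order.
    Batches are returned in order of increasing chain length.
    """
    by_length = defaultdict(list)
    for name, chain_length, msa_depth in entries:
        by_length[chain_length].append((name, msa_depth))
    batches = []
    for chain_length in sorted(by_length):
        members = sorted(by_length[chain_length], key=lambda item: item[1], reverse=True)
        batches.append((chain_length, [name for name, _ in members]))
    return batches


entries = [("a", 100, 50), ("b", 200, 10), ("c", 100, 900), ("d", 100, 300)]
print(pack_homodimers(entries))
# [(100, ['c', 'd', 'a']), (200, ['b'])]
```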

For OpenFold, whether for homodimers or heterodimers, this packing strategy is not needed, as the method doesn’t require re-compilation. However, given a dependency between sequence length and execution time, reserving longer sequences for individual jobs may be beneficial if operating with specific SLURM runtimes. To further optimize the process, input featurizations (CPU-bound) were performed for the next input query alongside the inference step for the current query (GPU-bound).
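Overlapping CPU-bound featurization with GPU-bound inference amounts to a one-item prefetch. The sketch below shows the pattern with Python's standard library; `featurize` and `infer` are caller-supplied stand-ins, not OpenFold functions, and the pipeline used in this work may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipelined(queries, featurize, infer):
    """Run infer(featurize(q)) for each query, featurizing one query ahead.

    featurize and infer are hypothetical stand-ins for the CPU-bound feature
    step and the GPU-bound folding step. Results keep the input order.
    """
    queries = list(queries)
    results = []
    if not queries:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(featurize, queries[0])
        for next_query in queries[1:]:
            features = pending.result()
            # Start featurizing the next query; it runs while infer() executes.
            pending = pool.submit(featurize, next_query)
            results.append(infer(features))
        results.append(infer(pending.result()))
    return results


# Toy stand-ins: "features" are sequence lengths, "inference" doubles them.
print(run_pipelined(["MKT", "MSDAA"], featurize=len, infer=lambda n: n * 2))
# [6, 10]
```

A background thread only overlaps usefully when the featurization releases the GIL (for example file I/O, NumPy, or an external process); for pure-Python featurization a `ProcessPoolExecutor` would be the closer fit.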

Additionally, OpenFold’s throughput was enhanced through the integration of the NVIDIA cuEquivariance library and NVIDIA TensorRT SDK. These modular libraries and SDKs can be leveraged to accelerate operations common in protein structure AI and general inference AI workloads, respectively. We previously described how TensorRT can be leveraged to accelerate OpenFold inference.

Optimize GPU utilization with SLURM

As alluded to in the previous section, depending on the available hardware, you can increase throughput by “packing” GPUs and nodes. SLURM is a great orchestrator, and we divided the inference workflows into SLURM scripts that:

Pack multiple predictions per node

Match GPU memory to sequence length

Reduce idle time between jobs

Separate short vs long sequence queues

Our workload was mapped to an NVIDIA DGX H100 SuperPOD HPC system. We could thus deploy inference across NVIDIA H100 GPUs on multi-node clusters, leveraging exclusive execution on a single node and packing each GPU with as many processes as it took to saturate GPU utilization, for both MSA processing and deep learning inference.

Helpful tips:

Group jobs by total residue length

Monitor GPU memory fragmentation

Use asynchronous I/O to avoid disk bottlenecks
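Grouping jobs by total residue length and separating short from long inputs can be sketched as follows. The cutoff of 1,500 residues is a hypothetical placeholder; a suitable value depends on GPU memory and the job time limit of the cluster, and the function name is illustrative.

```python
def split_by_total_length(complexes, long_cutoff=1500):
    """Split complexes into short and long queues by total residue count.

    complexes maps a complex name to the list of its chain sequences.
    long_cutoff is a hypothetical threshold. Each queue is returned as a list
    of (total_length, name) tuples sorted by increasing total length.
    """
    short_queue, long_queue = [], []
    for name, chains in complexes.items():
        total = sum(len(chain) for chain in chains)
        target = long_queue if total > long_cutoff else short_queue
        target.append((total, name))
    return sorted(short_queue), sorted(long_queue)


complexes = {
    "x": ["A" * 100, "A" * 100],
    "y": ["A" * 1000, "A" * 900],
    "z": ["A" * 50, "A" * 50],
}
short_queue, long_queue = split_by_total_length(complexes)
print(short_queue)  # [(100, 'z'), (200, 'x')]
print(long_queue)   # [(1900, 'y')]
```

Each queue could then be submitted as its own SLURM job array, with the long queue given fewer concurrent processes per GPU.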

Making quality predictions accessible to the world

In partnership with EMBL-EBI, the Steinegger Lab at Seoul National University, and Google DeepMind, we explored complex structure prediction analysis. We highlight that predicting these biological systems remains challenging. For protein monomers, the predicted Local Distance Difference Test (pLDDT) score can inform overall prediction quality and yields a balanced set of plausible predictions. For complexes, assessing interface plausibility is much harder, because it involves global and per-chain confidence metrics as well as local confidence metrics at the interface.

Simply put, is the interface between two monomers plausible, and is it predicted in the right pocket? These questions are much harder to answer than more “local” questions about monomer likelihood, given the very limited data available. Therefore, we make available a set of high-confidence structures through the AlphaFold Database, thereby enabling, for the first time, exploration of protein complexes. We intend to refine our approach further and expand the universe of available protein complexes in the AlphaFold Database.

Getting started

Proteome-scale quaternary structure prediction requires more than just running AlphaFold-Multimer at scale. Success depends on:

Evidence-driven interaction selection

Decoupled and optimized compute workflows

GPU-aware job orchestration

Confidence calibration and validation

Dataset health monitoring

By combining STRING-guided selection, MMseqs2-GPU acceleration, and NVIDIA H100-powered multimer inference, this work extends AFDB into a unified, interaction-aware structural resource.

This infrastructure enables:

Variant interpretation at interfaces

Systems-level structural biology

Drug target validation

Generative protein design benchmarking

Resources

Read more about the project here: https://research.nvidia.com/labs/dbr/assets/data/manuscripts/afdb.pdf

Accelerated libraries and SDKs are available here:

MMseqs2-GPU

NVIDIA cuEquivariance

NVIDIA TensorRT

If you wish to deploy MSA search and protein folding easily, you can get accelerated inference pipelines through NVIDIA’s Inference Microservices (NIMs):

MSA Search NIM

OpenFold2 NIM

The predictions from this effort are available through https://alphafold.com
