LLM Research Papers: The 2024 List
The author, Sebastian Raschka, shares his list of LLM-related research papers from 2024 and also mentions his book and the bonus content in its GitHub repository.
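Key points
2024 LLM research paper list published
The author lists the notable LLM-related papers he came across in 2024 in date order, grouped by month, giving the title and arXiv link for each.
The author's book and additional content
He announces that his book, Build A Large Language Model (From Scratch), is available on Amazon and that bonus materials have been added to the GitHub repository.
The author's situation and plans
Because of an injury from an accident, the 2024 research highlights article he had originally planned is delayed; he intends to publish it once he has recovered.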
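Progress in efficient model architectures
Selective state space models (SSMs) such as MoE-Mamba and MambaByte are proposed as efficient alternatives to the Transformer, with advantages claimed particularly for long-sequence processing.
New challenges in LLM safety and evaluation
Sleeper Agents shows that deceptive LLMs can persist through safety training, and Spotting LLMs With Binoculars proposes a method for detecting machine-generated text; research on safety and reliability continues to advance.
Extending multimodal and vision-language models
Vision and vision-language models such as MoE-LLaVA, SpatialVLM, and VMamba target spatial reasoning and more efficient architectures, continuing the progress of multimodal AI.
Progress in model efficiency and smaller models
Research on model efficiency and practicality is advancing, including evaluations of small LLMs on real-world tasks, efficient exploration methods, and lightweight VLMs.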
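Impact analysis
This article provides a useful resource list for tracking 2024 LLM research trends, but it includes a good deal of book promotion and personal updates from the author, so it reads more like a personal blog post than a pure news analysis. It has some value to the research community as a reference list, but it is not content that will have a major impact on the industry as a whole.
Editorial comment
A practical resource list for following the latest LLM research, but the book promotion and personal updates stand out, and it lacks depth as news analysis. It is useful for researchers, but it contains almost no explanation or analysis aimed at general readers.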
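11 April 2024, Rho-1: Not All Tokens Are What You Need
Research on large language models (LLMs) continues to evolve rapidly, and many groundbreaking papers were published in 2024. This list gathers the research that deserves particular attention.
The Rho-1 paper questions the current standard approach of giving equal attention to every token when training a language model. The authors suggest that different tokens contribute differently to the model's final performance.
The study proposes a new method that selects tokens in the training data according to their importance. By concentrating on important tokens, the model can learn more efficiently and improve performance while keeping resource consumption down.
The experimental results show that this approach outperforms conventional methods on multiple benchmarks. Its effectiveness was especially pronounced when the compute budget was limited.
Future research directions include further refining how token importance is evaluated and verifying that the approach applies to large-scale models. Progress in this area is expected to contribute to more efficient AI systems.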
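The token-selection idea described above can be illustrated with a small sketch. This is not the paper's implementation: the function name `selective_token_loss`, the `importance` scores, and the `keep_fraction` default are all hypothetical, and the summary does not say how importance is actually computed. The sketch only shows the general shape of averaging a training loss over a selected subset of tokens instead of over all of them.

```python
from typing import List


def selective_token_loss(
    token_losses: List[float],
    importance: List[float],
    keep_fraction: float = 0.6,  # assumed value, for illustration only
) -> float:
    """Average the per-token loss over only the highest-importance tokens.

    `importance` is a hypothetical per-token score supplied by the caller;
    how such a score is obtained is outside the scope of this sketch.
    """
    if len(token_losses) != len(importance):
        raise ValueError("token_losses and importance must have equal length")
    if not token_losses:
        raise ValueError("at least one token is required")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Keep at least one token so the average is always defined.
    k = max(1, int(len(token_losses) * keep_fraction))

    # Rank token positions by importance, highest first.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    kept = ranked[:k]

    # Only the kept tokens contribute to the loss.
    return sum(token_losses[i] for i in kept) / k


if __name__ == "__main__":
    losses = [2.0, 0.5, 3.0, 1.0]
    scores = [0.9, 0.1, 0.8, 0.2]
    print(selective_token_loss(losses, scores, keep_fraction=0.5))
```

With `keep_fraction=1.0` the function reduces to the ordinary mean over all tokens, which makes it easy to compare against the uniform baseline that the paper questions.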
It’s been a very eventful and exciting year in AI research. This is especially true if you are interested in LLMs.
I had big plans for this December edition and was planning to publish a new article with a discussion of all my research highlights from 2024. I still plan to do so, but due to an accident and serious injury, I am currently unable to work at a computer and finish the draft. But I hope to recover in the upcoming weeks and be back on my feet soon.
In the meantime, I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It’s just a list, but maybe it will come in handy for those who are interested in finding some gems to read for the holidays.
And if you are interested in more code-heavy reading and tinkering, my Build A Large Language Model (From Scratch) book is out on Amazon as of last month.
In addition, I added a lot of bonus materials to the GitHub repository.

Bonus materials in the GitHub repository (stars highlight my personal favorites)
Thanks for your understanding and support, and I hope to make a full recovery soon and be back with the Research Highlights 2024 article in a few weeks!
Ahead of AI is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
January 2024
1 Jan, Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models, https://arxiv.org/abs/2401.00788
2 Jan, A Comprehensive Study of Knowledge Editing for Large Language Models, https://arxiv.org/abs/2401.01286
2 Jan, LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, https://arxiv.org/abs/2401.01325
2 Jan, Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, https://arxiv.org/abs/2401.01335
2 Jan, LLaMA Beyond English: An Empirical Study on Language Capability Transfer, https://arxiv.org/abs/2401.01055
3 Jan, A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity, https://arxiv.org/abs/2401.01967
4 Jan, LLaMA Pro: Progressive LLaMA with Block Expansion, https://arxiv.org/abs/2401.02415
4 Jan, LLM Augmented LLMs: Expanding Capabilities through Composition, https://arxiv.org/abs/2401.02412
4 Jan, Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM, https://arxiv.org/abs/2401.02994
5 Jan, DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, https://arxiv.org/abs/2401.02954
5 Jan, Denoising Vision Transformers, https://arxiv.org/abs/2401.02957
7 Jan, Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon, https://arxiv.org/abs/2401.03462
8 Jan, Mixtral of Experts, https://arxiv.org/abs/2401.04088
8 Jan, MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts, https://arxiv.org/abs/2401.04081
8 Jan, A Minimaximalist Approach to Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2401.04056
9 Jan, RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation, https://arxiv.org/abs/2401.04679
10 Jan, Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, https://arxiv.org/abs/2401.05566
11 Jan, Transformers are Multi-State RNNs, https://arxiv.org/abs/2401.06104
11 Jan, A Closer Look at AUROC and AUPRC under Class Imbalance, https://arxiv.org/abs/2401.06091
12 Jan, An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models, https://arxiv.org/abs/2401.06692
16 Jan, Tuning Language Models by Proxy, https://arxiv.org/abs/2401.08565
16 Jan, Scalable Pre-training of Large Autoregressive Image Models, https://arxiv.org/abs/2401.08541
16 Jan, Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, https://arxiv.org/abs/2401.08500
16 Jan, RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406
17 Jan, ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967
18 Jan, DiffusionGPT: LLM-Driven Text-to-Image Generation System, https://arxiv.org/abs/2401.10061
18 Jan, Self-Rewarding Language Models, https://arxiv.org/abs/2401.10020
18 Jan, VMamba: Visual State Space Model, https://arxiv.org/abs/2401.10166
19 Jan, Knowledge Fusion of Large Language Models, https://arxiv.org/abs/2401.10491
22 Jan, SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities, https://arxiv.org/abs/2401.12168
22 Jan, WARM: On the Benefits of Weight Averaged Reward Models, https://arxiv.org/abs/2401.12187
22 Jan, Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text, https://arxiv.org/abs/2401.12070
24 Jan, MambaByte: Token-free Selective State Space Model, https://arxiv.org/abs/2401.13660
24 Jan, SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection, https://arxiv.org/abs/2401.13160
25 Jan, Rethinking Patch Dependence for Masked Autoencoders, https://arxiv.org/abs/2401.14391
25 Jan, Pix2gestalt: Amodal Segmentation by Synthesizing Wholes, https://arxiv.org/abs/2401.14398
25 Jan, Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities, https://arxiv.org/abs/2401.14405
26 Jan, EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty, https://arxiv.org/abs/2401.15077
29 Jan, MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, https://arxiv.org/abs/2401.15947
29 Jan, Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling, https://arxiv.org/abs/2401.16380
31 Jan, KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, https://arxiv.org/abs/2401.18079
February 2024
1 Feb, Efficient Exploration for LLMs, https://arxiv.org/abs/2402.00396
1 Feb, OLMo: Accelerating the Science of Language Models, https://arxiv.org/abs/2402.00838
1 Feb, Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?, https://arxiv.org/abs/2402.00841
1 Feb, Repeat After Me: Transformers are Better than State Space Models at Copying, https://arxiv.org/abs/2402.01032
2 Feb, LiPO: Listwise Preference Optimization through Learning-to-Rank, https://arxiv.org/abs/2402.01878
2 Feb, FindingEmo: An Image Dataset for Emotion Recognition in the Wild, https://arxiv.org/abs/2402.01355
3 Feb, More Agents Is All You Need, https://arxiv.org/abs/2402.05120
5 Feb, DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, https://arxiv.org/abs/2402.03300
6 Feb, MobileVLM V2: Faster and Stronger Baseline for Vision Language Model, https://arxiv.org/abs/2402.03766
6 Feb, A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention, https://arxiv.org/abs/2402.03902
6 Feb, Scaling Laws for Downstream Task Performance of Large Language Models, https://arxiv.org/abs/2402.04177
6 Feb, MOMENT: A Family of Open Time-series Foundation Models, https://arxiv.org/abs/2402.03885
6 Feb, Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, https://arxiv.org/abs/2402.03749
6 Feb, Self-Discover: Large Language Models Self-Compose Reasoning Structures, https://arxiv.org/abs/2402.03620
7 Feb, Grandmaster-Level Chess Without Search, https://arxiv.org/abs/2402.04494
7 Feb, Direct Language Model Alignment from Online AI Feedback, https://arxiv.org/abs/2402.04792
8 Feb, Buffer Overflow in Mixture of Experts, https://arxiv.org/abs/2402.05526
9 Feb, The Boundary of Neural Network Trainability is Fractal, https://arxiv.org/abs/2402.06184
11 Feb, ODIN: Disentangled Reward Mitigates Hacking in RLHF, https://arxiv.org/abs/2402.07319
12 Feb, Policy Improvement using Language Feedback Models, https://arxiv.org/abs/2402.07876
12 Feb, Scaling Laws for Fine-Grained Mixture of Experts, https://arxiv.org/abs/2402.07871
12 Feb, Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model, https://arxiv.org/abs/2402.07827
12 Feb, Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping, https://arxiv.org/abs/2402.07610
12 Feb, Suppressing Pink Elephants with Direct Principle Feedback, https://arxiv.org/abs/2402.07896
13 Feb, World Model on Million-Length Video And Language With RingAttention, https://arxiv.org/abs/2402.08268
13 Feb, Mixtures of Experts Unlock Parameter Scaling for Deep RL, https://arxiv.org/abs/2402.08609
14 Feb, DoRA: Weight-Decomposed Low-Rank Adaptation, https://arxiv.org/abs/2402.09353
14 Feb, Transformers Can Achieve Length Generalization But Not Robustly, https://arxiv.org/abs/2402.09371
15 Feb, BASE TTS: Lessons From Building a Billion-Parameter Text-to-Speech Model on 100K Hours of Data, https://arxiv.org/abs/2402.08093
15 Feb, Recovering the Pre-Fine-Tuning Weights of Generative Models, https://arxiv.org/abs/2402.10208
15 Feb, Generative Representational Instruction Tuning, https://arxiv.org/abs/2402.09906
16 Feb, FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models, https://arxiv.org/abs/2402.10986
17 Feb, OneBit: Towards Extremely Low-bit Large Language Models, https://arxiv.org/abs/2402.11295
18 Feb, LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration, https://arxiv.org/abs/2402.11550
19 Feb, Reformatted Alignment, https://arxiv.org/abs/2402.12219
19 Feb, AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling, https://arxiv.org/abs/2402.12226
19 Feb, Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs, https://arxiv.org/abs/2402.12030
19 Feb, LoRA+: Efficient Low Rank Adaptation of Large Models, https://arxiv.org/abs/2402.12354
20 Feb, Neural Network Diffusion, https://arxiv.org/abs/2402.13144
21 Feb, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, https://arxiv.org/abs/2402.13616
21 Feb, LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, https://arxiv.org/abs/2402.13753
21 Feb, Large Language Models for Data Annotation: A Survey, https://arxiv.org/abs/2402.13446
22 Feb, TinyLLaVA: A Framework of Small-scale Large Multimodal Models, https://arxiv.org/abs/2402.14289
22 Feb, Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, https://arxiv.org/abs/2402.14740
23 Feb, Genie: Generative Interactive Environments, https://arxiv.org/abs/2402.15391
26 Feb, CARTE: Pretraining and Transfer for Tabular Learning, https://arxiv.org/abs/2402.16785
27 Feb, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, https://arxiv.org/abs/2402.17764
27 Feb, Sora Generates Videos with Stunning Geometrical Consistency, https://arxiv.org/abs/2402.17403
27 Feb, When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, https://arxiv.org/abs/2402.17193
29 Feb, Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models, https://arxiv.org/abs/2402.19427
March 2024
1 Mar, Learning and Leveraging World Models in Visual Representation Learning, https://arxiv.org/abs/2403.00504
3 Mar, Improving LLM Code Generation with Grammar Augmentation, https://arxiv.org/abs/2403.01632
3 Mar, The Hidden Attention of Mamba Models, https://arxiv.org/abs/2403.01590
4 Mar, Training-Free Pretrained Model Merging, https://arxiv.org/abs/2403.01753
4 Mar, Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures, https://arxiv.org/abs/2403.02308
5 Mar, The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning, https://arxiv.org/abs/2403.03218
5 Mar, Evolution Transformer: In-Context Evolutionary Optimization, https://arxiv.org/abs/2403.02985
5 Mar, Enhancing Vision-Language Pre-training with Rich Supervisions, https://arxiv.org/abs/2403.03346
5 Mar, Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, https://arxiv.org/abs/2403.03206
5 Mar, Design2Code: How Far Are We From Automating Front-End Engineering?, https://arxiv.org/abs/2403.03163
6 Mar, ShortGPT: Layers in Large Language Models are More Redundant Than You Expect, https://arxiv.org/abs/2403.03853
6 Mar, Backtracing: Retrieving the Cause of the Query, https://arxiv.org/abs/2403.03956
6 Mar, Learning to Decode Collaboratively with Multiple Language Models, https://arxiv.org/abs/2403.03870
6 Mar, SaulLM-7B: A pioneering Large Language Model for Law, https://arxiv.org/abs/2403.03883
6 Mar, Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning, https://arxiv.org/abs/2403.03864
6 Mar, 3D Diffusion Policy, https://arxiv.org/abs/2403.03954
6 Mar, MedMamba: Vision Mamba for Medical Image Classification, https://arxiv.org/abs/2403.03849
6 Mar, GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, https://arxiv.org/abs/2403.03507
6 Mar, Stop Regressing: Training Value Functions via Classification for Scalable Deep RL, https://arxiv.org/abs/2403.03950
7 Mar, How Far Are We from Intelligent Visual Deductive Reasoning?, https://arxiv.org/abs/2403.04732
7 Mar, Common 7B Language Models Already Possess Strong Math Capabilities, https://arxiv.org/abs/2403.04706
8 Mar, Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context, https://arxiv.org/abs/2403.05530
8 Mar, Is Cosine-Similarity of Embeddings Really About Similarity?, https://arxiv.org/abs/2403.05440
8 Mar, LLM4Decompile: Decompiling Binary Code with Large Language Models, https://arxiv.org/abs/2403.05286
9 Mar, Algorithmic Progress in Language Models, https://arxiv.org/abs/2403.05812
11 Mar, Stealing Part of a Production Language Model, https://arxiv.org/abs/2403.06634
12 Mar, Chronos: Learning the Language of Time Series, https://arxiv.org/abs/2403.07815
13 Mar, Simple and Scalable Strategies to Continually Pre-train Large Language Models, https://arxiv.org/abs/2403.08763
13 Mar, Language Models Scale Reliably With Over-Training and on Downstream Tasks, https://arxiv.org/abs/2403.08540
14 Mar, BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences, https://arxiv.org/abs/2403.09347
14 Mar, LocalMamba: Visual State Space Model with Windowed Selective Scan, https://arxiv.org/abs/2403.09338
14 Mar, GiT: Towards Generalist Vision Transformer through Universal Language Interface, https://arxiv.org/abs/2403.09394
14 Mar, MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training, https://arxiv.org/abs/2403.09611
15 Mar, RAFT: Adapting Language Model to Domain Specific RAG, https://arxiv.org/abs/2403.10131
18 Mar, TnT-LLM: Text Mining at Scale with Large Language Models, https://arxiv.org/abs/2403.12173
18 Mar, Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression, https://arxiv.org/abs/2403.15447
19 Mar, PERL: Parameter Efficient Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2403.10704
20 Mar, RewardBench: Evaluating Reward Models for Language Modeling, https://arxiv.org/abs/2403.13787
20 Mar, LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, https://arxiv.org/abs/2403.13372
21 Mar, RakutenAI-7B: Extending Large Language Models for Japanese, https://arxiv.org/abs/2403.15484
22 Mar, SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series, https://arxiv.org/abs/2403.15360
22 Mar, Can Large Language Models Explore In-Context?, https://arxiv.org/abs/2403.15371
22 Mar, LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement, https://arxiv.org/abs/2403.15042
25 Mar, LLM Agent Operating System, https://arxiv.org/abs/2403.16971
26 Mar, The Unreasonable Ineffectiveness of the Deeper Layers, https://arxiv.org/abs/2403.17887
27 Mar, BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text, https://arxiv.org/abs/2403.18421
27 Mar, ViTAR: Vision Transformer with Any Resolution, https://arxiv.org/abs/2403.18361
27 Mar, Long-form Factuality in Large Language Models, https://arxiv.org/abs/2403.18802
27 Mar, Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models, https://arxiv.org/abs/2403.18814
26 Mar, LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning, https://arxiv.org/abs/2403.17919
26 Mar, Mechanistic Design and Scaling of Hybrid Architectures, https://arxiv.org/abs/2403.17844
28 Mar, MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions, https://arxiv.org/abs/2403.19651
28 Mar, Model Stock: All We Need Is Just a Few Fine-Tuned Models, https://arxiv.org/abs/2403.19522
April 2024
1 Apr, Do Language Models Plan Ahead for Future Tokens?, https://arxiv.org/abs/2404.00859
1 Apr, Bigger is not Always Better: Scaling Properties of Latent Diffusion Models, https://arxiv.org/abs/2404.01367
1 Apr, The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis, https://arxiv.org/abs/2404.01204
1 Apr, Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models, https://arxiv.org/abs/2404.04478
2 Apr, Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models, https://arxiv.org/abs/2404.02258
2 Apr, Long-context LLMs Struggle with Long In-context Learning, https://arxiv.org/abs/2404.02060
2 Apr, Emergent Abilities in Reduced-Scale Generative Language Models, https://arxiv.org/abs/2404.02204
2 Apr, Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks, https://arxiv.org/abs/2404.02151
3 Apr, On the Scalability of Diffusion-based Text-to-Image Generation, https://arxiv.org/abs/2404.02883
3 Apr, BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models, https://arxiv.org/abs/2404.02827
3 Apr, Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models, https://arxiv.org/abs/2404.02747
4 Apr, Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences, https://arxiv.org/abs/2404.03715
4 Apr, Training LLMs over Neurally Compressed Text, https://arxiv.org/abs/2404.03626
4 Apr, CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues, https://arxiv.org/abs/2404.03820
5 Apr, ReFT: Representation Finetuning for Language Models, https://arxiv.org/abs/2404.03592
5 Apr, Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data, https://arxiv.org/abs/2404.03862
5 Apr, Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation, https://arxiv.org/abs/2404.04256
8 Apr, AutoCodeRover: Autonomous Program Improvement, https://arxiv.org/abs/2404.05427
8 Apr, Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence, https://arxiv.org/abs/2404.05892
8 Apr, CodecLM: Aligning Language Models with Tailored Synthetic Data, https://arxiv.org/abs/2404.05875
9 Apr, MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies, https://arxiv.org/abs/2404.06395
9 Apr, Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models, https://arxiv.org/abs/2404.06209
9 Apr, LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders, https://arxiv.org/abs/2404.05961
10 Apr, Adapting LLaMA Decoder to Vision Transformer, https://arxiv.org/abs/2404.06773
10 Apr, Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, https://arxiv.org/abs/2404.07143
11 Apr, LLoCO: Learning Long Contexts Offline, https://arxiv.org/abs/2404.07979
11 Apr, JetMoE: Reaching Llama2 Performance with 0.1M Dollars, https://arxiv.org/abs/2404.07413
11 Apr, Best Practices and Lessons Learned on Synthetic Data for Language Models, https://arxiv.org/abs/2404.07503
11 Apr, Rho-1: Not All Tokens Ar