Essential LLM Readings
Curated list of foundational and recent papers on Large Language Models
📌 Mandatory Readings (Must Read First)
These four papers form the absolute foundation of understanding LLMs:
1. Attention Is All You Need ⭐⭐⭐
- Authors: Vaswani et al. (2017)
- Link: arXiv
- Why Essential: Introduced the Transformer architecture that underlies ALL modern LLMs. Without this paper, GPT, BERT, Claude, and other models wouldn’t exist.
- Key Concepts: Self-attention mechanism, multi-head attention, positional encoding
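The self-attention mechanism at the heart of the paper can be sketched in a few lines. This is a minimal single-head version for intuition only; real Transformers add learned Q/K/V projections, masking, and multiple heads:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from Vaswani et al. (2017):
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarity, scaled
    # Row-wise softmax turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of value vectors

# Toy example: 3 token vectors of dimension 4, attending to themselves
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, Q, Q)  # self-attention: Q = K = V
print(out.shape)  # (3, 4)
```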
2. Language Models are Few-Shot Learners (GPT-3) ⭐⭐⭐
- Authors: Brown et al., OpenAI (2020)
- Link: arXiv
- Why Essential: Demonstrated that scaling + prompting = emergent capabilities. Triggered the LLM revolution and showed models can solve new tasks without fine-tuning.
- Key Concepts: In-context learning, few-shot prompting, scaling laws
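In-context learning needs no gradient updates: the "training examples" simply go into the prompt. A minimal sketch of few-shot prompt construction (the Q/A formatting here is illustrative, not GPT-3's exact template):

```python
def build_few_shot_prompt(examples, query):
    """Assemble a GPT-3-style few-shot prompt: a handful of input -> output
    demonstrations followed by the new input the model should complete."""
    lines = [f"Q: {q}\nA: {a}" for q, a in examples]
    lines.append(f"Q: {query}\nA:")  # model completes after the final "A:"
    return "\n\n".join(lines)

demos = [("2 + 2", "4"), ("7 + 5", "12")]
prompt = build_few_shot_prompt(demos, "3 + 9")
print(prompt)
```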
3. BERT: Pre-training of Deep Bidirectional Transformers ⭐⭐⭐
- Authors: Devlin et al., Google (2018)
- Link: arXiv
- Why Essential: Popularized large-scale pre-training followed by task-specific fine-tuning, revolutionizing NLP. Demonstrated the power of transfer learning and bidirectional context understanding.
- Key Concepts: Masked language modeling, transfer learning, bidirectional transformers
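The masked-language-modeling objective can be sketched as follows. This is simplified: real BERT also sometimes substitutes a random or unchanged token instead of [MASK], as detailed in Devlin et al. (2018):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=1):
    """BERT-style MLM input: hide roughly 15% of tokens and train the model
    to predict them from bidirectional context."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)    # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)   # no prediction target at this position
    return masked, labels

toks = "the cat sat on the mat".split()
masked, labels = mask_tokens(toks)
print(masked)
```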
4. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs 🔥
- Authors: DeepSeek-AI (January 2025)
- Link: arXiv
- Why Revolutionary:
- Achieves OpenAI o1-level reasoning performance with openly released weights
- Introduces a novel reinforcement learning approach for training reasoning
- Matches or surpasses GPT-4-class models on mathematical and coding benchmarks
- Key Innovations: GRPO (Group Relative Policy Optimization), reasoning-focused RL training
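The group-relative advantage computation at the core of GRPO can be sketched as below. This is a simplified reading of the idea — score each sampled response against its own group rather than training a separate value network as in PPO — not DeepSeek's actual implementation:

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: sample a group of responses for one prompt,
    then normalize each reward within the group, A_i = (r_i - mean) / std."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

# Rewards for 4 sampled responses to the same prompt (illustrative values)
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advs)  # advantages are centered: above-average responses get positive values
```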
🎯 Core Papers (Highly Recommended)
Essential papers for understanding modern LLM capabilities:
Chain of Thought Prompting Elicits Reasoning ⭐⭐⭐
- Authors: Wei et al., Google (2022)
- Link: arXiv
- Impact: Breakthrough in LLM reasoning, fundamental technique used everywhere today
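The technique is purely a prompting change: demonstrations include worked reasoning steps rather than bare answers, as in the paper's well-known tennis-ball example (wording slightly paraphrased here):

```python
# Chain-of-thought prompting (Wei et al., 2022): the demonstration shows
# intermediate reasoning, nudging the model to reason before answering.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?\n"
    "A:"  # the model is expected to produce step-by-step reasoning here
)
print(cot_prompt)
```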
Training Language Models to Follow Instructions with Human Feedback (InstructGPT / RLHF) ⭐⭐⭐
- Authors: Ouyang et al., OpenAI (2022)
- Link: arXiv
- Impact: Key to making LLMs useful and aligned; the RLHF recipe behind ChatGPT and most conversational models
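At the heart of the RLHF pipeline is a reward model trained on human preference pairs with a Bradley-Terry style loss; the resulting reward model then guides RL fine-tuning of the policy (PPO in the paper). A scalar sketch of just the preference loss (real implementations batch this over model outputs):

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Pairwise reward-model loss used in RLHF:
    -log sigmoid(r_chosen - r_rejected).
    Minimizing it pushes the chosen response's score above the rejected one's."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

loss_good = preference_loss(2.0, 0.0)  # small: chosen scored well above rejected
loss_bad = preference_loss(0.0, 2.0)   # large: the ranking is wrong
print(loss_good, loss_bad)
```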
🚀 Recent Breakthroughs (2024-2025)
DeepSeek-V3 Technical Report
- Authors: DeepSeek-AI (December 2024)
- Link: arXiv
- Impact: 671B-parameter MoE model (37B active per token) trained in roughly 2.79M H800 GPU hours (about $5.6M at the report's assumed rental prices), achieving GPT-4-class performance at a fraction of typical frontier training cost
Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters
- Authors: Snell et al. (August 2024)
- Link: arXiv
- Impact: Shows that spending extra compute at inference time (search, revision, verifier-guided sampling) can beat scaling model parameters — a new paradigm for improving LLM outputs
📈 Important Papers (Recommended)
Scaling and Architecture
- Scaling Laws for Neural Language Models ⭐⭐ — Kaplan et al. (2020): arXiv
- Shows that loss falls predictably as a power law in model size, data, and compute
- LLaMA: Open and Efficient Foundation Language Models ⭐⭐ — Touvron et al., Meta (2023): arXiv
- Made powerful LLMs accessible to open-source community
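The Kaplan et al. parameter scaling law is a simple power law; a sketch using the paper's fitted constants for loss as a function of non-embedding parameter count:

```python
def kaplan_loss(N, N_c=8.8e13, alpha_N=0.076):
    """Kaplan et al. (2020) scaling law: L(N) = (N_c / N)^alpha_N,
    with N_c and alpha_N the paper's fitted values for loss vs.
    non-embedding parameters (data and compute assumed non-limiting)."""
    return (N_c / N) ** alpha_N

# Loss drops smoothly and predictably as the model grows
print(kaplan_loss(1e8), kaplan_loss(1e10))
```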
Alignment and Safety
- Constitutional AI: Harmlessness from AI Feedback ⭐⭐ — Bai et al., Anthropic (2022): arXiv
- Alternative approach to RLHF for model alignment
Efficiency and Adaptation
- LoRA: Low-Rank Adaptation of Large Language Models ⭐⭐ — Hu et al. (2021): arXiv
- Efficient fine-tuning method widely adopted
- FlashAttention: Fast and Memory-Efficient Exact Attention ⭐⭐ — Dao et al. (2022): arXiv
- Key optimization for fast training/inference on GPU
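The LoRA idea above fits in a few lines: freeze the pretrained weight W and learn only a low-rank update B @ A. A numpy sketch (real LoRA applies this inside a Transformer's attention projections, with A and B trained by backprop):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA (Hu et al., 2021): effective weight is W + (alpha / r) * B @ A.
    Only A (r x d_in) and B (d_out x r) are trainable; r << min(d_in, d_out),
    so the number of trained parameters is tiny compared to W."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init
x = rng.standard_normal((1, d_in))
# With B = 0, the adapted layer matches the frozen layer exactly at the start
print(np.allclose(lora_forward(x, W, A, B), x @ W.T))  # True
```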
📚 Additional Readings
Understanding and Analysis
- The Illustrated Transformer ⭐ — Jay Alammar: Blog
- Best visual guide to understanding Transformers
- Emergent Abilities of Large Language Models ⭐ — Wei et al. (2022): arXiv
- Analysis of emergent behaviors at scale
Scaling Examples
- PaLM: Scaling Language Models with Pathways ⭐ — Chowdhery et al., Google (2022): arXiv
- 540B parameter model demonstrating extreme scaling
Advanced Techniques
- Self-Consistency Improves Chain of Thought Reasoning ⭐ — Wang et al. (2022): arXiv
- Improves reasoning with multiple sampling
- Toolformer: Language Models Can Teach Themselves to Use Tools ⭐ — Schick et al., Meta (2023): arXiv
- LLMs learning to use APIs and external tools
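The self-consistency idea above reduces to a majority vote over the final answers extracted from several sampled reasoning chains; a minimal sketch:

```python
from collections import Counter

def self_consistent_answer(sampled_answers):
    """Self-consistency (Wang et al., 2022): sample multiple chain-of-thought
    completions, extract each final answer, and return the majority vote."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Final answers parsed from 5 sampled reasoning chains (illustrative values)
print(self_consistent_answer(["18", "18", "26", "18", "11"]))  # -> 18
```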
📖 Reading Roadmap
- Start with the foundations: Read the four mandatory papers in order
- Understand modern capabilities: Read Chain of Thought and RLHF papers
- Explore recent breakthroughs: Study DeepSeek-V3 for understanding current SOTA
- Deep dive by interest:
- For efficiency → LoRA, FlashAttention
- For safety → Constitutional AI
- For open models → LLaMA series
- For scaling → Scaling Laws, PaLM