Essential LLM Readings

A curated list of foundational and recent papers on Large Language Models, organized by priority and importance.


📌 Mandatory Readings (Must Read First)

These four papers form the absolute foundation of understanding LLMs:

1. Attention Is All You Need ⭐⭐⭐

  • Authors: Vaswani et al. (2017)
  • Link: arXiv
  • Why Essential: Introduced the Transformer architecture that underlies ALL modern LLMs. Without this paper, GPT, BERT, Claude, and other models wouldn’t exist.
  • Key Concepts: Self-attention mechanism, multi-head attention, positional encoding
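The self-attention operation at the core of the paper can be sketched in a few lines. This is a minimal single-head version (the full model adds multiple heads, projections, masking, and positional encodings); shapes and values are toy examples:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """The paper's core operation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                         # weighted sum of value vectors

# 3 tokens, dimension 4 (toy sizes for illustration)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one output vector per token
```

Each token's output is a context-dependent mixture of every token's value vector, which is what lets the Transformer model long-range dependencies in one step.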

2. Language Models are Few-Shot Learners (GPT-3) ⭐⭐⭐

  • Authors: Brown et al., OpenAI (2020)
  • Link: arXiv
  • Why Essential: Demonstrated that scaling + prompting = emergent capabilities. Triggered the LLM revolution and showed models can solve new tasks without fine-tuning.
  • Key Concepts: In-context learning, few-shot prompting, scaling laws
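Few-shot prompting as described in the paper is just careful prompt construction: show the model a handful of input/output exemplars and let it infer the task from context. A minimal sketch (the exemplar format and toy translation task are illustrative, not from the paper):

```python
def build_few_shot_prompt(examples, query):
    """Format (input, output) exemplars followed by the new query,
    so the model can infer the task purely from context -- no
    gradient updates or fine-tuning involved."""
    lines = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# Toy English -> French task
examples = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt(examples, "cat")
print(prompt)
```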

3. BERT: Pre-training of Deep Bidirectional Transformers ⭐⭐⭐

  • Authors: Devlin et al., Google (2018)
  • Link: arXiv
  • Why Essential: Popularized large-scale pre-training in NLP and established the pre-train/fine-tune transfer-learning paradigm, with deeply bidirectional context understanding.
  • Key Concepts: Masked language modeling, transfer learning, bidirectional transformers
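Masked language modeling, BERT's pre-training objective, can be sketched simply: hide a random fraction of tokens and keep the originals as prediction targets. This simplified version omits BERT's actual 80/10/10 split (80% replaced with [MASK], 10% with a random token, 10% unchanged):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Simplified BERT-style masking: each token is hidden with
    probability mask_prob; the model must recover the originals
    from both left and right context."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # label the model must predict
            masked.append(MASK)
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)
print(targets)
```

Because targets are predicted from context on both sides, the encoder learns bidirectional representations, unlike left-to-right language models.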

4. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs 🔥

  • Authors: DeepSeek-AI (January 2025)
  • Link: arXiv
  • Why Revolutionary:
    • Achieves reasoning performance comparable to OpenAI's o1 while being fully open (open weights and a published training recipe)
    • Introduces a novel reinforcement learning approach for eliciting reasoning
    • Matches or exceeds GPT-4-class models on mathematical and coding benchmarks
  • Key Innovations: GRPO (Group Relative Policy Optimization), reasoning-focused RL training
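The group-relative advantage at the heart of GRPO can be sketched in a few lines: sample a group of responses per prompt, then score each against the group mean, normalized by the group's standard deviation, so no separate critic/value network is needed. Toy rewards below; the real method plugs these advantages into a PPO-style policy update:

```python
def group_relative_advantages(rewards):
    """Normalize each response's reward against its own group:
    advantage_i = (r_i - mean(group)) / std(group)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against identical rewards (std = 0)
    return [(r - mean) / std for r in rewards]

# Rewards for 4 sampled answers to one prompt (1 = correct, 0 = wrong)
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)  # positive for above-average answers, negative otherwise
```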

🧠 Core Techniques

Essential papers for understanding modern LLM capabilities:

Chain of Thought Prompting Elicits Reasoning ⭐⭐⭐

  • Authors: Wei et al., Google (2022)
  • Link: arXiv
  • Impact: Breakthrough in LLM reasoning, fundamental technique used everywhere today
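The technique itself is simple: include a worked, step-by-step rationale in the exemplar so the model imitates that reasoning style. A minimal sketch (the exemplar paraphrases the paper's well-known tennis-ball example; the helper name is illustrative):

```python
# A chain-of-thought exemplar shows the intermediate reasoning,
# not just the final answer.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question):
    """Prepend a worked exemplar so the model produces reasoning steps
    before its final answer."""
    return f"{COT_EXEMPLAR}\nQ: {question}\nA:"

print(cot_prompt("A farm has 3 hens laying 4 eggs each. How many eggs?"))
```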

Training Language Models to Follow Instructions with Human Feedback (InstructGPT / RLHF) ⭐⭐⭐

  • Authors: Ouyang et al., OpenAI (2022)
  • Link: arXiv
  • Impact: Key to making LLMs useful and aligned; the RLHF recipe behind ChatGPT and most modern conversational assistants
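The reward-model stage of RLHF trains on human preference pairs with a Bradley-Terry-style objective: maximize the probability that the preferred response scores higher. A minimal sketch of that loss (scalar rewards stand in for the reward model's outputs):

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Reward-model objective used in RLHF pipelines:
    minimize -log sigmoid(r_chosen - r_rejected), i.e. push the
    human-preferred response to score higher than the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Loss shrinks as the margin between chosen and rejected grows
print(preference_loss(2.0, 0.0))  # small: model already ranks correctly
print(preference_loss(0.0, 2.0))  # large: model ranks them the wrong way
```

The trained reward model then scores the policy's outputs during the reinforcement-learning stage (PPO in the InstructGPT paper).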

🚀 Recent Breakthroughs (2024-2025)

DeepSeek-V3 Technical Report

  • Authors: DeepSeek-AI (December 2024)
  • Link: arXiv
  • Impact: 671B-parameter mixture-of-experts model (37B active per token) trained in roughly 2.79M H800 GPU-hours (about $5.6M of compute), achieving GPT-4-class performance at a fraction of the usual cost

Scaling LLM Test-Time Compute Optimally

  • Authors: Snell et al. (August 2024)
  • Link: arXiv
  • Impact: New paradigm for improving LLM outputs during inference
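The simplest instance of spending more compute at inference time is best-of-N sampling: draw N candidate answers and keep the one a verifier scores highest (the paper studies how to allocate such budgets optimally, including revision-based strategies). A sketch with hypothetical toy stand-ins for the model and scorer:

```python
def best_of_n(generate, score, prompt, n=8):
    """Sample n candidates and return the highest-scoring one."""
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score)

# Hypothetical stand-ins: a "model" that sometimes answers correctly,
# and a verifier that checks the answer.
def toy_generate(prompt, seed):
    return "4" if seed % 3 == 0 else "5"

def toy_score(answer):
    return 1.0 if answer == "4" else 0.0

print(best_of_n(toy_generate, toy_score, "What is 2 + 2?", n=8))
```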

Scaling and Architecture

  • Scaling Laws for Neural Language Models ⭐⭐ — Kaplan et al. (2020): arXiv
    • Shows predictable performance growth with scale
  • LLaMA: Open and Efficient Foundation Language Models ⭐⭐ — Touvron et al., Meta (2023): arXiv
    • Made powerful LLMs accessible to open-source community
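The scaling-laws entry above fits test loss as a power law in model size. A toy evaluation of the parameter-count fit, using the constants reported by Kaplan et al. (illustrative only; the fitted constants depend on the dataset and setup):

```python
def scaling_law_loss(N, N_c=8.8e13, alpha=0.076):
    """Kaplan et al.'s parameter-scaling fit: L(N) = (N_c / N)^alpha,
    holding data and compute non-limiting."""
    return (N_c / N) ** alpha

# Loss falls predictably as parameter count grows by 10x steps
for n_params in (1e8, 1e9, 1e10):
    print(f"{n_params:.0e} params -> predicted loss {scaling_law_loss(n_params):.3f}")
```

The practical upshot, and the reason the paper mattered, is that performance at large scale can be forecast from small-scale runs.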

Alignment and Safety

  • Constitutional AI: Harmlessness from AI Feedback ⭐⭐ — Bai et al., Anthropic (2022): arXiv
    • Alternative approach to RLHF for model alignment

Efficiency and Adaptation

  • LoRA: Low-Rank Adaptation of Large Language Models ⭐⭐ — Hu et al. (2021): arXiv
    • Efficient fine-tuning method widely adopted
  • FlashAttention: Fast and Memory-Efficient Exact Attention ⭐⭐ — Dao et al. (2022): arXiv
    • Key optimization for fast training/inference on GPU
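LoRA's trick is easy to see in code: freeze the pretrained weight W and train only a low-rank update B @ A, so the effective weight is W + B A but the trainable parameter count drops from d*d to 2*d*r. A minimal sketch (toy sizes; real LoRA also scales the update by alpha/r):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                       # hidden size, low rank (r << d)

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-init
                                    # so the adapter starts as a no-op

def lora_forward(x):
    """Computes x (W + B A)^T without materializing the merged weight;
    only A and B receive gradients during fine-tuning."""
    return x @ W.T + (x @ A.T) @ B.T

x = rng.normal(size=(1, d))
assert np.allclose(lora_forward(x), x @ W.T)  # zero-init: matches base model
print("trainable params:", A.size + B.size, "vs full:", W.size)
```

After training, B @ A can be merged into W, so inference pays no extra latency, which is a key reason the method was so widely adopted.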

📚 Additional Readings

Understanding and Analysis

  • The Illustrated Transformer ⭐ — Jay Alammar: Blog
    • Best visual guide to understanding Transformers
  • Emergent Abilities of Large Language Models ⭐ — Wei et al. (2022): arXiv
    • Analysis of emergent behaviors at scale

Scaling Examples

  • PaLM: Scaling Language Models with Pathways ⭐ — Chowdhery et al., Google (2022): arXiv
    • 540B parameter model demonstrating extreme scaling

Advanced Techniques

  • Self-Consistency Improves Chain of Thought Reasoning ⭐ — Wang et al. (2022): arXiv
    • Improves reasoning with multiple sampling
  • Toolformer: Language Models Can Teach Themselves to Use Tools ⭐ — Schick et al., Meta (2023): arXiv
    • LLMs learning to use APIs and external tools
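The self-consistency entry above reduces to a majority vote: sample several chain-of-thought completions, parse out each final answer, and return the most common one, marginalizing over reasoning paths. A minimal sketch with toy parsed answers:

```python
from collections import Counter

def self_consistency(answers):
    """Return the majority-vote answer across sampled reasoning chains."""
    return Counter(answers).most_common(1)[0][0]

# Final answers parsed from 5 sampled chain-of-thought completions (toy data)
print(self_consistency(["11", "11", "9", "11", "12"]))
```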

📖 Reading Roadmap

  1. Start with the foundations: Read the four mandatory papers in order
  2. Understand modern capabilities: Read Chain of Thought and RLHF papers
  3. Explore recent breakthroughs: Study DeepSeek-V3 to understand the current state of the art
  4. Deep dive by interest:
    • For efficiency → LoRA, FlashAttention
    • For safety → Constitutional AI
    • For open models → LLaMA series
    • For scaling → Scaling Laws, PaLM