| Week | Date | Topic | Main Paper | Supplementary Papers |
|------|------|-------|------------|----------------------|
| 1 | 01/14 | Introduction | - | |
| | 01/16 | Attention & Transformers | Attention Is All You Need | |
| 2 | 01/21 | Pretraining | Language Models are Few-Shot Learners | Language Models are Unsupervised Multitask Learners, Generating Long Sequences with Sparse Transformers, An Empirical Model of Large-Batch Training |
| | 01/23 | Scaling Laws | Training Compute-Optimal Large Language Models | Scaling Laws for Neural Language Models, Quasi-Newton Matrices with Limited Storage |
| 3 | 01/28 | Instruction Tuning | Scaling Instruction-Finetuned Language Models | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| | 01/30 | | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning | Scaling Instruction-Finetuned Language Models, Finetuned Language Models Are Zero-Shot Learners |
| 4 | 02/04 | Prompting | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Tree of Thoughts: Deliberate Problem Solving with Large Language Models |
| | 02/06 | | Self-Consistency Improves Chain-of-Thought Reasoning in Language Models | Universal Self-Consistency for Large Language Model Generation, Early-Stopping Self-Consistency for Multi-step Reasoning, Ask One More Time: Self-Agreement Improves Reasoning of Language Models |
| 5 | 02/11 | | ART: Automatic Multi-Step Reasoning and Tool-Use for Large Language Models | TALM: Tool Augmented Language Models, LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models |
| | 02/13 | | Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? | Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations, Active Learning Principles for In-Context Learning with Large Language Models, Larger language models do in-context learning differently, Lost in the Middle: How Language Models Use Long Contexts |
| 6 | 02/18 | LLM Abilities | Emergent Abilities of Large Language Models | A Latent Space Theory for Emergent Abilities in Large Language Models, Are Emergent Abilities in Large Language Models just In-Context Learning? |
| | 02/20 | | Are Emergent Abilities of Large Language Models a Mirage? | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, Predicting Emergent Capabilities by Finetuning, Training on the Test Task Confounds Evaluation and Emergence, State of What Art? A Call for Multi-Prompt LLM Evaluation |
| 7 | 02/25 | | Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models |
| | 02/27 | | Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement | Large Language Models can Learn Rules, Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models |
| 8 | 03/04 | Alignment and Agents | Training Language Models to Follow Instructions with Human Feedback | Aligning Language Models with Self-Generated Instruction, Direct Preference Optimization: Your Language Model is Secretly a Reward Model, LIMA: Less Is More for Alignment |
| | 03/06 | | Toolformer: Language Models Can Teach Themselves to Use Tools | |
| 9 | 03/11 | MoE | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | |
| | 03/13 | | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | |
| 10 | 03/18 | Spring Break | - | |
| | 03/20 | | - | |
| 11 | 03/25 | RAG | Improving Language Models by Retrieving from Trillions of Tokens | |
| | 03/27 | RL | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | |
| 12 | 04/01 | Multimodal | Learning Transferable Visual Models From Natural Language Supervision (CLIP) | |
| | 04/03 | | Improved Baselines with Visual Instruction Tuning (LLaVA) | |
| 13 | 04/08 | Distillation and Quantization | TinyBERT: Distilling BERT for Natural Language Understanding | |
| | 04/10 | | LoRA: Low-Rank Adaptation of Large Language Models | |
| 14 | 04/15 | No Class | - | |
| | 04/17 | Long Context | A Controlled Study on Long Context Extension and Generalization in LLMs | |
| 15 | 04/22 | | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | |
| | 04/24 | Fact Checking | SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | |