| Week | Phase | Topic | Task | Resource | Link |
|---|---|---|---|---|---|
| 1 | 1. Foundation | Python & Tensors Crash Course | Install PyTorch. Create two 3x3 matrices, multiply them, and fix a shape-mismatch error. | PyTorch Fundamentals (Ch 0) | Link |
| 2 | 1. Foundation | The Derivative (Autograd) | Watch Karpathy Video 1. Implement a `Value` class in Python that stores data and its gradient. | Karpathy: Micrograd | Link |
| 3 | 1. Foundation | Backpropagation Engine | Finish `micrograd`. Train a tiny network to classify three data points (labels 0 or 1). Plot the decision boundary. | Karpathy Video 1 (Continued) | Link |
| 4 | 1. Foundation | Language Modeling (Bigram) | Watch Karpathy Video 2. Build a character-level bigram model. Generate random names. | Karpathy: Makemore | Link |
| 5 | 1. Foundation | MLP & Internals | Upgrade the bigram model to a multi-layer perceptron (MLP). Implement `train_step` manually. | Karpathy Video 2 (Continued) | Link |
| 6 | 1. Foundation | Batch Normalization | Implement BatchNorm from scratch. Understand `running_mean` vs. `batch_mean`. | Karpathy: Makemore Part 3 | Link |
| 7 | 1. Foundation | RNNs / GRUs (Manual) | Implement a vanilla RNN cell class in PyTorch. Understand hidden states. | CS336 Week 2 | Link |
| 8 | 1. Foundation | Attention Mechanism (Theory) | Read "Attention Is All You Need" (Sections 1-3). Draw the Q, K, V matrix flow on paper. | Paper: Attention Is All You Need | Link |
| 9 | 1. Foundation | Attention Implementation | Implement a `SelfAttention` class in PyTorch. Create the causal mask (see the sketch below this table). | NanoGPT Video (Karpathy) | Link |
| 10 | 1. Foundation | The Transformer Block | Combine Attention + MLP + LayerNorm into a `Block`. Stack 4 of them. | NanoGPT Video | Link |
| 11 | 1. Foundation | Training GPT | Train NanoGPT on the TinyShakespeare dataset. Generate Shakespeare-like text. | NanoGPT Video | Link |
| 12 | 1. Foundation | Buffer / Review | Review code. Add comments explaining every matrix shape transformation. | Review Week | |
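A minimal sketch of the Week 9 exercise, assuming a single attention head with no dropout or output projection; the class layout, sizes, and variable names are illustrative rather than copied from the NanoGPT video:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Single-head causal self-attention over (batch, time, channels) inputs."""

    def __init__(self, n_embd: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(n_embd, n_embd, bias=False)
        self.query = nn.Linear(n_embd, n_embd, bias=False)
        self.value = nn.Linear(n_embd, n_embd, bias=False)
        # Causal mask: position t may only attend to positions <= t.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)   # each (B, T, C)
        att = (q @ k.transpose(-2, -1)) / (C ** 0.5)           # (B, T, T) scores
        att = att.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                                          # (B, T, C)

# Smoke test with made-up sizes: batch 2, context 8, embedding 32.
sa = SelfAttention(n_embd=32, block_size=8)
print(sa(torch.randn(2, 8, 32)).shape)   # torch.Size([2, 8, 32])
```

The `tril` buffer is what makes the attention causal: scores for future positions are set to negative infinity before the softmax, so they receive zero weight.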
|
| Week | Phase | Topic | Task | Resource | Link |
|---|---|---|---|---|---|
| 13 | 2. DeepMind Stack | JAX Primitives | Install JAX. Redo Week 1 (matrix multiplication) in JAX. Use `jax.jit` and time the speedup (see the sketch below this table). | JAX 101 | Link |
| 14 | 2. DeepMind Stack | Functional Gradients | Use `jax.grad` to take the derivative of tanh. Compare with the Week 2 manual math. | JAX - The Sharp Bits | Link |
| 15 | 2. DeepMind Stack | Vectorization (vmap) | Write a function for one sample. Use `vmap` to run it on a batch. | JAX Auto-Vectorization | Link |
| 16 | 2. DeepMind Stack | Flax Ecosystem | Read the Flax docs. Build a simple MLP using `flax.linen`. | Flax Basics | Link |
| 17 | 2. DeepMind Stack | State Management | Learn how Flax handles model parameters (immutable nested dictionaries). Don't mutate them. | Flax Patterns | Link |
| 18 | 2. DeepMind Stack | The Big Port (Part 1) | Start rewriting the Week 11 GPT in JAX/Flax. Implement Attention in Flax. | Flax Examples | Link |
| 19 | 2. DeepMind Stack | The Big Port (Part 2) | Finish the JAX GPT. Create the training loop using `optax`. | Optax Documentation | Link |
| 20 | 2. DeepMind Stack | Internal Google Tools | Search Code Search (CS) for "Haiku"/"Flax" usage in DeepMind directories. | Internal Code Search | |
| 21 | 2. DeepMind Stack | TPU Theory | Read internal docs on TPUs and XLA (Accelerated Linear Algebra). | Internal Wiki / GSPMD Paper | |
| 22 | 2. DeepMind Stack | Parallelism (Data) | Use `jax.pmap` to replicate the model across 8 devices (simulate in Colab). | JAX Distributed | Link |
| 23 | 2. DeepMind Stack | Model Sharding | Read about `shard_map` or GSPMD. | GSPMD Paper | Link |
| 24 | 2. DeepMind Stack | JAX Profiling | Use the JAX profiler to visualize the training loop. Find a bottleneck. | JAX Profiling Docs | Link |
| 25 | 2. DeepMind Stack | Portfolio Polish | Push JAX-GPT to GitHub. Write a README explaining the `pmap` usage. | Milestone: Portfolio Piece 1 | |
| 26 | 2. DeepMind Stack | Buffer / Review | Rest week. | Review | |
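A minimal sketch tying together the Week 13-15 primitives (`jax.jit`, `jax.grad`, `jax.vmap`); the array sizes and the `predict` function are made up for illustration:

```python
import time
import jax
import jax.numpy as jnp

# Week 13: jit the Week 1 matrix multiply and time it (sizes are arbitrary).
a = jnp.ones((512, 512))
b = jnp.ones((512, 512))
matmul = jax.jit(lambda x, y: x @ y)
matmul(a, b).block_until_ready()            # first call compiles via XLA
t0 = time.perf_counter()
matmul(a, b).block_until_ready()            # steady-state timing
print("jitted matmul:", time.perf_counter() - t0, "seconds")

# Week 14: jax.grad of tanh; analytically d/dx tanh(x) = 1 - tanh(x)^2.
dtanh = jax.grad(jnp.tanh)
print(dtanh(0.5), 1.0 - jnp.tanh(0.5) ** 2)   # the two numbers should match

# Week 15: write the function for ONE sample, then vmap it over a batch.
def predict(w, sample):                     # sample: (features,)
    return jnp.dot(w, sample)

w = jnp.arange(4.0)
batch = jnp.ones((8, 4))                    # 8 samples of 4 features
print(jax.vmap(predict, in_axes=(None, 0))(w, batch).shape)   # (8,)
```

Note that the first jitted call includes XLA compilation time, so time a second call to see the real speedup.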
|
| Week | Phase | Topic | Task | Resource | Link |
|---|---|---|---|---|---|
| 27 | 3. Agents/Specialization | RL Intro (Policy Grad) | Implement REINFORCE to solve CartPole in JAX. | Spinning Up (OpenAI) | Link |
| 28 | 3. Agents/Specialization | PPO (The Standard) | Read the PPO paper. Study the `PureJaxRL` implementation. Run it. | PureJaxRL | Link |
| 29 | 3. Agents/Specialization | LLM Agents (ReAct) | Write a script where an LLM calls a Python function `calculator(a, b)` to solve math problems. | Berkeley LLM Agents | Link |
| 30 | 3. Agents/Specialization | Tool Use Loops | Implement a ReAct-style loop (Thought -> Action -> Observation). | Paper: ReAct | Link |
| 31 | 3. Agents/Specialization | Scaling Laws | Read the "Chinchilla" paper. Calculate the compute-optimal token count for a 1B-parameter model (roughly 20 tokens per parameter). | Paper: Training Compute-Optimal LLMs | Link |
| 32 | 3. Agents/Specialization | Tokenization Deep Dive | Build a BPE tokenizer from scratch in Python (see the sketch below this table). Understand byte fallback. | Andrej Karpathy: Tokenizer | Link |
| 33 | 3. Agents/Specialization | C++ for ML (Part 1) | Read internal docs on "XLA Custom Calls". | Internal Wiki | |
| 34 | 3. Agents/Specialization | C++ for ML (Part 2) | Write a C++ function (e.g., a fast GELU approximation). | JAX Custom Call Docs | Link |
| 35 | 3. Agents/Specialization | Connecting C++ to JAX | Bind the C++ function to Python using `pybind11` or the JAX FFI. Call it from JAX. | Milestone: Technical Differentiator | |
| 36 | 3. Agents/Specialization | Inference Optimization | Learn KV caching. Implement it in the JAX GPT to speed up generation. | CS336 Inference Lecture | Link |
| 37 | 3. Agents/Specialization | Quantization | Read about Int8 quantization. Apply it naively to the model weights. | QLoRA Paper | Link |
| 38 | 3. Agents/Specialization | Portfolio Project 2 | Build a "small agent": an LLM that plays a text game or uses tools, in JAX. | Milestone: Portfolio Piece 2 | |
| 39 | 3. Agents/Specialization | Buffer | Catch up on missed weeks. | Review | |
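A minimal sketch of the Week 32 byte-level BPE training loop, assuming no regex pre-splitting and no special tokens; the helper names are illustrative, not taken from the Karpathy video:

```python
from collections import Counter

def pair_counts(ids):
    """Count adjacent token pairs in a sequence of integer ids."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn merge rules on top of the 256 raw byte values."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break
        best = counts.most_common(1)[0][0]   # most frequent adjacent pair
        new_id = 256 + step
        ids = merge(ids, best, new_id)
        merges[best] = new_id
    return merges

print(train_bpe("aaabdaaabac", num_merges=3))
# e.g. {(97, 97): 256, (256, 97): 257, (257, 98): 258}
```

Starting from the 256 raw byte values is what gives byte fallback: any string can still be encoded even if it never appeared in the training text.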
|
| Week | Phase | Topic | Task | Resource | Link |
|---|---|---|---|---|---|
| 40 | 4. Internal Pivot | Internal Hunting | Search Moma for "20% Projects" (Research Engineering, DeepMind). | Internal Tools | |
| 41 | 4. Internal Pivot | Cold Emailing | Draft emails to potential hosts highlighting the JAX GPT and the C++ ops work. | Gmail | |
| 42 | 4. Internal Pivot | The "Debug" Interview | Deliberately break training code (e.g., a bad learning rate, a missing `zero_grad`) and practice fixing it. | Self-Practice | |
| 43 | 4. Internal Pivot | LeetCode (Python) | Do 5 medium array/graph problems. Focus on clean code. | LeetCode | Link |
| 44 | 4. Internal Pivot | LeetCode (Math) | Practice "Reservoir Sampling" or "Random Pick with Weight" (see the sketch below this table). | LeetCode | Link |
| 45 | 4. Internal Pivot | ML System Design | Design a "Training Pipeline for 1T Parameters". | CS329s System Design | Link |
| 46 | 4. Internal Pivot | 20% Project Work | Work on the 20% project. Over-deliver on the first task. | Work | |
| 47 | 4. Internal Pivot | Mock Interview | Practice technical Q&A (e.g., how LayerNorm behaves during backprop). | Mock | |
| 48 | 4. Internal Pivot | Resume/Packet | Update the internal resume with "Implemented Distributed JAX Transformer". | Resume | |
| 49 | 4. Internal Pivot | Apply | Apply to internal transfer roles. | Internal Job Board | |
| 50 | 4. Internal Pivot | Interview | Interview rounds. | N/A | |
| 51 | 4. Internal Pivot | Interview | Interview rounds. | N/A | |
| 52 | 4. Internal Pivot | Celebration | Transition to Research Engineer role. | Rest | |
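For the Week 44 practice, a minimal sketch of reservoir sampling (Algorithm R); the weighted "Random Pick with Weight" variant is usually handled with prefix sums plus binary search instead:

```python
import random

def reservoir_sample(stream, k):
    """Return k items sampled uniformly from a stream of unknown length (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)           # fill the reservoir first
        else:
            j = random.randint(0, i)         # inclusive on both ends
            if j < k:
                reservoir[j] = item          # keep item i with probability k/(i+1)
    return reservoir

print(reservoir_sample(range(1_000_000), k=5))
```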
|