- The video
- More resources
- Other related projects
The Annotated Transformer
https://youtu.be/kwSVtQ7dziU?t=3691 - AK in 2026 on “agentic education”
Notes
- tokenization: text to integer IDs, e.g. “Hello” → [6, 32, 17, 17, 3]
- embedding: look up a learned vector for each token ID
- batches: many examples at once
- training loop
- estimate loss during training by averaging over many batches
- self-attention: mask future tokens so each position attends only to itself and earlier positions
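
A minimal sketch of three of the notes above (tokenization, batches, and hiding future tokens), in plain Python with no dependencies. The toy corpus and the names `encode`, `decode`, `get_batch`, and `causal_attention_weights` are assumptions made for illustration, not code from the video. The token IDs it produces depend on the corpus, so they will not match the [6, 32, 17, 17, 3] example.

```python
import math
import random

text = "Hello world"  # hypothetical toy corpus

# Character-level tokenizer: each distinct character gets an integer ID.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}


def encode(s):
    """Map a string to a list of token IDs."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of token IDs back to a string."""
    return "".join(itos[i] for i in ids)


data = encode(text)


def get_batch(data, batch_size, block_size, rng):
    """Sample batch_size (input, target) pairs.

    Each target sequence is its input shifted one position to the right,
    so the model learns to predict the next token at every position.
    """
    xs, ys = [], []
    for _ in range(batch_size):
        start = rng.randrange(len(data) - block_size)
        xs.append(data[start:start + block_size])
        ys.append(data[start + 1:start + block_size + 1])
    return xs, ys


def causal_attention_weights(scores):
    """Row-wise softmax over a T x T score matrix with future positions hidden.

    Row t only sees columns 0..t; later columns get weight 0.
    """
    T = len(scores)
    weights = []
    for t in range(T):
        visible = scores[t][:t + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (T - t - 1))
    return weights
```

With all scores equal, row `t` of the result spreads its weight evenly over positions `0..t` and gives zero weight to everything after, which is the "hide future tokens" behavior. A real model would compute the scores from query and key vectors and would typically apply the mask as a tensor operation rather than a Python loop.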