- The video
- More resources
- Other related projects
  - The Annotated Transformer
Notes
- tokenization: map characters to integer ids, e.g. “Hello” → [6, 32, 17, 17, 3]
- embedding: look up a learned vector for each token id
- batches: stack many examples so they are processed at once
- training loop: forward pass, compute loss, backward pass, optimizer step
- estimate loss while training by averaging over many batches (less noisy than a single batch)
- self-attention: mask out future tokens so each position only attends to itself and the past
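The steps above can be sketched as a minimal, dependency-free Python toy. The names (`get_batch`, `causal_mask`) are illustrative, not from any referenced code, and the exact ids in the notes’ example (“Hello” → [6, 32, 17, 17, 3]) depend on the vocabulary of the actual corpus:

```python
import random

# Toy corpus and character-level tokenizer: each distinct character
# gets an integer id (a stand-in for the real, larger vocabulary).
text = "Hello world. Hello again."
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

data = encode(text)

def get_batch(block_size=8, batch_size=4):
    """Sample random (input, target) windows from the corpus.

    The target sequence is the input shifted one token to the right,
    so the model learns to predict the next token at every position.
    """
    xs, ys = [], []
    for _ in range(batch_size):
        i = random.randrange(len(data) - block_size)
        xs.append(data[i:i + block_size])
        ys.append(data[i + 1:i + 1 + block_size])
    return xs, ys

def causal_mask(T):
    """mask[t][s] is True iff position t may attend to position s (s <= t),
    i.e. future tokens are hidden from each position."""
    return [[s <= t for s in range(T)] for t in range(T)]
```

A real implementation would hold `data` as a tensor and apply the lower-triangular mask inside the attention scores before the softmax; the structure (encode/decode, shifted targets, triangular mask) is the same.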