- The video
- More resources
- Other related projects
  - The Annotated Transformer
Notes
- tokenization: map characters to integer ids, e.g. “Hello” → [6, 32, 17, 17, 3]
- embedding: look up a learned vector for each token id
- batches: stack many examples so they are processed at once
- training loop: forward pass, compute loss, backward pass, optimizer step
- estimate loss while training by averaging over many batches (less noisy than a single batch)
- self-attention: mask out future tokens so each position only attends to itself and the past
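The steps above can be sketched as a minimal, dependency-free Python toy. The names (`get_batch`, `causal_mask`) are illustrative, not from any referenced code, and the exact ids in the notes’ example (“Hello” → [6, 32, 17, 17, 3]) depend on the vocabulary of the actual corpus:

```python
import random

# Toy corpus and character-level tokenizer: each distinct character
# gets an integer id (a stand-in for the real, larger vocabulary).
text = "Hello world. Hello again."
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

data = encode(text)

def get_batch(block_size=8, batch_size=4):
    """Sample random (input, target) windows from the corpus.

    The target sequence is the input shifted one token to the right,
    so the model learns to predict the next token at every position.
    """
    xs, ys = [], []
    for _ in range(batch_size):
        i = random.randrange(len(data) - block_size)
        xs.append(data[i:i + block_size])
        ys.append(data[i + 1:i + 1 + block_size])
    return xs, ys

def causal_mask(T):
    """mask[t][s] is True iff position t may attend to position s (s <= t),
    i.e. future tokens are hidden from each position."""
    return [[s <= t for s in range(T)] for t in range(T)]
```

A real implementation would hold `data` as a tensor and apply the lower-triangular mask inside the attention scores before the softmax; the structure (encode/decode, shifted targets, triangular mask) is the same.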