Topic · Foundation
How LLMs Work
Transformer architecture, attention, and decoding: understand what the model is doing and why.
3-5 hours · 4 hours · 3 resources
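LLMs are probabilistic models that predict the next token: given the tokens so far, the model outputs a probability distribution over its vocabulary. The self-attention mechanism in the Transformer architecture lets each position weigh the other tokens in its context, so the model can relate words to one another. Decoding strategies (greedy, sampling, beam search) decide how the next token is picked from that distribution.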
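The attention step described above can be sketched as single-head scaled dot-product self-attention, softmax(QKᵀ/√d_k)·V, in plain Python. This is a minimal sketch under stated assumptions: toy 2-dimensional embeddings, hand-picked projection matrices (identity in the usage example) instead of learned weights, a single head, and a causal mask like the one used in decoder-only LLMs, so each token attends only to itself and earlier tokens.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # (n x d) @ (d x k) -> (n x k), using plain nested lists.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def self_attention(x, w_q, w_k, w_v, causal=True):
    """Single-head scaled dot-product self-attention over a sequence x (n x d_model).

    Returns (outputs, weights); weights[i] lists how much token i attends to
    each position it can see. w_q, w_k, w_v are assumed toy projection matrices.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    outputs, weights = [], []
    for i in range(len(x)):
        # With a causal mask, token i only sees positions 0..i.
        visible = range(i + 1) if causal else range(len(x))
        # Dot product of query i with each visible key, scaled by sqrt(d_k).
        scores = [
            sum(q[i][c] * k[j][c] for c in range(d_k)) / math.sqrt(d_k)
            for j in visible
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Output i is the attention-weighted average of the visible value vectors.
        outputs.append(
            [sum(a * v[j][c] for a, j in zip(attn, visible)) for c in range(len(v[0]))]
        )
    return outputs, weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(x, eye, eye, eye)
```

With identity projections and the inputs [1, 0], [0, 1], [1, 1], the first token can only see itself, so its single weight is exactly 1. The third token's query matches its own key most strongly, so it puts the largest weight on itself and equal weight on the two earlier tokens.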
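The three decoding strategies can be compared on a toy next-token table. `TOY_MODEL` and `next_token_probs` are hypothetical stand-ins for a real LLM, which would condition on the whole context and score its entire vocabulary. The beam search shown is the plain version without length normalization, and temperature is applied by raising each probability to 1/T and renormalizing, which is equivalent to dividing the logits by T before the softmax.

```python
import math
import random

# Hypothetical toy "model": next-token probabilities given only the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.5, "a": 0.4, "cat": 0.1},
    "the": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "a": {"dog": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def next_token_probs(seq):
    # A real LLM would run the whole sequence through the Transformer here.
    return TOY_MODEL[seq[-1]]


def is_done(seq, max_len):
    return seq[-1] == "</s>" or len(seq) >= max_len


def greedy(context, max_len=10):
    # Always append the single most likely next token.
    seq = list(context)
    while not is_done(seq, max_len):
        probs = next_token_probs(seq)
        seq.append(max(probs, key=probs.get))
    return seq


def sample(context, temperature=1.0, max_len=10, seed=None):
    # Draw the next token at random; T < 1 sharpens the distribution, T > 1 flattens it.
    rng = random.Random(seed)
    seq = list(context)
    while not is_done(seq, max_len):
        probs = next_token_probs(seq)
        tokens = list(probs)
        weights = [probs[t] ** (1.0 / temperature) for t in tokens]
        seq.append(rng.choices(tokens, weights=weights)[0])  # weights need not sum to 1
    return seq


def beam_search(context, beam_width=2, max_len=10):
    # Keep the beam_width best partial sequences, scored by summed log-probability.
    beams = [(0.0, list(context))]
    while not all(is_done(seq, max_len) for _, seq in beams):
        candidates = []
        for score, seq in beams:
            if is_done(seq, max_len):
                candidates.append((score, seq))  # finished beams carry over unchanged
                continue
            for tok, p in next_token_probs(seq).items():
                candidates.append((score + math.log(p), seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]
```

On this table, `greedy(["<s>"])` commits to "the" and ends with "the cat" (probability 0.5 × 0.4 = 0.2), while `beam_search(["<s>"], beam_width=2)` keeps "a" alive and finds "a dog" (0.4 × 0.9 = 0.36). This is why beam search can beat greedy decoding; with `beam_width=1` it reduces to greedy.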
Why it matters: to see why a prompt works in some cases and fails in others, you need to understand how the model processes it internally.
Why this matters
Prompt engineering isn't guesswork: prompts written without an understanding of the model tend to be fragile.
What you'll gain
When you write a prompt, you'll be able to anticipate where the model's attention is likely to focus and which tokens become more probable.