topic · foundation

How LLMs Work

Transformer architecture, attention, decoding — understand what the model is doing and why.

3-5 hours · 4 hours · 3 resources

LLMs are probabilistic models that predict the next token: given the text so far, the model outputs a probability distribution over its vocabulary. The self-attention mechanism in the Transformer architecture lets the model relate each token to the other tokens in its context. Decoding (greedy, sampling, beam search) decides how the next token is picked from that distribution.
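To make the decoding step concrete, here is a minimal sketch of greedy decoding and temperature sampling over a toy distribution. The vocabulary, the logits, and the helper names (`softmax`, `greedy_pick`, `sample_pick`) are illustrative assumptions, not output from a real model, and beam search is left out to keep the example short.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores (logits) into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(probs):
    """Greedy decoding: always take the most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def sample_pick(probs, rng):
    """Sampling: draw a token index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical vocabulary and logits for the context "The cat sat on the"
vocab = ["mat", "sofa", "roof", "moon"]
logits = [3.0, 2.0, 1.0, -1.0]

probs = softmax(logits)
greedy_token = vocab[greedy_pick(probs)]

# A temperature above 1 flattens the distribution, so unlikely
# tokens are drawn more often than at temperature 1.
rng = random.Random(0)
flat_probs = softmax(logits, temperature=1.5)
sampled_tokens = [vocab[sample_pick(flat_probs, rng)] for _ in range(5)]

print("greedy:", greedy_token)
print("sampled:", sampled_tokens)
```

Greedy decoding returns the same token every time for the same context, while sampling can return different tokens on each run. That difference is one reason the same prompt can produce different answers.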

Why it matters: To know why your prompts work in some cases and fail in others, you need to understand the model's internal logic.

Why this matters

Prompt engineering isn't guessing — prompts written without understanding the model are fragile.

What you'll gain

When you write a prompt, you'll have an intuition for where the model's attention will focus and which tokens become more likely.
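As a rough illustration of what "where attention will focus" means, the sketch below computes scaled dot-product attention weights for one query over three tokens. The token list and the 2-d vectors are hand-picked assumptions; in a real Transformer, queries and keys are learned projections of token embeddings with many more dimensions.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)


tokens = ["the", "bank", "river"]
# Hypothetical key vectors, one per token, chosen for illustration only
keys = [[0.1, 0.0], [1.0, 0.2], [0.9, 1.0]]
# Hypothetical query vector for the token "bank"
query = [1.0, 1.0]

weights = attention_weights(query, keys)
most_attended = tokens[weights.index(max(weights))]

for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
print("most attended:", most_attended)
```

With these made-up vectors, the query for "bank" puts most of its weight on "river", which is the kind of contextual link that lets a model tell a river bank from a financial one.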

Resources (3)