Generative AI and LLM
44 terms in the Generative AI and LLM domain, each presented bilingually (TR/EN) with a related-term graph.
All Terms (44)
Abstention
The ability of a model to avoid fabricating certainty and instead decline or express uncertainty when it is not confident.
Adapters
A parameter-efficient approach that inserts small modules into the base model to enable task adaptation.
Autoregressive Decoding
A generation mode in which the model produces output token by token using previous outputs as context.
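The token-by-token loop can be sketched with a toy stand-in for the model. Everything here is hypothetical: `TOY_NEXT` is a hand-written score table that looks only at the last token, whereas a real model conditions on the whole prefix, and `greedy_decode` always takes the highest-scoring token instead of sampling.

```python
# Hypothetical toy "model": next-token scores keyed by the previous token.
TOY_NEXT = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.8, "</s>": 0.2},
    "sat": {"</s>": 1.0},
}


def greedy_decode(start="<s>", max_tokens=10):
    """Generate tokens one at a time, feeding each output back in as context."""
    tokens = [start]
    for _ in range(max_tokens):
        scores = TOY_NEXT[tokens[-1]]             # scores given the context so far
        next_token = max(scores, key=scores.get)  # greedy: pick the top score
        if next_token == "</s>":                  # stop at the end-of-sequence token
            break
        tokens.append(next_token)                 # the output becomes new context
    return tokens[1:]                             # drop the start marker


print(greedy_decode())  # ['the', 'cat', 'sat']
```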
Catastrophic Forgetting
The problem in which a model loses some of its prior general abilities while being adapted to new tasks.
Citation Grounding
An approach that improves trust by explicitly showing the source passages supporting the generated answer.
Constitutional AI
An alignment approach that tries to guide model behavior through explicit principle sets and normative rules.
Continuous Batching
A serving approach that increases throughput by adding newly arrived requests to the running batch at each generation step and removing finished ones, instead of waiting for the whole batch to complete.
INT4 Quantization
An aggressive quantization approach that reduces the model to 4-bit precision for much lower memory cost.
INT8 Quantization
A common quantization form that reduces weights and sometimes activations to 8-bit precision for balanced efficiency and quality.
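As a rough illustration, the following sketch quantizes a list of weights to 8-bit integers and back. It assumes one common scheme (symmetric, per-tensor, with a single scale factor); the function names are hypothetical, and production libraries typically use per-channel scales and calibration data.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)         # integer codes, one byte each instead of four
print(restored)  # close to the original weights, within scale / 2
```

The rounding error per weight is bounded by half the scale, which is the quality cost traded for the smaller memory footprint.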
Instruction Model
A version of a general language model adapted to follow task instructions more effectively.
Mixture of Experts
An approach in which only relevant expert subnetworks are activated for each input to achieve scale and efficiency.
Model Checkpoint
A saved snapshot of the model state, captured at a given stage of training, that can be reloaded later.
Multimodal Transformer
A model design that processes different data types such as text, images, audio, or video within a shared attention architecture.
Paged Attention
An attention-management technique that stores the KV cache in fixed-size blocks (pages) that need not be contiguous in memory, which reduces fragmentation and improves resource use under multi-request serving.
Parameter Efficient Fine-Tuning
A fine-tuning approach that adapts a model using a limited number of parameters instead of updating the full model.
Post-Training Quantization
A quantization approach that converts an already-trained model to lower-bit precision, without retraining it, to gain memory and speed benefits.
Prefix Tuning
A PEFT technique that steers the model’s internal attention behavior through small learnable prefix representations.
Pretraining
The initial training stage in which a model learns broad patterns from large-scale general data.
Prompt Template
A parameterized prompt pattern that provides reusable and consistent structure across repeated tasks.
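A minimal sketch of the idea, using `string.Template` from the Python standard library. The template text, the placeholder names, and `render_summary_prompt` are made up for illustration.

```python
from string import Template

# Hypothetical reusable pattern: only the $placeholders change between calls.
SUMMARY_TEMPLATE = Template(
    "Summarize the following $doc_type in $max_sentences sentences:\n\n$text"
)


def render_summary_prompt(doc_type, max_sentences, text):
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return SUMMARY_TEMPLATE.substitute(
        doc_type=doc_type, max_sentences=max_sentences, text=text
    )


print(render_summary_prompt("email", 2, "Hi team, the release moves to Friday."))
```

Keeping the fixed wording in one place means every call produces the same structure, which is the consistency the definition refers to.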
Sampling
The process of making probabilistic choices from a learned distribution while generating new output.
Scaling Laws
A set of empirical regularities describing how performance changes as model size, data, and compute increase.
Speculative Decoding
A decoding approach that speeds up generation by validating proposals from a smaller fast model with a larger model.
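One round of this draft-then-verify loop might look like the sketch below. It assumes the simplified greedy variant, in which a drafted token is accepted only if it equals the larger model's own top choice; sampling-based variants accept or reject by comparing probabilities instead. `draft_next` and `target_next` are hypothetical callables that return the next token for a given context.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """Run one draft-and-verify round and return the newly accepted tokens."""
    # 1. The small draft model proposes k tokens, one after another.
    proposed, context = [], list(prefix)
    for _ in range(k):
        token = draft_next(context)
        proposed.append(token)
        context.append(token)

    # 2. The large target model checks each proposal. A real system scores
    #    all k positions in a single forward pass; this loop is for clarity.
    accepted, context = [], list(prefix)
    for token in proposed:
        expected = target_next(context)
        if token != expected:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(token)
        context.append(token)

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target_next(context))
    return accepted


def toy_draft(context):   # hypothetical fast model
    return len(context) % 2


def toy_target(context):  # hypothetical accurate model
    return len(context) % 3


print(speculative_step([], toy_draft, toy_target))  # [0, 1, 2]
```

In this variant the accepted tokens are exactly what the target model would have produced on its own, so the speedup comes from checking several drafted tokens per round instead of generating them one by one.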
Stochastic Generation
A generation mode that introduces probabilistic diversity instead of producing the exact same output every time.
System Prompt
A high-level instruction layer that defines the model’s overall behavior, role, and priorities.
Temperature Sampling
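A sampling method that uses a temperature parameter to adjust the sharpness of the output distribution: lower temperatures give more controlled generation, higher temperatures more varied and creative generation.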
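The effect can be seen in a short sketch, assuming the usual formulation in which the logits are divided by the temperature before a softmax. The function names and the example logits are illustrative only.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature gives a sharper result."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 0.5))  # mass concentrates on the top logit
print(softmax_with_temperature(logits, 2.0))  # mass spreads more evenly
print(sample_index(logits, temperature=1.0))
```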
Tensor Parallelism
A technique that scales inference and training by splitting the computation inside each layer of a large model across multiple devices.
Text-to-Image Generation
A generative modeling approach that synthesizes new images from natural language prompts.
Tokenizer
A core intermediary layer that converts text into tokens the model can process.
Tool-Augmented Generation
An approach in which the model uses tools such as computation, search, or external system calls to produce more accurate results.
Transferability
The ability of a model to transfer what it learned during pretraining into different tasks and domains.