
Tensor Parallelism

A technique that scales inference and training by splitting large model computations across devices within layers.

Tensor parallelism is a fundamental distributed-computing technique for running models that do not fit in the memory of a single device. The matrix operations inside each layer are split across multiple GPUs, so each device stores and computes only a shard of the layer's weights. This not only makes execution of very large models possible, but also shapes performance design in large-scale inference serving.
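The core idea can be illustrated with a minimal sketch, simulated on CPU with NumPy rather than real GPUs. Here a weight matrix is split column-wise across hypothetical devices (the names `num_devices`, `shards`, and the shapes are illustrative assumptions, not from any particular framework): each "device" computes a slice of the output features, and a gather step reassembles the full result.

```python
import numpy as np

rng = np.random.default_rng(0)

num_devices = 4
x = rng.standard_normal((8, 16))   # activations: (batch, d_in)
W = rng.standard_normal((16, 32))  # full layer weight: (d_in, d_out)

# Column-parallel split: each device holds one vertical shard of W.
shards = np.split(W, num_devices, axis=1)

# Each device multiplies the full input by its own shard independently...
partial_outputs = [x @ shard for shard in shards]

# ...then an all-gather (here just a concatenation) reassembles the output.
y_parallel = np.concatenate(partial_outputs, axis=1)

# The sharded computation matches the single-device result.
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

In a real system each shard lives on a different GPU and the concatenation is a collective communication step (an all-gather), which is why communication bandwidth between devices becomes a central performance concern.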