Category

Deep Learning

119 terms in the Deep Learning domain, each presented bilingually (TR/EN) with a related-term graph.

Neural Networks | Activation Functions | Backpropagation | Regularization Techniques | Convolutional Neural Networks | RNN / LSTM / GRU | Attention Mechanisms | Transformer Architectures | Autoencoder Architectures | Graph Neural Networks

All Terms (119)

G
12 terms
🌊

GELU Activation

A modern activation function that weights each input by the standard normal CDF of its value, gating it smoothly rather than applying a hard threshold as ReLU does.
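As a rough illustration of this definition, the sketch below computes the exact form GELU(x) = x · Φ(x) with the standard library and prints it next to ReLU. The function names `gelu` and `relu` are chosen here for illustration and are not taken from any particular framework.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    """Hard-threshold baseline for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    # Small negative inputs are damped by GELU instead of being cut to zero.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

Many frameworks also offer a faster tanh-based approximation; the erf form above is the exact definition.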

⚙️

GRU

A recurrent unit that learns sequence dependencies through a simpler gating structure than LSTM.

🚪

Gated Linear Unit

A gated layer that multiplies one linear projection of the input elementwise by the sigmoid of a second projection, so the gate controls how much of each signal passes through.
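A minimal sketch of the gating step, assuming the two linear projections have already been computed and concatenated into one vector (first half is the signal, second half is the gate input). This split-in-half convention and the name `glu` are assumptions made for the example; the learned projections themselves are omitted.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def glu(values: list[float]) -> list[float]:
    """Gate the first half of `values` with the sigmoid of the second half."""
    if len(values) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(values) // 2
    signal, gate_input = values[:half], values[half:]
    return [a * sigmoid(b) for a, b in zip(signal, gate_input)]


if __name__ == "__main__":
    # A gate input of 0 gives sigmoid(0) = 0.5, so each signal is halved.
    print(glu([2.0, 4.0, 0.0, 0.0]))
```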

Gradient Checking

A debugging technique that validates analytical gradients by comparing them with numerical approximations.
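The comparison can be sketched with central finite differences. The toy function `f`, its hand-derived gradient `analytic_grad`, and the step size `eps` are all illustrative assumptions; in practice the analytical gradient would come from backpropagation.

```python
import math


def f(w: list[float]) -> float:
    """Toy objective: f(w) = w0^2 + 3*w0*w1."""
    return w[0] ** 2 + 3.0 * w[0] * w[1]


def analytic_grad(w: list[float]) -> list[float]:
    """Hand-derived gradient of f."""
    return [2.0 * w[0] + 3.0 * w[1], 3.0 * w[0]]


def numerical_grad(func, w: list[float], eps: float = 1e-6) -> list[float]:
    """Central-difference estimate of the gradient, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        plus, minus = list(w), list(w)
        plus[i] += eps
        minus[i] -= eps
        grad.append((func(plus) - func(minus)) / (2.0 * eps))
    return grad


def relative_error(a: list[float], b: list[float]) -> float:
    """Norm of the difference divided by the sum of the norms."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scale = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / scale if scale > 0 else 0.0


if __name__ == "__main__":
    w = [1.5, -2.0]
    err = relative_error(analytic_grad(w), numerical_grad(f, w))
    print(f"relative error: {err:.2e}")
```

A very small relative error suggests the analytical gradient is correct; a large one usually points to a bug in the derivative code.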

🌊

Gradient Flow

A core training-dynamics concept describing how effectively the learning signal moves across network layers.

📉

Gradient Noise Scale

A training-dynamics measure that characterizes how noisy gradient estimates are in stochastic optimization.

🎯

Graph Attention Network

A GNN architecture that combines neighboring nodes with learned attention weights rather than treating them equally.

🕸️

Graph Classification

A graph learning task focused on assigning a single label to the entire graph.

🕸️

Graph Convolutional Network

A foundational GNN architecture that learns representations over graphs by using neighborhood information.

🧬

Graph Isomorphism Network

A GNN architecture designed to match the Weisfeiler-Lehman graph isomorphism test in its power to distinguish graph structures.

📦

Graph Pooling

A GNN operation that aims to compress node information into more compact and task-relevant representations.

🌐

GraphSAGE

A GNN method that makes representation learning scalable on large graphs through neighborhood sampling.

S
16 terms
📊

SELU Activation

An activation function designed to support self-normalizing network behavior.

🎯

Scaled Dot-Product Attention

The fundamental Transformer operation that computes attention weights by taking dot products between query and key vectors, dividing them by the square root of the key dimension, and applying softmax.
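A minimal pure-Python sketch of the operation softmax(QKᵀ / √d_k) V, using nested lists in place of tensors. The function names and the tiny example inputs are illustrative assumptions; masking, batching, and multiple heads are left out.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one output row per query: a weighted average of the value rows."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Mix the value rows using the attention weights.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 1.0], [1.0, 1.0]]
    v = [[0.0, 2.0], [4.0, 6.0]]
    # Identical keys give equal weights, so the output is the mean of the values.
    print(scaled_dot_product_attention(q, k, v))
```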

🗓️

Scheduled Sampling

A method that gradually reduces teacher forcing to bring training conditions closer to inference conditions.

🪞

Self-Attention

An attention mechanism in which each element in a sequence directly models its relationship with all others.

🔀

Sequence-to-Sequence Learning

A general modeling approach focused on converting one input sequence into another output sequence.

🪶

Sharpness-Aware Minimization

An optimization approach that seeks not only low loss but also flatter and more generalizable solution regions.

📉

Sigmoid Activation

A classical activation function that squashes input values into the range between 0 and 1.

⤴️

Skip Connection

An architectural connection that allows information to bypass certain layers and improves training stability.

🎛️

Softmax Activation

An output activation that expresses multiclass outputs as a normalized probability distribution.

🧾

Sparse Attention

An attention approach that reduces cost by allowing each element to attend only to selected regions rather than the full sequence.

🧬

Sparse Autoencoder

A type of autoencoder that encourages only a small number of latent neurons to activate, leading to more selective features.

📡

Squeeze-and-Excitation

A CNN module that reweights feature channels using global context.

🎲

Stochastic Depth

A method that provides stronger regularization in very deep networks by randomly skipping some layers during training.

🧮

Stochastic Weight Averaging

A method that averages parameter states from different stages of training in order to obtain more robust generalization.
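The averaging step can be sketched as a running mean over parameter snapshots. The snapshot values and the name `update_average` are hypothetical; a real training loop would also choose when to collect snapshots and would typically recompute batch-normalization statistics afterwards.

```python
def update_average(avg: list[float], new: list[float], n_seen: int) -> list[float]:
    """Fold one more parameter snapshot into a running mean of n_seen snapshots."""
    return [(a * n_seen + w) / (n_seen + 1) for a, w in zip(avg, new)]


def average_snapshots(snapshots: list[list[float]]) -> list[float]:
    """Average a sequence of parameter vectors collected during training."""
    avg = list(snapshots[0])
    for n_seen, params in enumerate(snapshots[1:], start=1):
        avg = update_average(avg, params, n_seen)
    return avg


if __name__ == "__main__":
    # Hypothetical weights saved at three points late in training.
    snapshots = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(average_snapshots(snapshots))
```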

👣

Stride

A CNN hyperparameter that sets how many positions a filter shifts between successive applications across the input; larger strides produce lower-resolution outputs.
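The effect on output resolution can be sketched in one dimension using the usual size formula floor((n + 2p - k) / s) + 1. The helper names and the example signal are assumptions for illustration; the convolution below is unpadded and, as in most deep learning libraries, does not flip the kernel.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Number of output positions along one spatial dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv1d(signal: list[float], kernel: list[float], stride: int = 1) -> list[float]:
    """Unpadded 1D cross-correlation, moving the kernel `stride` positions each step."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    kernel = [1, 0, -1]
    for s in (1, 2):
        out = conv1d(signal, kernel, stride=s)
        print(f"stride={s}: {len(out)} outputs, "
              f"formula gives {conv_output_size(len(signal), len(kernel), s)}")
```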

🌊

Swish Activation

A modern activation function that multiplies the input by its own sigmoid to create a smooth nonlinear transformation.
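A short sketch of the formula swish(x) = x · sigmoid(βx), with β defaulting to 1 (the variant often called SiLU). The `beta` parameter name and the sample inputs are assumptions made for this example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: the input scaled by the sigmoid of (beta * input)."""
    return x * sigmoid(beta * x)


if __name__ == "__main__":
    # Unlike ReLU, swish dips slightly below zero for small negative inputs.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  swish={swish(x):+.4f}")
```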