Mathematics, Statistics and Optimization

109 terms in the Mathematics, Statistics and Optimization domain, each bilingual (TR/EN) with a related-term graph.

Linear Algebra, Probability Theory, Statistical Concepts, Distributions, Hypothesis Testing, Optimization Methods, Derivatives and Gradients, Loss Functions, Information Theory, Statistical Model Comparison

All Terms (109)

4 terms

⚖️

AIC

An information criterion that supports model selection by balancing model fit against model complexity.

📊

ANOVA

A method used to test whether there are meaningful differences among the means of multiple groups.

⚡

Adam Optimization

A popular optimization algorithm that combines adaptive learning rates with momentum-like behavior.

🅰️

Alternative Hypothesis

The hypothesis that argues there is a meaningful effect, difference, or relationship in the data, in contrast to the null hypothesis.

10 terms

⚙️

BFGS

A quasi-Newton optimization method that approximates second-order information to achieve efficient convergence.

📚

BIC

A model selection criterion that applies a stronger penalty for complexity while evaluating model fit.

🧱

Basis

A set of linearly independent vectors capable of generating every element of a vector space.

🧮

Bayes' Theorem

A fundamental probability theorem that allows updating the probability of a hypothesis as new observations arrive.

0️⃣

Bernoulli Distribution

A discrete distribution that models single-step random experiments with only two possible outcomes.

🟦

Beta Distribution

A continuous distribution especially well suited for modeling proportions and probabilities between 0 and 1.

🎯

Bias

A statistical concept describing the tendency of an estimator to systematically deviate from the true value.

🔢

Binomial Distribution

A discrete distribution that models the number of successes in a fixed number of independent Bernoulli trials.

♻️

Bootstrap

A method that repeatedly resamples the data to estimate uncertainty, confidence intervals, and performance distributions.

🎯

Brier Score

A score that jointly evaluates the accuracy and calibration quality of probabilistic predictions.
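
As a minimal sketch of the Brier Score entry above, assuming binary outcomes coded as 0 or 1 (the function name `brier_score` is illustrative, not taken from any library), the score is the mean squared difference between predicted probabilities and outcomes:

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means every prediction was both certain and correct.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equally long")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# A forecaster who always says 0.5 scores 0.25 regardless of the outcomes.
uninformative = brier_score([0.5, 0.5], [1, 0])
```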

11 terms

🧪

Calibration

A property describing how well a model’s predicted probabilities align with actual observed frequencies.

📶

Channel Capacity

The theoretical maximum rate at which information can be transmitted over a communication channel with an arbitrarily small probability of error.

🧮

Chi-Square Distribution

A distribution derived from the sum of squared independent standard normal variables and widely used in statistical testing.

⚠️

Condition Number

A linear algebra indicator that measures how sensitive the result of a matrix computation, such as solving a linear system, is to small changes in the input.

🔗

Conditional Probability

A concept that measures how likely an event is given that another event has already occurred.

📈

Consistency

The property of an estimator converging to the true value as the sample size grows.

⛓️

Constrained Optimization

An optimization approach in which the solution must satisfy not only the objective function but also specified constraints.

🛤️

Convex Optimization

A class of optimization problems in which the objective function and the feasible set are convex, so that any local minimum is also a global minimum.

🔗

Correlation

A standardized measure of the direction and strength of the linear relationship between two variables.

🔄

Covariance

A fundamental statistical measure that shows how two variables change together.

🎯

Cross-Entropy Loss

A core classification loss that measures the mismatch between the true distribution and the model’s predicted probability distribution.
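
The Cross-Entropy Loss entry above can be illustrated for a single example. This is a sketch under stated assumptions: both arguments are probability vectors of equal length, the natural logarithm is used, and the `eps` floor is a hypothetical safeguard against `log(0)` rather than part of the definition.

```python
import math


def cross_entropy(true_dist, pred_dist, eps=1e-12):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i) for one example.

    `eps` (an illustrative choice) floors predicted probabilities to avoid log(0).
    """
    return -sum(t * math.log(max(q, eps)) for t, q in zip(true_dist, pred_dist))


# With a one-hot target, only the probability assigned to the true class matters.
loss = cross_entropy([0, 1, 0], [0.2, 0.5, 0.3])  # equals -log(0.5)
```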

5 terms

📈

Derivative

A fundamental mathematical concept that measures the rate of change and slope of a function at a given point.

🔷

Determinant

A core linear algebra quantity that summarizes a matrix’s volume scaling effect and whether it is invertible.

↗️

Directional Derivative

A derivative concept that measures how fast a function changes in a specific direction.

🎨

Dirichlet Distribution

A multidimensional probability distribution used to model the probabilities of multiple categories jointly.

✖️

Dot Product

A core linear algebra operation that measures alignment and magnitude interaction between two vectors.
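
For the Dot Product entry above, a short sketch using plain Python lists as vectors (the helper name `dot` is illustrative):

```python
def dot(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


aligned = dot([1, 2, 3], [4, 5, 6])   # 1*4 + 2*5 + 3*6
perpendicular = dot([1, 0], [0, 1])   # orthogonal vectors give zero
```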

7 terms

🚚

Earth Mover’s Distance

A distance concept that measures how much mass must be moved to transform one distribution into another.

📐

Effect Size

A measure that captures not just whether an effect is significant, but how large it actually is.

🧭

Eigenvalue and Eigenvector

Vectors that preserve their direction under a linear transformation, along with the associated scaling factors.

🌫️

Entropy

A fundamental information-theoretic concept that measures uncertainty, disorder, or information content in a probability distribution.

🌪️

Estimator Variance

A measure of how much an estimator's value varies from one sample to another drawn from the same population.

📌

Expected Value

A measure indicating the long-run average value a random variable is expected to approach.

⏳

Exponential Distribution

A continuous distribution used to model the waiting time until an event occurs.

2 terms

🔬

Fisher Information

A statistical quantity that measures how much information the observed data carries about a model parameter, and therefore how precisely that parameter can be estimated.

🎯

Focal Loss

A classification loss that reduces the impact of easy examples and focuses more on hard or rare ones.

4 terms

🟨

Gamma Distribution

A flexible distribution used to model positive continuous quantities and waiting times for multiple events.

🧭

Gradient

A vector containing the partial derivatives of a multivariable function, indicating direction and magnitude of change.

✂️

Gradient Clipping

A technique that limits gradient magnitude to prevent excessively large gradients from destabilizing training.

⛰️

Gradient Descent

A fundamental optimization method that updates parameters in the opposite direction of the gradient to minimize a loss function.
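
The update rule in the Gradient Descent entry above can be sketched for a one-dimensional function. The objective f(x) = (x - 3)^2, the learning rate, and the step count below are illustrative assumptions, not values from the glossary.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - learning_rate * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step, so 100 steps bring `minimum` very close to 3.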

3 terms

🏔️

Hessian Matrix

A matrix of second-order derivatives that helps describe the curvature of a function.

🪓

Hinge Loss

A loss function used especially in support vector machines that aims to create a safe margin between classes.

⚖️

Huber Loss

A hybrid loss function that balances MSE and MAE behavior and is more robust to outliers.
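
The Huber Loss entry above can be made concrete with the usual piecewise form: quadratic for small residuals and linear for large ones. The threshold name `delta` and its default of 1.0 are assumptions chosen for illustration.

```python
def huber_loss(residual, delta=1.0):
    """Quadratic (MSE-like) within +/- delta, linear (MAE-like) beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


small = huber_loss(0.5)   # quadratic region: 0.5 * 0.5**2
large = huber_loss(3.0)   # linear region: 1.0 * (3.0 - 0.5)
```

Because the penalty grows only linearly past `delta`, a single large residual contributes far less than it would under squared error.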

3 terms

🪢

Independence

A concept describing the case where the occurrence of one event or variable does not affect the probability of another.

🌱

Information Gain

An information-theoretic concept that measures how much uncertainty a feature reduces, especially in decision trees.

↩️

Inverse Matrix

A matrix that reverses a linear transformation and is defined only for square, full-rank (non-singular) matrices.

2 terms

🧾

Jacobian

A matrix structure that carries derivative information for vector-valued functions.

🔀

Jensen-Shannon Divergence

An information-theoretic divergence measure, built from KL divergence, that compares two distributions symmetrically and always takes a finite, bounded value.
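
As a sketch of the Jensen-Shannon Divergence entry above, assuming discrete distributions given as equal-length probability lists and base-2 logarithms (so the result lies between 0 and 1), the divergence averages two KL terms against the midpoint distribution. The helper names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Average of KL(p || m) and KL(q || m), where m is the midpoint of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])  # non-overlapping distributions
```

Unlike KL divergence, swapping the two arguments does not change the result, and the midpoint `m` is positive wherever `p` or `q` is, so no division by zero occurs.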

3 terms

🔁

K-Fold Cross Validation

A method that repeatedly evaluates a model across different data folds to provide a more reliable estimate of performance.

📡

KL Divergence

A directional divergence measure that quantifies how one probability distribution differs from another.

📊

Kurtosis

A measure that summarizes tail heaviness and the tendency of a distribution to produce outliers.

7 terms

🏷️

Label Smoothing

A loss-related improvement technique that softens target labels to reduce overconfidence in the model.

🪝

Lagrange Multipliers

A method used in constrained optimization to balance the objective function with the imposed constraints.

🧩

Law of Total Probability

A fundamental rule for computing the total probability of an event by combining contributions from distinct cases.

🔍

Likelihood

A statistical concept expressing how probable the observed data is under a given model parameter setting.

📍

Line Search

An optimization step-size selection approach that determines how far to move along a chosen direction.

🪵

Log Loss

A loss function that measures the quality of probabilistic classification predictions and strongly penalizes confident predictions that turn out to be wrong.

📈

Log-Normal Distribution

A right-skewed continuous distribution used for positive variables whose logarithm is normally distributed.

14 terms

⚖️

Mann-Whitney U Test

A test used to compare two independent groups without relying on strong parametric assumptions.

🔄

Markov Property

A property stating that a system’s future depends only on its current state, not on the full past history.

🔲

Matrix

A structure of numbers arranged in rows and columns, central to data representation and transformations.

🧠

Maximum A Posteriori Estimation (MAP)

A Bayesian estimation approach that accounts for prior knowledge while selecting parameters that explain the data.

🔍

Maximum Likelihood Estimation (MLE)

A fundamental statistical estimation method based on selecting the parameters that make the observed data most likely.

🧪

McNemar Test

A test used to compare the error behavior of two classifiers on the same set of examples.

📏

Mean Absolute Error (MAE)

A regression loss function that averages the absolute differences between predictions and true values, making it more robust to outliers than MSE.

📉

Mean Squared Error (MSE)

A common regression loss function that averages the squared differences between predictions and true values.

📍

Mean, Median, and Mode

Fundamental statistical measures that summarize the central tendency of a dataset from different perspectives.

📦

Mini-Batch Gradient Descent

A widely used optimization approach that splits training data into small batches to balance efficiency and stability.

🗜️

Minimum Description Length (MDL)

An information-theoretic principle stating that a good model is one that describes the data in the shortest sufficient way.

🌪️

Momentum

A method that speeds up optimization by incorporating the direction of past gradient updates.

🧪

Multiple Comparison Correction

A correction approach used to control false positives when multiple hypotheses are tested.

🔁

Mutual Information

A concept that measures how much knowing one variable reduces uncertainty about another.

3 terms

🌀

Newton's Method

An advanced optimization method that uses both slope and curvature information to aim for faster convergence.

🔔

Normal Distribution

One of the most widely used continuous distributions in statistics, known for its bell-shaped curve.

0️⃣

Null Hypothesis

The foundational starting point in hypothesis testing, which assumes there is no real effect and that any observed difference is due to chance.

2 terms

📐

Orthogonality

A concept describing two vectors being perpendicular and carrying no linear interaction.

📏

Orthonormal Basis

A basis made of vectors that are mutually orthogonal and unit length, simplifying computation and interpretation.

10 terms

➗

Partial Derivative

A derivative that measures the change of a multivariable function with respect to only one variable.

🔀

Permutation Test

A resampling-based test that evaluates whether an observed difference could be due to chance with weaker reliance on distributional assumptions.

❓

Perplexity

A measure, especially common in language models, that summarizes how surprised a probability model is by the data.

📡

Poisson Distribution

A discrete distribution used to model the number of events occurring within a fixed interval of time or space.

👥

Population and Sample

The core statistical distinction between the full target group and the subset selected from it for analysis.

🧭

Posterior Probability

A Bayesian probability concept representing updated belief after observing new data.

🎯

Precision-Recall AUC

An evaluation metric that summarizes how well a model retrieves useful positives, especially in imbalanced settings.

🎲

Probability

A fundamental mathematical concept that quantifies the likelihood of an event occurring.

🪜

Proximal Gradient

An optimization method used for problems that combine smooth and non-smooth objective components.

📉

p-Value

The probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.
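
The p-Value entry above can be illustrated with an exact one-sided binomial test. This is only a sketch: the coin-toss scenario, the function name `binomial_p_value`, and the one-sided "at least as many successes" definition of "extreme" are assumptions made for the example.

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p_null), i.e. under the null."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Chance of 9 or more heads in 10 tosses of a fair coin: (10 + 1) / 1024.
p_value = binomial_p_value(9, 10)
```

A small value such as this one says the observed count would be unusual if the coin were fair; it is not the probability that the null hypothesis is true.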

3 terms

📈

ROC-AUC

A widely used comparison metric that summarizes a classifier’s ability to separate positives and negatives across thresholds.

🎯

Random Variable

A mathematical variable that maps the possible outcomes of a random experiment to numerical values.

📚

Rank

A structural measure that expresses the number of linearly independent rows or columns in a matrix.

7 terms

🐎

Saddle Point

A type of point that behaves like a minimum in some directions and a maximum in others, often complicating optimization.

💡

Self-Information

An information-theoretic concept that measures how much information is gained when a single event occurs.

🪓

Singular Value Decomposition (SVD)

A powerful decomposition method that breaks a matrix into more fundamental components for structure analysis, compression, and dimensionality reduction.

📉

Skewness

A measure showing whether a distribution is symmetric and in which direction its tail extends.

🔋

Statistical Power

The probability that a statistical test will detect an effect when that effect truly exists.

🏃

Stochastic Gradient Descent

An optimization approach that updates parameters using single examples or small subsets instead of the full dataset at each step.

🧾

Sufficiency

The property of a statistic containing all relevant information in the data about a parameter.

6 terms

🧊

Tensor

A multidimensional numerical structure that generalizes scalars, vectors, and matrices.

📊

Test Statistic

A computed measure that summarizes how unusual the data is in the context of a hypothesis test.

🪟

Train / Validation / Test Split

A core data splitting approach used to separate model learning, tuning, and final evaluation in an honest way.

🔺

Triplet Loss

A representation learning loss that pulls similar examples closer together and pushes dissimilar ones apart.

⚠️

Type I and Type II Error

The two fundamental error types in hypothesis testing: false alarm and failing to detect a real effect.

📘

t-Distribution

A continuous distribution used especially to model uncertainty around the mean in small samples.

1 term

📐

Uniform Distribution

A distribution in which all values within a given interval are equally likely.

2 terms

📏

Variance and Standard Deviation

Core measures of variability that quantify how spread out data points are around the mean.

➡️

Vector

A quantity with direction and magnitude, and one of the most fundamental representations in linear algebra.
