Mathematics, Statistics and Optimization
111 terms in the Mathematics, Statistics and Optimization domain, each bilingual (TR/EN) with a related-term graph.
All Terms (111)
AIC
An information criterion that supports model selection by balancing model fit against model complexity.
ANOVA
A method that tests whether the means of several groups differ significantly by comparing between-group variance with within-group variance.
Adam Optimization
A popular optimization algorithm that combines adaptive learning rates with momentum-like behavior.
Alternative Hypothesis
The hypothesis that argues there is a meaningful effect, difference, or relationship in the data, in contrast to the null hypothesis.
BFGS
A quasi-Newton optimization method that approximates second-order information to achieve efficient convergence.
BIC
A model selection criterion that, compared with AIC, typically applies a stronger complexity penalty, one that grows with sample size, while evaluating model fit.
Backpropagation
The core algorithm in neural networks that computes gradients by propagating error backward through layers.
Basis
A set of linearly independent vectors capable of generating every element of a vector space.
Bayes' Theorem
A fundamental probability theorem that allows updating the probability of a hypothesis as new observations arrive.
Bernoulli Distribution
A discrete distribution that models single-step random experiments with only two possible outcomes.
Beta Distribution
A continuous distribution especially well suited for modeling proportions and probabilities between 0 and 1.
Bias
A statistical concept describing the tendency of an estimator to systematically deviate from the true value.
Binomial Distribution
A discrete distribution that models the number of successes in a fixed number of independent Bernoulli trials.
Bootstrap
A method that repeatedly resamples the data with replacement to estimate uncertainty, confidence intervals, and performance distributions.
Brier Score
The mean squared difference between predicted probabilities and actual outcomes, reflecting both the accuracy and the calibration of probabilistic predictions.
Calibration
A property describing how well a model’s predicted probabilities align with actual observed frequencies.
Chain Rule
A rule that computes the derivative of a composite function through the derivatives of its inner and outer functions.
Channel Capacity
The theoretical maximum rate at which information can be transmitted over a communication channel with an arbitrarily small probability of error.
Chi-Square Distribution
The distribution of a sum of squared independent standard normal variables, widely used in statistical testing.
Condition Number
A linear algebra indicator that measures how sensitive the solution of a problem involving a matrix, such as a linear system, is to small perturbations in the input.
Conditional Probability
A concept that measures how likely an event is given that another event has already occurred.
Consistency
The property of an estimator converging to the true value as the sample size grows.
Constrained Optimization
An optimization approach in which the solution must optimize the objective function while also satisfying specified constraints.
Convex Optimization
A class of optimization problems in which the objective function and the feasible set are convex, so that any local minimum is also a global minimum.
Correlation
A standardized measure of the direction and strength of the linear relationship between two variables.
Covariance
A fundamental statistical measure that shows how two variables change together.
Cross-Entropy Loss
A core classification loss that measures the mismatch between the true distribution and the model’s predicted probability distribution.
Derivative
A fundamental mathematical concept that measures the rate of change and slope of a function at a given point.
Determinant
A core linear algebra quantity that summarizes a matrix’s volume scaling effect and whether it is invertible.
Directional Derivative
A derivative concept that measures how fast a function changes in a specific direction.
Dirichlet Distribution
A multidimensional probability distribution used to model the probabilities of multiple categories jointly.
Dot Product
A core linear algebra operation that combines two vectors into a scalar reflecting their magnitudes and the angle between them.
Earth Mover’s Distance
A distance concept that measures the minimum work, in mass moved times distance traveled, needed to transform one distribution into another.
Effect Size
A measure that captures not just whether an effect is significant, but how large it actually is.
Eigenvalue and Eigenvector
Nonzero vectors that a linear transformation only scales (possibly reversing them) without rotating them off their line, along with the associated scaling factors.
Entropy
A fundamental information-theoretic concept that measures uncertainty, disorder, or information content in a probability distribution.
Estimator Variance
A measure of how much an estimator's value varies across different samples drawn from the same population.
Expected Value
The probability-weighted average of a random variable's possible values, which the sample mean approaches over many repetitions.
Exponential Distribution
A continuous distribution used to model the waiting time until an event occurs.
Gamma Distribution
A flexible distribution used to model positive continuous quantities and waiting times for multiple events.
Gradient
A vector containing the partial derivatives of a multivariable function, pointing in the direction of steepest increase with magnitude equal to that rate of increase.
Gradient Clipping
A technique that limits gradient magnitude to prevent excessively large gradients from destabilizing training.
Gradient Descent
A fundamental optimization method that updates parameters in the opposite direction of the gradient to minimize a loss function.
Hessian Matrix
A matrix of second-order derivatives that helps describe the curvature of a function.
Hinge Loss
A loss function used especially in support vector machines that penalizes predictions falling on the wrong side of, or too close to, the decision boundary, encouraging a margin between classes.
Huber Loss
A hybrid loss function that is quadratic like MSE for small errors and linear like MAE for large ones, making it more robust to outliers than MSE.
Independence
A concept describing the case where the occurrence of one event or variable does not affect the probability of another.
Information Gain
An information-theoretic concept that measures how much uncertainty a feature reduces, especially in decision trees.
Inverse Matrix
A matrix that reverses a linear transformation and is defined only for square, full-rank matrices.
K-Fold Cross Validation
A method that repeatedly evaluates a model across different data folds to provide a more reliable estimate of performance.
KL Divergence
An asymmetric divergence measure that quantifies how one probability distribution differs from a reference distribution.
Kurtosis
A measure that summarizes tail heaviness and the tendency of a distribution to produce outliers.
Label Smoothing
A regularization technique that softens target labels to reduce overconfidence in the model.
Lagrange Multipliers
A method used in constrained optimization that introduces an auxiliary multiplier for each constraint, so that constrained optima appear as stationary points of a single combined function.
Law of Total Probability
A fundamental rule for computing the total probability of an event by combining contributions from distinct cases.
Likelihood
A statistical concept expressing how probable the observed data is under a given model parameter setting.
Line Search
An optimization step-size selection approach that determines how far to move along a chosen direction.
Log Loss
A loss function that measures the quality of probabilistic classification predictions and strongly penalizes confident wrong predictions.
Log-Normal Distribution
A right-skewed continuous distribution used for positive variables whose logarithm is normally distributed.
Mann-Whitney U Test
A test used to compare two independent groups without relying on strong parametric assumptions.
Markov Property
A property stating that a system’s future depends only on its current state, not on the full past history.
Matrix
A structure of numbers arranged in rows and columns, central to data representation and transformations.
Maximum A Posteriori Estimation (MAP)
A Bayesian estimation approach that selects the parameters maximizing the posterior probability, combining prior knowledge with the likelihood of the data.
Maximum Likelihood Estimation (MLE)
A fundamental statistical estimation method based on selecting the parameters that make the observed data most likely.
McNemar Test
A test used to compare the error behavior of two classifiers on the same set of examples.
Mean Absolute Error (MAE)
A regression loss function that averages the absolute differences between predictions and true values, offering greater robustness to outliers than MSE.
Mean Squared Error (MSE)
A common regression loss function that averages the squared differences between predictions and true values.
Mean, Median, and Mode
Fundamental statistical measures that summarize the central tendency of a dataset from different perspectives.
Mini-Batch Gradient Descent
A widely used optimization approach that splits training data into small batches to balance efficiency and stability.
Minimum Description Length (MDL)
An information-theoretic principle stating that the best model is the one that gives the shortest combined description of the model and of the data encoded with it.
Momentum
A method that speeds up optimization by incorporating the direction of past gradient updates.
Multiple Comparison Correction
A correction approach used to control false positives when multiple hypotheses are tested.
Mutual Information
A concept that measures how much knowing one variable reduces uncertainty about another.
Newton's Method
An advanced optimization method that uses both slope and curvature information to aim for faster convergence.
Normal Distribution
One of the most widely used continuous distributions in statistics, known for its bell-shaped curve.
Null Hypothesis
The default hypothesis in hypothesis testing, which states that there is no effect, difference, or relationship, so any observed effect is attributed to chance.
Partial Derivative
A derivative that measures the change of a multivariable function with respect to only one variable.
Permutation Test
A resampling-based test that evaluates whether an observed difference could be due to chance by repeatedly shuffling the data labels, relying on fewer distributional assumptions than parametric tests.
Perplexity
A measure, especially common in language models, that summarizes how surprised a probability model is by the data, computed as the exponential of the average negative log-likelihood.
Poisson Distribution
A discrete distribution used to model the number of events occurring within a fixed interval of time or space.
Population and Sample
The core statistical distinction between the full target group and the subset selected from it for analysis.
Posterior Probability
A Bayesian probability concept representing updated belief after observing new data.
Precision-Recall AUC
An evaluation metric that summarizes the trade-off between precision and recall across thresholds, and is especially informative in imbalanced settings.
Probability
A fundamental mathematical concept that quantifies the likelihood of an event occurring.
Proximal Gradient
An optimization method used for problems that combine smooth and non-smooth objective components.
p-Value
The probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.
ROC-AUC
A widely used comparison metric that summarizes a classifier’s ability to separate positives and negatives across thresholds.
Random Variable
A mathematical variable that maps the possible outcomes of a random experiment to numerical values.
Rank
A structural measure that expresses the number of linearly independent rows or columns in a matrix.
Saddle Point
A type of point that behaves like a minimum in some directions and a maximum in others, often complicating optimization.
Self-Information
An information-theoretic concept that measures how much information is gained when a single event occurs.
Singular Value Decomposition (SVD)
A powerful decomposition method that breaks a matrix into more fundamental components for structure analysis, compression, and dimensionality reduction.
Skewness
A measure showing whether a distribution is symmetric and in which direction its tail extends.
Statistical Power
The probability that a statistical test will detect an effect when that effect truly exists.
Stochastic Gradient Descent
An optimization approach that updates parameters using single examples or small subsets instead of the full dataset at each step.
Sufficiency
The property of a statistic containing all relevant information in the data about a parameter.
Tensor
A multidimensional numerical structure that generalizes scalars, vectors, and matrices.
Test Statistic
A computed measure that summarizes how unusual the data is in the context of a hypothesis test.
Train / Validation / Test Split
A core data splitting approach that separates model learning, tuning, and final evaluation so that the reported performance is not biased by data the model has already seen.
Triplet Loss
A representation learning loss that pulls similar examples closer together and pushes dissimilar ones apart.
Type I and Type II Error
The two fundamental error types in hypothesis testing: rejecting a true null hypothesis (a false alarm) and failing to reject a false one (a missed real effect).
t-Distribution
A continuous distribution used especially to model uncertainty around the mean in small samples.