Computer Vision

99 terms in the Computer Vision domain, each available in Turkish and English with links to related terms.

Categories: Image Preprocessing, Image Classification, Object Detection, Segmentation, Feature Extraction, Face Analysis, OCR and Document AI, Video Analytics, Object Tracking, Vision-Language Models

All Terms (99)

A (3 terms)

🔮

Action Anticipation

A task that attempts to predict a future action from a partially observed video stream before it fully unfolds.

🏃

Action Recognition

A task focused on recognizing action classes from human or object motion in video.

📦

Anchor Boxes

A detection design in which the model predicts class scores and box offsets relative to predefined reference boxes of different scales and aspect ratios placed across the image.
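A minimal sketch of anchor generation at a single feature-map location, assuming the convention that an aspect ratio means height / width and that every anchor keeps an area of scale². The function name and the scale and ratio values are illustrative; real detectors repeat this at every location of one or more feature maps and differ in the exact conventions.

```python
import math


def generate_anchors(center_x, center_y, scales, aspect_ratios):
    """Return (x1, y1, x2, y2) anchors centered on one feature-map location.

    Assumed convention: aspect_ratio = height / width, and every anchor
    keeps an area of scale**2 while its shape follows the ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((center_x - w / 2, center_y - h / 2,
                            center_x + w / 2, center_y + h / 2))
    return anchors


# Two scales x three aspect ratios -> six anchors at location (16, 16).
anchors = generate_anchors(16, 16, scales=[32, 64], aspect_ratios=[0.5, 1.0, 2.0])
```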

B (2 terms)

🪡

Boundary-Aware Segmentation

A segmentation approach that aims to improve mask quality by modeling object boundaries more precisely.

⬛

Bounding Box Regression

A detection subtask that predicts an object’s location and size in an image as coordinates.
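One widely used way to set up this regression, popularized by the R-CNN family of detectors, predicts offsets relative to an anchor or proposal rather than raw coordinates. The sketch below assumes center-size (cx, cy, w, h) boxes; the function names and numbers are illustrative.

```python
import math


def encode(box, anchor):
    """Regression targets (tx, ty, tw, th) for a box relative to an anchor.

    Both inputs are center-size boxes (cx, cy, w, h). Center offsets are
    normalized by anchor size; width and height use log ratios.
    """
    cx, cy, w, h = box
    ax, ay, aw, ah = anchor
    return ((cx - ax) / aw, (cy - ay) / ah, math.log(w / aw), math.log(h / ah))


def decode(deltas, anchor):
    """Invert encode(): apply predicted deltas to an anchor to get a box."""
    tx, ty, tw, th = deltas
    ax, ay, aw, ah = anchor
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))


anchor = (50.0, 50.0, 40.0, 20.0)
ground_truth = (54.0, 48.0, 48.0, 30.0)
targets = encode(ground_truth, anchor)  # what the regression head is trained to output
recovered = decode(targets, anchor)     # at inference, predicted deltas are decoded this way
```

Normalizing center shifts by anchor size and expressing size changes as log ratios keeps the targets on a similar scale for small and large objects.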

C (5 terms)

🗺️

CNN Feature Maps

Intermediate representations learned by convolutional layers that carry visual patterns at different abstraction levels.

📉

Chart Understanding

A task that converts bar charts, line graphs, and similar visual presentations into structured data and semantic interpretation.

💡

Color Constancy

An image-processing approach that estimates and compensates for the scene illumination so that object colors stay consistent under varying lighting conditions.

🎨

Color Space Conversion

A process that converts an image from RGB into another color space, such as grayscale, HSV, or YCbCr, to make certain visual signals easier to isolate.
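A small sketch of two common conversions using only Python's standard library: grayscale via the ITU-R BT.601 luma weights, and HSV via `colorsys.rgb_to_hsv`, which expects channel values in [0, 1]. Per-pixel Python is for illustration only; image libraries vectorize these conversions.

```python
import colorsys


def rgb_to_gray(r, g, b):
    """Grayscale intensity using the ITU-R BT.601 luma weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def rgb_to_hsv_pixel(r, g, b):
    """8-bit RGB -> (hue in degrees, saturation in [0, 1], value in [0, 1])."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v


# Hue separates "which color" from brightness, so a hue range can select,
# for example, green pixels more easily than thresholds on raw RGB values.
hue, sat, val = rgb_to_hsv_pixel(0, 255, 0)
```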

🧲

Contrastive Visual Pretraining

An approach that learns strong visual features by bringing similar images close and pushing dissimilar ones apart in representation space.
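The pull-together, push-apart idea is often expressed as an InfoNCE-style loss. The sketch below is a simplified, one-directional version on toy 2-D embeddings, assuming row i of each batch holds two augmented views of the same image; full methods typically use larger batches, symmetric terms, and more negatives, and the temperature of 0.1 is only an illustrative choice.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    Row i of view_a and row i of view_b are two views of the same image
    (the positive pair); every other row of view_b acts as a negative.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, query in enumerate(a):
        # Cosine similarity to every candidate, sharpened by the temperature.
        logits = [sum(x * y for x, y in zip(query, key)) / temperature for key in b]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # = -log softmax probability of the positive
    return total / len(a)


view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(view_a, view_b)
mismatched = info_nce(view_a, view_b[::-1])  # positives swapped, so the loss should rise
```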

D (5 terms)

🧩

Data Association

The problem of matching object observations across frames to the same physical target.

🧠

DeepSORT

A tracking system that strengthens the SORT approach with appearance embeddings to improve identity stability.

🌀

Deformable DETR

A transformer-based detector that restricts each attention query to a small set of sampled points around a reference point, improving convergence speed and small-object performance over the original DETR.

📏

Dice Coefficient

A segmentation-focused evaluation metric that measures overlap between predicted and ground-truth masks.
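For binary masks the coefficient is 2|P ∩ T| / (|P| + |T|), where P and T are the predicted and ground-truth foreground pixels. A minimal sketch on nested-list masks; the small `eps` is an assumed smoothing term that avoids division by zero and scores two empty masks as a perfect match.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    intersection = sum(a * b for a, b in zip(p, t))  # pixels that are 1 in both masks
    return (2 * intersection + eps) / (sum(p) + sum(t) + eps)


pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3), about 0.667
```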

🧠

Document Parsing

A process that decomposes a document into text, structure, fields, and hierarchy to produce a machine-processable representation.

E (1 term)

🪒

Edge Detection

A classical feature extraction task that summarizes image structure by finding boundaries with strong intensity changes.
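The Sobel operator is one classical way to find such boundaries: it estimates horizontal and vertical intensity gradients with 3x3 kernels, and pixels with a large gradient magnitude are marked as edges. A pure-Python sketch; the threshold value is an illustrative assumption, and detectors such as Canny add smoothing, non-maximum suppression, and hysteresis thresholding on top.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude at each interior pixel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_map(img, threshold):
    """Binary edge map: 1 where the gradient magnitude exceeds the threshold."""
    return [[1 if m > threshold else 0 for m in row] for row in sobel_magnitude(img)]


# A vertical step from dark (0) to bright (100) between columns 1 and 2.
step = [[0, 0, 100, 100, 100] for _ in range(5)]
edges = edge_map(step, threshold=200)
```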

F (10 terms)

📐

Face Alignment

A process that places the face into a common geometric reference frame to make downstream analysis more consistent.

🛡️

Face Anti-Spoofing

A security task that aims to distinguish presentation attacks, such as printed photos, videos replayed on screens, or masks, from genuine live faces.

🙂

Face Detection

A core face analysis task focused on locating face regions in an image or video.

🧬

Face Embedding Space

A discriminative representation space that positions face images according to identity similarity.

🧠

Face Recognition

A task focused on identifying individuals by mapping face images to identity-discriminative representations.

✅

Face Verification

A binary decision problem that evaluates whether two face images belong to the same person.
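In embedding-based systems this decision is often made by comparing two face embeddings with cosine similarity and applying a threshold. A minimal sketch, assuming the embeddings come from some face recognition model; `same_person` is a hypothetical name, and the threshold of 0.5 is a placeholder that would be tuned on labeled validation pairs to balance false accepts against false rejects.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def same_person(emb_a, emb_b, threshold=0.5):
    """Verification decision: True if the two embeddings are similar enough."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```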

🎭

Facial Expression Recognition

A task focused on predicting emotional or behavioral expression types from facial muscle movement patterns.

📍

Facial Landmark Detection

A fine-grained vision task that identifies key points of face anatomy such as eyes, nose, mouth, and jaw.

🧪

Few-Shot Image Classification

A learning approach that aims to distinguish new visual categories when only a few examples per class are available.

🔬

Fine-Grained Image Classification

A classification problem focused on distinguishing visually similar subcategories, such as bird species or car models, that differ only in subtle details.

G (2 terms)

👀

Gaze Estimation

A fine-grained face analysis task focused on estimating where a person is looking, typically from the appearance of the eyes and the orientation of the head.

📐

Geometric Transformation

A family of operations that rearranges image coordinates to apply scaling, rotation, translation, and perspective changes.
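A sketch of an affine transform (scaling, rotation, translation) stored as a 2x3 matrix and applied to a single point. The function names are hypothetical; perspective changes need a full 3x3 homography followed by division by the third coordinate, and warping a whole image usually maps each output pixel back through the inverse transform and interpolates.

```python
import math


def affine_matrix(scale=1.0, angle_deg=0.0, tx=0.0, ty=0.0):
    """2x3 matrix: uniform scale, then rotation by angle_deg, then translation.

    With image coordinates (y pointing down), a positive angle appears clockwise.
    """
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, tx],
            [s, c, ty]]


def transform_point(m, x, y):
    """Map (x, y) through the 2x3 affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Scale by 2, rotate by 90 degrees, shift 10 pixels along x.
m = affine_matrix(scale=2.0, angle_deg=90, tx=10, ty=0)
x, y = transform_point(m, 1, 0)  # approximately (10.0, 2.0)
```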

H (6 terms)

📶

HOG Features

A classical feature extraction approach that represents shape and edge structure through local gradient orientation distributions.

✍️

Handwriting Recognition

An OCR subtask focused on converting handwritten content, which is far more variable than printed text, into machine-readable text.

🧭

Head Pose Estimation

A geometric analysis task that estimates the 3D orientation of the head as rotation angles, typically yaw, pitch, and roll.

🌲

Hierarchical Image Classification

A multi-level visual classification problem in which class labels are organized in a parent-child taxonomy.

📊

Histogram Equalization

A classical enhancement technique that redistributes intensity values to improve image contrast.
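A minimal sketch of the classical global algorithm: build the intensity histogram, take its cumulative distribution, and use the rescaled CDF as a lookup table. It assumes an 8-bit grayscale image stored as a list of rows; adaptive variants such as CLAHE apply the same idea per tile with a clip limit.

```python
def equalize(img, levels=256):
    """Global histogram equalization for a grayscale image (rows of ints in [0, levels))."""
    pixels = [p for row in img for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution: cdf[v] = number of pixels with intensity <= v.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest intensity present
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to spread out
        return [row[:] for row in img]
    # Stretch the CDF so the darkest present level maps to 0 and the brightest to levels - 1.
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[p] for p in row] for row in img]


# A low-contrast patch whose values sit in a narrow band around 50.
patch = [[50, 50, 51],
         [52, 52, 53]]
result = equalize(patch)  # [[0, 0, 64], [191, 191, 255]]
```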

🧮

Hungarian Assignment for Tracking

An optimization step that preserves identity continuity by matching detections with existing tracks at minimum cost.
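A pure-Python sketch of the Hungarian (Kuhn-Munkres) algorithm with row and column potentials, where rows are existing tracks and columns are new detections. The cost values are hypothetical; in practice they might be 1 - IoU or a distance, and SciPy provides `scipy.optimize.linear_sum_assignment` for the same job.

```python
def hungarian(cost):
    """Minimum-cost assignment of rows (tracks) to columns (detections).

    Hungarian algorithm with row/column potentials; requires
    len(cost) <= len(cost[0]). Returns assignment[row] = column.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "pad or transpose the matrix so that rows <= columns"
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row matched to column j (1-based, 0 = none)
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            # Grow the alternating tree until it reaches a free column.
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


# Hypothetical costs: rows are tracks, columns are detections; lower is a better match.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
matches = hungarian(cost)  # [1, 0, 2]: track 0 -> detection 1, track 1 -> detection 0, ...
```

Real trackers usually also reject matched pairs whose cost exceeds a gate (for example, a minimum IoU), leaving those tracks and detections unmatched.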

I (10 terms)

🖼️

Image Augmentation

A training technique that improves model generalization by diversifying the training data with label-preserving transformations such as flips, crops, and color changes.

📝

Image Captioning

The task of expressing the content of an image in fluent and meaningful natural language.

🔧

Image Deblurring

A restoration task that aims to recover visual information by reducing blur caused by motion, focus error, or camera shake.

🧼

Image Denoising

The process of improving image quality by reducing noise caused by sensors, compression, or transmission.

⚖️

Image Normalization

A preprocessing step that brings pixel values into a defined range or distribution to make model training more stable.
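A common form is to scale 8-bit values to [0, 1] and then standardize each channel with a mean and standard deviation. The sketch below defaults to the ImageNet statistics that many pretrained models expect; that default is an assumption, and the correct values are whatever the model was trained with.

```python
# Per-channel RGB statistics widely used with ImageNet-pretrained models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))
```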

📐

Image Registration

The process of aligning images from different times, viewpoints, or sensors within a common coordinate system.

🔍

Image Super-Resolution

A restoration approach focused on generating more detailed high-resolution images from low-resolution inputs.

🧲

Image-Text Contrastive Learning

An approach that learns multimodal representations by bringing related image-text pairs together and pushing unrelated pairs apart in a shared space.

🔎

Image-Text Retrieval

A task that retrieves relevant images from text or relevant text from images through a shared multimodal representation space.

🎭

Instance Segmentation

A task that distinguishes individual objects of the same class and produces a pixel mask for each one.

K (4 terms)

📈

Kalman Filter Tracking

A classical approach that stabilizes tracking by predicting an object's next position with a motion model and correcting that prediction with each new measurement.
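A minimal sketch of the predict/update cycle for one coordinate (for example, a box center's x position) under a constant-velocity motion model. The class name and noise values are illustrative assumptions, and the process noise is simplified to a diagonal matrix; trackers such as SORT apply the same cycle to a multi-dimensional box state.

```python
class ConstantVelocityKalman1D:
    """Tracks one coordinate with state [position, velocity]."""

    def __init__(self, x0, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.x = [x0, 0.0]                 # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.dt, self.q, self.r = dt, process_var, meas_var

    def predict(self):
        dt, (p, v) = self.dt, self.x
        self.x = [p + dt * v, v]           # x = F x with F = [[1, dt], [0, 1]]
        (a, b), (c, d) = self.P
        # P = F P F^T + Q, with Q simplified to q * I.
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                     # innovation variance (we measure position only)
        k0, k1 = a / s, c / s              # Kalman gain
        y = z - self.x[0]                  # innovation: measurement minus prediction
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P = (I - K H) P with H = [1, 0].
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.x[0]


kf = ConstantVelocityKalman1D(x0=0.0)
for z in [2.1, 3.9, 6.2, 7.8, 10.1]:
    predicted = kf.predict()  # where the tracker expects the object
    estimate = kf.update(z)   # blend the prediction with the new detection
```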

🔑

Key-Value Extraction

A task that matches field names with their corresponding values in a document to create structured data.

📌

Keypoint Detection

An operation that finds distinctive and repeatable local points in an image to support matching and alignment tasks.

📦

Knowledge Distillation in Vision

An approach for transferring knowledge from a large, powerful vision model into a smaller and more efficient one.
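A classic way to do this is to train the student to match the teacher's temperature-softened output distribution. The sketch below computes that soft-target term as a KL divergence scaled by T², following the common formulation; the temperature of 4.0 is an illustrative choice, and in practice this term is usually combined with the ordinary cross-entropy on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) between temperature-softened outputs, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2
```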

L (3 terms)

🗂️

Layout Analysis

A Document AI task that structurally separates titles, paragraphs, tables, images, and layout blocks in a document.

🔗

Local Feature Matching

A task that matches similar local points across images to enable alignment, registration, and 3D inference.

📉

Long-Tailed Recognition

A problem focused on strong recognition under imbalanced data distributions where some classes are abundant and others are scarce.

M (9 terms)

🪄

Mask Refinement

A post-processing step that improves an initial segmentation mask, especially its boundary quality and fine detail.

📈

Mean Average Precision

A core object detection metric that averages per-class average precision (the area under the precision-recall curve) across classes; some benchmarks also average over several IoU thresholds.
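A minimal sketch of average precision for one class and of the final mean over classes. It assumes each detection has already been marked as a true or false positive at a chosen IoU threshold and uses all-point interpolation; benchmarks differ here (COCO, for example, samples 101 recall points and also averages over IoU thresholds from 0.50 to 0.95).

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class (num_ground_truth must be > 0).

    is_true_positive[i] says whether detection i matched a not-yet-matched
    ground-truth box at the chosen IoU threshold; that matching is assumed done.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for i in order:  # walk detections from most to least confident
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at the same or higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the envelope: add a rectangle wherever recall increases.
    return sum((recalls[k] - recalls[k - 1]) * precisions[k]
               for k in range(1, len(recalls)))


def mean_average_precision(per_class_ap):
    """mAP: the unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Three detections of one class, two ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2)
```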

📏

Metric Learning for Vision

An approach that builds comparison-based visual systems by bringing similar examples closer and separating different ones in embedding space.
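The triplet loss is one common objective for this: it pushes an anchor closer to a positive example than to a negative one by at least a margin. A minimal sketch with Euclidean distance; the margin of 0.2 is illustrative, and other objectives (contrastive, proxy-based) are also widely used.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```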

🎥

Multi-Camera Video Analytics

An approach that jointly analyzes multiple camera streams to provide broader scene and behavior understanding.

🗂️

Multi-Label Image Classification

A classification problem in which an image can belong to several classes at once, which better matches many real-world scenes.

👥

Multi-Object Tracking

A task that maintains both localization and identity continuity for multiple objects over time in video.

📍

Multimodal Grounding

The process of aligning linguistic expressions with the correct region, object, or visual structure in an image.

📋

Multimodal Instruction Tuning

A fine-tuning process that develops multimodal models capable of interpreting image and text inputs together with natural language instructions.

📚

Multimodal RAG for Vision

An architectural approach that combines visual inputs with external knowledge sources to produce more grounded multimodal answers.

N (1 term)

🧯

Non-Maximum Suppression

An operation that filters overlapping boxes produced for the same object in order to create cleaner detection output.
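A minimal sketch of the common greedy variant: repeatedly keep the highest-scoring box and drop the remaining boxes whose IoU with it exceeds a threshold. Boxes are assumed to be (x1, y1, x2, y2) and the 0.5 threshold is illustrative; detectors usually run this separately per class.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # [0, 2]: box 1 is a near-duplicate of box 0
```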

O (7 terms)

🫥

Occlusion Handling

A set of strategies aimed at maintaining detection and tracking when the object becomes partially or fully invisible.

⚡

One-Stage Detector

A family of detectors that predicts class labels and bounding boxes directly in a single forward pass, without a separate region proposal stage, which typically makes detection faster.

🚪

Open-Set Recognition

An approach that enables a model to flag unseen classes as unknown instead of assigning them an overconfident incorrect label.

🌍

Open-Vocabulary Detection

A detection approach that can recognize object categories beyond a fixed training class list by matching image regions to natural language labels.

🌍

Open-World Object Detection

An approach in which a model detects known object classes and also identifies unknown objects as a separate category instead of ignoring or mislabeling them.

📄

Optical Character Recognition

A core Document AI task that converts text within images or documents into machine-processable text.

📦

Oriented Object Detection

A task focused on detecting rotated or tilted objects with angled boxes instead of axis-aligned boxes.

P (2 terms)

🌐

Panoptic Segmentation

A unified segmentation task that models both scene classes and separate object instances under a single framework.

🪄

Promptable Segmentation

A flexible segmentation approach that produces masks guided by prompts such as points, boxes, or text.

R (3 terms)

🪪

Re-Identification

A task that matches the same object or person across different cameras or across separated time intervals.

📚

Reading Order Detection

A task that determines the order in which document content should be read to reconstruct the correct text flow.

📍

Referring Expression Comprehension

A task that matches a natural language description to the correct region in an image.

S (9 terms)

🧠

SIFT Descriptor

A classical but effective descriptor method that produces local visual features robust to scale and rotation.

⚙️

SORT Tracker

A lightweight system that performs fast multi-object tracking by combining Kalman filtering with Hungarian assignment.

🧬

Self-Supervised Visual Features

Visual representations learned without labels that can be reused across many downstream vision tasks.

🧩

Semantic Segmentation

A task that assigns a class label to every pixel in an image for pixel-level scene understanding.

🧪

Semi-Supervised Segmentation

An approach that improves segmentation quality by using a small set of labeled examples together with many unlabeled images.

🎬

Shot Boundary Detection

A task that identifies scene or camera-shot transitions in video to enable structural video analysis.

🎯

Single Object Tracking

A task that continuously updates the location of a selected object over time in video.

🏷️

Single-Label Image Classification

A fundamental vision task that assigns an image to exactly one of a set of predefined classes.

🎞️

Spatio-Temporal Convolution

A convolutional approach that jointly models spatial patterns and temporal change within video.

T (7 terms)

📊

Table Structure Recognition

A task that extracts row, column, and cell relationships in document tables so the data becomes machine-usable.

⏱️

Temporal Action Localization

A task that identifies not only the action type in a video but also the time interval in which it occurs.

✂️

Temporal Action Segmentation

A task that segments long video streams into meaningful action units and assigns an action label to each temporal interval.

🔍

Text Detection in Documents

A document vision task that locates text regions before character recognition is performed.

🔗

Tracking-by-Detection

A common approach that performs object detection on each frame and builds tracks by associating detections over time.

🔁

Transfer Learning in Vision

An approach based on reusing visual models pretrained on large datasets for new tasks.

🎯

Two-Stage Detector

An architectural approach that first proposes candidate regions and then classifies them for more precise object detection.

U (1 term)

🧬

U-Net

An encoder-decoder segmentation architecture especially effective in biomedical and pixel-level tasks.

V (7 terms)

🚨

Video Anomaly Detection

A task focused on detecting deviations from normal behavior patterns in video streams.

📝

Video Summarization

A task that condenses long videos into shorter, representative summaries with minimal loss of important information.

🧠

Video Transformer

A modern architectural approach that tokenizes video across time and space and models it with attention mechanisms.

🧠

Vision Transformer Features

Visual features produced by Vision Transformers, which split an image into patch tokens and learn representations through global self-attention.

🧠

Vision-Language Model

A multimodal model family that combines visual and textual information within a shared representation or generation framework.

🧠

Visual Document Understanding

An approach that jointly interprets text, layout, and visual elements in a document to build higher-level semantic understanding.

❓

Visual Question Answering

A multimodal task that answers natural language questions about an image based on visual context.

W (1 term)

🪶

Weakly Supervised Segmentation

An approach that aims to learn segmentation from cheaper labels instead of full pixel masks.

Z (1 term)

🪄

Zero-Shot Image Classification

An approach in which a model recognizes classes it was not trained on, without additional training, typically by comparing images with textual class descriptions in a shared representation space.
