Computer Vision
99 terms in the Computer Vision domain, each bilingual (Turkish/English) and linked through a related-term graph.
All Terms (99)
Action Anticipation
A task that attempts to predict a future action from a partially observed video stream before it fully unfolds.
Action Recognition
A task focused on recognizing action classes from human or object motion in video.
Anchor Boxes
A design approach in object detection in which the model predicts class scores and box offsets relative to predefined reference boxes of different scales and aspect ratios, tiled across the feature map.
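A minimal sketch of anchor generation, assuming anchors are centred on each feature-map cell, `scales` are square-root areas in pixels, and `ratios` are width/height. The function name and these conventions are illustrative, not those of a specific library.

```python
import math

def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return anchors as (x1, y1, x2, y2) tuples in image coordinates (hypothetical helper)."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Centre of feature-map cell (i, j) projected back to image pixels.
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area at s * s while changing the aspect ratio w / h = r.
                    w = s * math.sqrt(r)
                    h = s / math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(feat_h=2, feat_w=2, stride=16, scales=[32, 64], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 2 * 2 cells * 2 scales * 3 ratios = 24
print(anchors[1])    # first cell, scale 32, ratio 1.0 -> (-8.0, -8.0, 24.0, 24.0)
```

During training, each anchor is matched to ground-truth boxes (commonly by IoU), and the detector learns to refine the matched anchors rather than predict boxes from scratch.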
CNN Feature Maps
Intermediate representations learned by convolutional layers that carry visual patterns at different abstraction levels.
Chart Understanding
A task that converts bar charts, line graphs, and similar visual presentations into structured data and semantic interpretation.
Color Constancy
An image-processing approach that estimates the color of the scene illumination and corrects for it, so that object colors appear consistent under varying lighting conditions.
Color Space Conversion
A process that converts an image from RGB into another color space, such as grayscale, HSV, YCbCr, or CIELAB, where signals like brightness, hue, or chroma are easier to isolate.
Contrastive Visual Pretraining
An approach that learns strong visual features by bringing similar images close and pushing dissimilar ones apart in representation space.
Data Association
The problem of matching object observations across frames to the same physical target.
DeepSORT
A tracking system that strengthens the SORT approach with appearance embeddings to improve identity stability.
Deformable DETR
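A DETR variant whose deformable attention attends only to a small set of learned sampling points around each reference point, across multiple feature-map scales; this speeds up training convergence and improves small-object performance compared with the original DETR.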
Dice Coefficient
A segmentation-focused evaluation metric that measures overlap between predicted and ground-truth masks.
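The Dice coefficient is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal sketch, assuming binary masks stored as nested lists of 0/1; the small `eps` term is an illustrative convention that scores two empty masks as 1.0.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as nested lists of 0/1."""
    inter = pred_sum = target_sum = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += p * t       # pixel is foreground in both masks
            pred_sum += p
            target_sum += t
    # eps avoids division by zero when both masks are empty
    return (2 * inter + eps) / (pred_sum + target_sum + eps)

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(round(dice_coefficient(pred, target), 4))  # 2 * 2 / (3 + 3) -> 0.6667
```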
Document Parsing
A process that decomposes a document into text, structure, fields, and hierarchy to produce a machine-processable representation.
Face Alignment
A process that places the face into a common geometric reference frame to make downstream analysis more consistent.
Face Anti-Spoofing
A security task that aims to distinguish attacks such as printed photos, replay screens, or masks from genuine face access.
Face Detection
A core face analysis task focused on locating face regions in an image or video.
Face Embedding Space
A discriminative representation space that positions face images according to identity similarity.
Face Recognition
A task that identifies individuals by extracting discriminative identity representations from face images and comparing them against known identities.
Face Verification
A binary decision problem that evaluates whether two face images belong to the same person.
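A common way to make this binary decision is to compare two face embeddings with cosine similarity and apply a threshold. A minimal sketch, assuming the embeddings come from some face recognition model; the `same_person` helper and the 0.5 threshold are hypothetical, and real thresholds are tuned on validation data for a target false-accept rate.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.5):
    # threshold is a hypothetical value, not a recommended setting
    return cosine_similarity(emb_a, emb_b) >= threshold

print(same_person([0.9, 0.1, 0.3], [0.8, 0.2, 0.35]))  # True
print(same_person([0.9, 0.1, 0.3], [-0.2, 0.9, 0.1]))  # False
```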
Facial Expression Recognition
A task focused on predicting emotional or behavioral expression types from facial muscle movement patterns.
Facial Landmark Detection
A fine-grained vision task that identifies key points of face anatomy such as eyes, nose, mouth, and jaw.
Few-Shot Image Classification
A learning approach that aims to distinguish new visual categories when only a few examples per class are available.
Fine-Grained Image Classification
A classification problem focused on distinguishing visually similar subcategories, such as bird species or car models, that differ only in subtle details.
HOG Features
A classical feature extraction approach that represents shape and edge structure through local gradient orientation distributions.
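The core of HOG is a magnitude-weighted histogram of gradient orientations computed per cell. A simplified sketch for a single cell, assuming 9 unsigned orientation bins over 0-180 degrees and central-difference gradients on interior pixels; it deliberately omits the bilinear vote interpolation and block normalization used in full HOG pipelines.

```python
import math

def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    h, w = len(cell), len(cell[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences in x and y.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude  # hard binning
    return hist

# A vertical edge: intensity jumps from 0 to 100 between columns 1 and 2.
cell = [[0, 0, 100, 100]] * 4
print(cell_orientation_histogram(cell))
# [400.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```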
Handwriting Recognition
An OCR subtask focused on converting handwritten content, which is far more variable than printed text, into machine-readable text.
Head Pose Estimation
A geometric analysis task that estimates the 3D orientation of the head, typically as yaw, pitch, and roll angles.
Hierarchical Image Classification
A multi-level visual classification problem in which class labels are organized in a parent-child taxonomy.
Histogram Equalization
A classical enhancement technique that redistributes intensity values to improve image contrast.
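Equalization maps each gray level through the normalized cumulative histogram, so that the occupied levels spread across the full output range. A minimal sketch of the standard CDF-based mapping, assuming an 8-bit grayscale image stored as a flat list of integers.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels - 1]."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function of the intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest occupied level
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    # Lookup table that makes the output CDF approximately linear.
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]

low_contrast = [50, 50, 51, 52, 52, 52, 53, 54]
print(equalize_histogram(low_contrast))  # darkest pixels map to 0, brightest to 255
```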
Hungarian Assignment for Tracking
An optimization step that preserves identity continuity by matching detections with existing tracks at minimum cost.
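A minimal sketch of the matching step, assuming boxes as (x1, y1, x2, y2) and a cost of 1 - IoU, as SORT-style trackers commonly use. For clarity it finds the minimum-cost assignment by brute force over permutations, which yields the same optimum as the Hungarian algorithm but only scales to a handful of objects; in practice a dedicated solver such as SciPy's `linear_sum_assignment` would be used. The `assign` helper and the `min_iou` gate are hypothetical.

```python
from itertools import permutations

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def assign(tracks, detections, min_iou=0.3):
    """Minimum-cost track-detection matching with cost = 1 - IoU.

    Assumes len(detections) >= len(tracks).
    """
    cost = [[1.0 - iou(t, d) for d in detections] for t in tracks]
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(detections)), len(tracks)):
        total = sum(cost[t][d] for t, d in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    # Reject matches whose overlap is too small to be the same object.
    return [(t, d) for t, d in enumerate(best) if 1.0 - cost[t][d] >= min_iou]

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11)]
print(assign(tracks, detections))  # [(0, 1), (1, 0)]
```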
Image Augmentation
A technique that improves model generalization by diversifying training data through label-preserving transformations such as flips, crops, rotations, and color changes.
Image Captioning
The task of expressing the content of an image in fluent and meaningful natural language.
Image Deblurring
A restoration task that aims to recover visual information by reducing blur caused by motion, focus error, or camera shake.
Image Denoising
The process of improving image quality by reducing noise caused by sensors, compression, or transmission.
Image Normalization
A preprocessing step that brings pixel values into a defined range or distribution to make model training more stable.
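A minimal sketch of a common normalization scheme: scale 8-bit values to [0, 1], then subtract a per-channel mean and divide by a per-channel standard deviation. The constants below are the widely used ImageNet RGB statistics; they are an assumed convention, and a model should use whatever statistics it was trained with.

```python
# Widely used ImageNet channel statistics (RGB, for inputs scaled to [0, 1]).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, mean, std))

print(normalize_pixel((124, 116, 104)))  # close to (0, 0, 0): roughly the mean colour
```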
Image Registration
The process of aligning images from different times, viewpoints, or sensors within a common coordinate system.
Image Super-Resolution
A restoration approach focused on generating more detailed high-resolution images from low-resolution inputs.
Image-Text Contrastive Learning
An approach that learns multimodal representations by bringing related image-text pairs together and pushing unrelated pairs apart in a shared space.
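A minimal sketch of a symmetric, CLIP-style InfoNCE loss, assuming a batch in which image i and text i form the only positive pair and all embeddings are already L2-normalized. The temperature of 0.07 is a commonly used starting value, chosen here for illustration.

```python
import math

def log_softmax_at(row, i):
    """log(softmax(row)[i]) computed stably."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return row[i] - log_sum

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: image i and text i are the positive pair."""
    n = len(image_emb)
    # Cosine similarities (embeddings assumed unit-length) scaled by temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_emb]
              for img in image_emb]
    # Image-to-text: each row should put its probability mass on the diagonal.
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: the same objective over columns.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax_at(cols[i], i) for i in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

aligned = [[1.0, 0.0], [0.0, 1.0]]
print(clip_style_loss(aligned, aligned))                     # near 0: pairs match
print(clip_style_loss(aligned, [[0.0, 1.0], [1.0, 0.0]]))    # large: pairs swapped
```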
Image-Text Retrieval
A task that retrieves relevant images from text or relevant text from images through a shared multimodal representation space.
Instance Segmentation
A task that distinguishes individual objects of the same class and produces a pixel mask for each one.
Kalman Filter Tracking
A classical approach that predicts each object's next position with a motion model and then corrects that prediction with the new detection, smoothing noisy measurements and stabilizing tracks.
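The predict-then-correct cycle can be sketched with a one-dimensional constant-velocity filter. This is a minimal sketch, assuming a state of [position, velocity], position-only measurements, and a simplified process noise Q = q·I; the function name and the `q` and `r` values are illustrative. Trackers such as SORT apply the same cycle to a larger box state.

```python
def kalman_1d_constant_velocity(measurements, dt=1.0, q=1e-2, r=1.0):
    """Filter 1-D position measurements with a constant-velocity Kalman filter."""
    x = [measurements[0], 0.0]        # initial [position, velocity] guess
    P = [[1.0, 0.0], [0.0, 1.0]]      # initial state covariance
    estimates = []
    for z in measurements:
        # Predict: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]], Q = q * I.
        x = [x[0] + dt * x[1], x[1]]
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        # Update with H = [1, 0]: innovation y, its variance s, gain K = P H^T / s.
        y = z - x[0]
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        x = [x[0] + k0 * y, x[1] + k1 * y]
        # Covariance update: P = (I - K H) P.
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        estimates.append(x[0])
    return estimates

ramp = [float(t) for t in range(20)]
print([round(e, 2) for e in kalman_1d_constant_velocity(ramp)])
```

Because velocity is part of the state, the filter converges on a steadily moving target instead of lagging behind it.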
Key-Value Extraction
A task that matches field names with their corresponding values in a document to create structured data.
Keypoint Detection
An operation that finds distinctive and repeatable local points in an image to support matching and alignment tasks.
Knowledge Distillation in Vision
An approach for transferring knowledge from a large, powerful vision model into a smaller and more efficient one.
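A common formulation of distillation (after Hinton et al.) trains the student to match the teacher's temperature-softened class probabilities, alongside the usual hard-label loss. A minimal sketch, assuming raw logits as plain lists; `temperature=4.0` and `alpha=0.5` are illustrative settings, and the KL term is scaled by T² to keep its gradients comparable to the hard-label term.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) on softened outputs + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])  # standard loss on the true label
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce

print(distillation_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0], label=0))  # KL term is 0
print(distillation_loss([0.0, 1.0, 2.0], [2.0, 1.0, 0.0], label=0))  # larger
```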
Layout Analysis
A Document AI task that structurally separates titles, paragraphs, tables, images, and layout blocks in a document.
Local Feature Matching
A task that matches similar local points across images to enable alignment, registration, and 3D inference.
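A classic matching rule pairs each descriptor with its nearest neighbour in the other image and keeps the pair only if it passes Lowe's ratio test: the best match must be clearly closer than the second best. A minimal sketch, assuming descriptors as equal-length tuples and Euclidean distance; the 0.8 ratio follows Lowe's SIFT paper, though tighter values are also common.

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test (desc_b needs >= 2 entries)."""
    matches = []
    for i, d in enumerate(desc_a):
        # Distances from descriptor i to every descriptor in the other image.
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_b))
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:  # reject ambiguous matches
            matches.append((i, j))
    return matches

desc_a = [(0.0, 1.0), (5.0, 5.0)]
desc_b = [(0.1, 1.0), (3.0, 3.0), (7.0, 7.0)]
print(match_descriptors(desc_a, desc_b))  # [(0, 0)]
```

The second descriptor in `desc_a` is equally close to two candidates, so the ratio test discards it rather than guessing.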
Long-Tailed Recognition
A problem focused on maintaining strong recognition performance when the training data is highly imbalanced: a few head classes have abundant examples while many tail classes are scarce.
Mask Refinement
A refinement process that improves an initial segmentation output in terms of boundary quality and detail precision.
Mean Average Precision
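A core object detection metric: average precision (the area under the precision-recall curve) is computed per class and then averaged across classes, and is often also averaged over several IoU matching thresholds.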
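A minimal sketch of all-point interpolated AP for one class, then the mean over classes. It assumes detections have already been matched to ground truth at a chosen IoU threshold and are given as (score, is_true_positive) pairs. Benchmarks differ in the details; COCO, for example, samples 101 recall points and also averages over IoU thresholds from 0.50 to 0.95.

```python
def average_precision(detections, num_gt):
    """All-point interpolated AP from (score, is_true_positive) pairs for one class."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in detections:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)

dets = [(0.9, True), (0.8, False), (0.7, True)]
print(round(average_precision(dets, num_gt=2), 4))  # 0.5 * 1.0 + 0.5 * 2/3 -> 0.8333
```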
Metric Learning for Vision
An approach that builds comparison-based visual systems by bringing similar examples closer and separating different ones in embedding space.
Multi-Camera Video Analytics
An approach that jointly analyzes multiple camera streams to provide broader scene and behavior understanding.
Multi-Label Image Classification
A classification problem in which an image can belong to multiple classes at the same time, which matches many real-world scenes better than a single label does.
Multi-Object Tracking
A task that maintains both localization and identity continuity for multiple objects over time in video.
Multimodal Grounding
The process of aligning linguistic expressions with the correct region, object, or visual structure in an image.
Multimodal Instruction Tuning
A fine-tuning process that develops multimodal models capable of interpreting image and text inputs together with natural language instructions.
Multimodal RAG for Vision
An architectural approach that combines visual inputs with external knowledge sources to produce more grounded multimodal answers.
Occlusion Handling
A set of strategies aimed at maintaining detection and tracking when the object becomes partially or fully invisible.
One-Stage Detector
A family of models that performs fast detection by predicting object classes and boxes directly from feature maps in a single forward pass, without a separate region-proposal stage.
Open-Set Recognition
An approach that enables a model to flag unseen classes as unknown instead of assigning them an overconfident incorrect label.
Open-Vocabulary Detection
A detection approach that can recognize objects described by arbitrary natural-language labels instead of being limited to a fixed class list seen during training.
Open-World Object Detection
An approach that aims for a model not only to detect known objects but also to handle unknown ones as a separate category.
Optical Character Recognition
A core Document AI task that converts text within images or documents into machine-processable text.
Oriented Object Detection
A task focused on detecting rotated or tilted objects with angled boxes instead of axis-aligned boxes.
Re-Identification
A task that matches the same person or object across different cameras or across separate time intervals.
Reading Order Detection
A task that determines the order in which document content should be read to reconstruct the correct text flow.
Referring Expression Comprehension
A task that matches a natural language description to the correct region in an image.
SIFT Descriptor
A classical but effective descriptor method that produces local visual features robust to scale and rotation.
SORT Tracker
A lightweight system that performs fast multi-object tracking by combining Kalman filtering with Hungarian assignment.
Self-Supervised Visual Features
Visual representations learned without labels that can be reused across many downstream vision tasks.
Semantic Segmentation
A task that assigns a class label to every pixel in an image for pixel-level scene understanding.
Semi-Supervised Segmentation
An approach that improves segmentation quality by using a small set of labeled examples together with many unlabeled images.
Shot Boundary Detection
A task that identifies scene or camera-shot transitions in video to enable structural video analysis.
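One simple baseline for hard cuts compares intensity histograms of consecutive frames and flags a boundary when they differ sharply. A minimal sketch, assuming grayscale frames as flat lists of 0-255 values; the 16-bin histogram, the distance measure, and the 0.5 threshold are illustrative choices, and gradual transitions such as fades need more than a single frame-pair comparison.

```python
def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of a frame given as a flat list of 0-255 values."""
    hist = [0] * bins
    for v in frame:
        hist[v * bins // 256] += 1
    return [h / len(frame) for h in hist]

def detect_cuts(frames, threshold=0.5):
    """Return indices i where a hard cut is assumed between frame i - 1 and frame i."""
    cuts = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        # Half the L1 distance between normalized histograms lies in [0, 1].
        if 0.5 * sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts

dark = [20] * 100
bright = [220] * 100
print(detect_cuts([dark, dark, bright, bright, bright]))  # [2]
```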
Single Object Tracking
A task that continuously updates the location of a selected object over time in video.
Single-Label Image Classification
A fundamental vision task that assigns an image to exactly one of a set of predefined classes.
Spatio-Temporal Convolution
A convolutional approach that jointly models spatial patterns and temporal change within video.
Table Structure Recognition
A task that extracts row, column, and cell relationships in document tables so the data becomes machine-usable.
Temporal Action Localization
A task that identifies not only the action type in a video but also the time interval in which it occurs.
Temporal Action Segmentation
A task that segments long video streams into meaningful action units and assigns an action label to each temporal interval.
Text Detection in Documents
A document vision task that locates text regions before character recognition is performed.
Tracking-by-Detection
A common approach that performs object detection on each frame and builds tracks by associating detections over time.
Transfer Learning in Vision
An approach based on reusing visual models pretrained on large datasets for new tasks.
Two-Stage Detector
An architectural approach that first proposes candidate regions and then classifies them for more precise object detection.
Video Anomaly Detection
A task focused on detecting deviations from normal behavior patterns in video streams.
Video Summarization
A task that condenses a long video into a short summary of representative keyframes or clips while losing as little important information as possible.
Video Transformer
A modern architectural approach that tokenizes video across time and space and models it with attention mechanisms.
Vision Transformer Features
Representations produced by a Vision Transformer, which splits an image into patch tokens and learns features through global self-attention across all patches.
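The tokenization step can be sketched by cutting the image into non-overlapping square patches and flattening each one; for example, a 224 x 224 image with 16 x 16 patches yields 196 tokens. This minimal sketch assumes an H x W x C image as nested lists with H and W divisible by the patch size. It stops before the learned linear projection and position embeddings that a Vision Transformer applies to each flattened patch.

```python
def image_to_patch_tokens(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h, patch_size):          # patches in row-major order
        for left in range(0, w, patch_size):
            token = []
            for y in range(top, top + patch_size):
                for x in range(left, left + patch_size):
                    token.extend(image[y][x])     # append all channels of the pixel
            tokens.append(token)
    return tokens

# 4 x 4 RGB image with patch size 2 -> 4 tokens of length 2 * 2 * 3 = 12.
image = [[[y, x, 0] for x in range(4)] for y in range(4)]
tokens = image_to_patch_tokens(image, patch_size=2)
print(len(tokens), len(tokens[0]))  # 4 12
```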
Vision-Language Model
A multimodal model family that combines visual and textual information within a shared representation or generation framework.
Visual Document Understanding
An approach that jointly interprets text, layout, and visual elements in a document to build higher-level semantic understanding.
Visual Question Answering
A multimodal task that answers natural language questions about an image based on visual context.