Natural Language Processing
93 terms in the Natural Language Processing domain, each presented bilingually (TR/EN) with a related-term graph.
All Terms (93)
Abstractive Summarization
A generative summarization approach that rewrites source text to produce more natural and dense summaries.
Alignment in Translation
A mapping that models which parts of the source text correspond to which parts of the target-language text.
Answer Verification
A safety layer that aims to verify a generated answer through source evidence, logic checks, or additional model scrutiny.
Aspect-Based Sentiment Analysis
An approach that predicts sentiment for specific aspects or entity dimensions rather than overall sentiment.
BIO Tagging
A classical sequence-labeling scheme that marks entity boundaries with beginning, inside, and outside tags.
Back Translation
A data augmentation strategy based on translating target-language data back into the source language to create synthetic parallel data.
Byte Pair Encoding
A tokenization method that builds a data-driven subword vocabulary by learning frequent subunit merges.
Byte-Level Tokenization
An approach that tokenizes text at the byte level rather than character level to build more robust representations for multilingual and noisy input.
Causal Language Modeling
An autoregressive learning objective based on predicting the next token using only previous context.
Chain-of-Thought Prompting
A prompting approach that encourages the model to generate intermediate reasoning steps before the final answer.
Closed-Book Question Answering
An approach that answers questions using only the information stored in model parameters, without external document access.
Contextual Embeddings
A modern embedding approach in which the same word receives different vectors in different contexts.
Continued Pretraining
The process of further training a pretrained language model on new data to improve general or domain-specific knowledge.
Contrastive Embedding Learning
An approach that learns semantic representations by bringing similar texts closer and pushing dissimilar texts apart in vector space.
Coreference Resolution
A task that determines whether different expressions in text refer to the same entity or event.
Cross-Encoder Reranking
A second-stage retrieval model that jointly encodes query and candidate document to estimate relevance more precisely.
De-identification
The process of masking personal or sensitive information in text to enable safer processing.
Dense Passage Retrieval
A retrieval method that maps queries and passages into dense vectors to find relevant passages within documents.
Dense Retrieval
A retrieval approach that performs semantic matching by representing queries and documents in a dense vector space.
Document-Level Machine Translation
An approach that improves consistency by translating sentences within broader document context rather than independently.
Domain-Specific Embeddings
Embeddings adapted to the terminology of specific domains such as law, healthcare, or finance rather than general language.
Emotion Cause Analysis
A task focused not only on identifying the emotion in text but also on finding the part that triggered it.
Emotion Classification
A task that classifies text into fine-grained emotion categories such as joy, anger, fear, or sadness, rather than coarse positive or negative polarity.
Entity Linking
The task of matching an entity mention in text to the correct identity or concept in a knowledge base.
Event Coreference
The task of determining whether event mentions across sentences or documents refer to the same underlying event.
Event Extraction
The task of extracting events, triggers, and participating entities from text in a structured way.
Extractive Question Answering
A question answering approach that selects the answer as a span from a provided passage.
Extractive Summarization
A content-preserving summarization approach that creates summaries by selecting important sentences from the source text.
Factual Consistency Evaluation
An evaluation dimension that measures how consistent a generated summary or answer is with the facts in the source content.
FastText Embeddings
An embedding method that represents words as bags of character n-grams, which makes it more robust on rare and derived word forms.
Few-Shot Prompting
A prompting technique that adapts model behavior by guiding it with example input-output pairs.
Hard Negative Mining
A training strategy that improves retrieval and matching quality by providing semantically confusing hard negatives rather than easy negatives.
Hierarchical Text Classification
A classification problem in which labels are organized in a parent-child hierarchy rather than a flat list.
Hybrid Retrieval
An approach that combines sparse and dense retrieval signals to provide more balanced search quality.
In-Context Learning
The ability of a model to adapt task behavior from examples in context without updating its parameters.
Instruction Following
The ability of a model to reliably follow task instructions expressed in natural language.
Instruction Tuning
A fine-tuning approach that adapts a language model to respond better to natural language task instructions.
Intent Classification
A task focused on predicting what purpose or action intent a user utterance represents.
Language Modeling
A foundational NLP problem focused on learning the probability structure of language sequences in order to predict next or missing units.
Late-Interaction Embeddings
A retrieval approach that matches queries and documents through token-level interaction instead of compressing each into a single vector.
Lemmatization
The process of reducing a word to its dictionary base form while considering grammatical information.
Multi-Hop Question Answering
A question answering task that requires combining multiple pieces of information to arrive at an answer.
Multi-Label Text Classification
A classification problem in which a text can belong to multiple categories at the same time.
Multilingual Sentence Embeddings
An approach that represents sentences from different languages in a shared semantic space, enabling cross-lingual matching.
Named Entity Recognition
The task of recognizing entity spans such as people, organizations, locations, and dates within text.
Natural Language Inference
A task that determines whether one statement entails, contradicts, or is neutral with respect to another.
Nested NER
A more complex NER problem in which one entity span can contain another entity span.
Neural Machine Translation
A translation approach that uses neural networks, typically encoder-decoder models, to generate target-language sequences while preserving meaning and fluency.
Paraphrase Detection
The task of determining whether two expressions carry the same or very similar meaning despite surface differences.
Paraphrase Mining
The process of automatically discovering sentence pairs with the same or very similar meaning within large text collections.
Preference Optimization
An alignment approach that makes model output more useful by optimizing against human or system preference signals.
Pretraining Corpus
The large text data pool used by a language model to acquire general linguistic and world knowledge.
Prompt-Based Classification
An approach that solves classification problems directly through natural language instructions and label descriptions.
Reading Comprehension
A family of tasks that measures the ability to read a text and answer meaningful questions based on its content.
Relation Extraction
The task of identifying meaningful relation types between entities mentioned in text.
Reranking
A second-stage quality-improvement method that re-scores and reorders candidates from the first retrieval stage using a stronger model.
Retrieval-Augmented Generation
An architectural approach that supports model generation with external knowledge sources to produce more current and grounded answers.
Semantic Caching
A system approach that reduces latency and cost by reusing prior answers for semantically identical or similar queries.
Semantic Textual Similarity
A task that measures how semantically close two texts are regardless of surface-level overlap.
Sentence Boundary Detection
The task of reliably identifying where sentences begin and end in text.
Sentence Embeddings
An embedding approach focused on producing semantic representations at the sentence or short-text level.
SentencePiece
A tokenization framework that can learn subword vocabularies from raw text without relying on whitespace segmentation.
Sentiment Analysis
An NLP task focused on determining the positive, negative, or neutral emotional orientation of a text.
Slot Filling
An information extraction approach focused on automatically filling predefined information fields from text.
Sparse Neural Embeddings
A representation approach that uses neural models to produce semantic signals while preserving sparse-retrieval-style interpretability.
Sparse Retrieval
A classical yet still powerful retrieval approach based on term- or word-level matching.
Spelling Correction
A preprocessing technique that converts misspelled text into more accurate forms to improve downstream NLP quality.
Stance Detection
A task focused on identifying a text’s stance toward a given claim, topic, or target.
Stemming
An approach that reduces a word to a shorter root-like form by crudely stripping suffixes.
Stopword Filtering
A classical preprocessing technique based on removing frequent words that are assumed to have low semantic contribution.
Structured Output Prompting
A technique that asks the model to produce schema-aligned outputs such as JSON or tables instead of free text.
Subword Tokenization
An approach that splits rare words into smaller meaningful pieces to balance vocabulary size and coverage.
Summary Faithfulness
A quality dimension describing how faithfully a generated summary remains grounded in the source text.
Supervised Fine-Tuning
The process of steering a pretrained model toward more specific behavior using labeled task data.
Template-Based Extraction
A controlled extraction approach focused on obtaining structured information from predefined document or expression patterns.
Terminology-Constrained Translation
A controlled machine translation approach that preserves required translation equivalents for specific terms.
Text Classification
The task of assigning a text to one or more predefined categories, intents, or labels.
Text Deduplication
A process that removes identical or near-duplicate text samples from a dataset to improve training and evaluation quality.
Text Normalization
The process of standardizing raw text at the spelling, formatting, and character levels to make it more consistent and processable.
Token Alignment
The problem of preserving the mapping between subword tokens and original word or span structures.
Tokenization
The core language processing step that splits text into units that a model can process.
Toxicity Detection
A safety-focused NLP task aimed at identifying insults, aggression, hate speech, or other harmful language use.
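Example: BIO Tagging
As an illustration of the BIO Tagging entry above, the following is a minimal sketch of how beginning, inside, and outside tags can be decoded into entity spans. The function name bio_to_spans, the label names (PER, LOC), and the sample sentence are hypothetical and chosen only for this example. Treating a stray I- tag as the start of a new span is an assumption; real toolkits differ on how they handle malformed sequences.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start = None
    label = None
    for i, tag in enumerate(tags):
        # A span continues only on an I- tag whose label matches the open span.
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            # Close the currently open span, if any.
            if label is not None:
                spans.append((label, start, i))
                start, label = None, None
            # B- opens a new span. Assumption: a stray I- also opens one.
            if tag.startswith(("B-", "I-")):
                start, label = i, tag[2:]
    # Close a span that runs to the end of the sequence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical sample sentence and tags.
tokens = ["Ada", "Lovelace", "visited", "London", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
entities = [(label, " ".join(tokens[s:e])) for label, s, e in bio_to_spans(tags)]
print(entities)
```

Returning token offsets rather than strings keeps the decoder independent of the tokenizer, so the same spans can later be mapped back to characters or subword tokens.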