Natural Language Processing

93 terms in the Natural Language Processing domain, each bilingual (TR/EN) with a related-term graph.

Text Preprocessing · Tokenization · Embeddings · Language Modeling · Text Classification · Sentiment Analysis · Named Entity Recognition · Information Extraction · Text Summarization · Question Answering · Machine Translation · Semantic Similarity

All Terms (93)

A (4 terms)

📝

Abstractive Summarization

A generative summarization approach that writes new sentences rather than copying them from the source, which can produce more fluent and compact summaries.

🔀

Alignment in Translation

A concept that models which parts of the source text correspond to which parts of its translation in the target language.

✅

Answer Verification

A safety layer that aims to verify a generated answer by checking it against source evidence, applying logical consistency checks, or having an additional model review it.

🔍

Aspect-Based Sentiment Analysis

An approach that predicts sentiment for specific aspects or entity attributes rather than a single overall sentiment; for example, a review can be positive about battery life but negative about the screen.

B (4 terms)

🔖

BIO Tagging

A classical sequence-labeling scheme that marks entity boundaries with beginning, inside, and outside tags.
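A minimal sketch of producing BIO tags from entity spans, assuming the text is already tokenized and that spans arrive in a hypothetical `(start, end, label)` format over token indices with an exclusive end:

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    Assumption (hypothetical input format): each span is (start, end, label)
    over token indices, with `end` exclusive and no overlapping spans.
    """
    tags = ["O"] * len(tokens)            # every token starts as Outside
    for start, end, label in spans:
        tags[start] = f"B-{label}"        # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # remaining tokens inside the entity
    return tags


tokens = ["Ada", "Lovelace", "lived", "in", "London", "."]
print(spans_to_bio(tokens, [(0, 2, "PER"), (4, 5, "LOC")]))
# ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']
```

Because flat BIO assigns exactly one tag per token, it cannot express one entity nested inside another; nested NER needs a different labeling scheme.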

🔁

Back Translation

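A data augmentation strategy that translates monolingual target-language text into the source language with a reverse-direction model, then pairs each synthetic source sentence with its original target sentence to create synthetic parallel data.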

🔡

Byte Pair Encoding

A subword tokenization method that builds a data-driven vocabulary by repeatedly merging the most frequent pair of adjacent symbols in the training corpus.
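A minimal sketch of BPE merge learning on a toy word-frequency table. It is simplified under two stated assumptions: it omits the end-of-word marker that many implementations add, and it breaks frequency ties by taking the pair seen first.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from a {word: frequency} dict (simplified sketch)."""
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair (first seen wins ties)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

At tokenization time, the learned merges are replayed in the same order on new text, so frequent fragments such as "est" become single tokens.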

💾

Byte-Level Tokenization

An approach that tokenizes text as a sequence of bytes (typically its UTF-8 encoding) rather than characters, so any input, including multilingual or noisy text, can be represented without unknown tokens.
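A minimal sketch of the byte-level base step, assuming UTF-8 as the encoding: every string maps onto a fixed vocabulary of 256 byte values, so no input is ever out of vocabulary. Byte-level BPE tokenizers, such as GPT-2's, then learn merges on top of this base; that part is not shown.

```python
def byte_tokenize(text):
    """Map text to its UTF-8 bytes; every ID falls in the fixed base vocabulary 0-255."""
    return list(text.encode("utf-8"))


def byte_detokenize(ids):
    """Invert byte_tokenize by decoding the byte sequence as UTF-8."""
    return bytes(ids).decode("utf-8")


ids = byte_tokenize("çay ☕")
print(ids)                   # [195, 167, 97, 121, 32, 226, 152, 149]
print(byte_detokenize(ids))  # çay ☕
```

The trade-off is sequence length: "ç" costs two tokens and "☕" three, which is why byte-level schemes are usually combined with learned merges.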

C (8 terms)

⏩

Causal Language Modeling

An autoregressive learning objective based on predicting the next token using only previous context.
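A minimal sketch, on hypothetical token IDs, of the two ingredients of this objective: each prefix is paired with the token that follows it, and a causal mask lets position i attend only to positions up to i, so the model never sees the token it must predict.

```python
def next_token_targets(token_ids):
    """Pair each prefix with the token that follows it: the causal LM training signal."""
    return [(token_ids[:t], token_ids[t]) for t in range(1, len(token_ids))]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


ids = [17, 4, 92]  # hypothetical token IDs
print(next_token_targets(ids))  # [([17], 4), ([17, 4], 92)]
for row in causal_mask(3):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, True]
```

In a Transformer, all positions are trained in parallel in one forward pass; the mask is what keeps each prediction restricted to earlier context.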

🪜

Chain-of-Thought Prompting

A prompting approach that encourages the model to generate intermediate reasoning steps before the final answer.

🧠

Closed-Book Question Answering

An approach that answers questions using only the information stored in model parameters, without external document access.

🌐

Contextual Embeddings

A modern embedding approach in which the same word receives different vectors in different contexts.

🔄

Continued Pretraining

The process of further training a pretrained language model on new data to improve general or domain-specific knowledge.

🧲

Contrastive Embedding Learning

An approach that learns semantic representations by bringing similar texts closer and pushing dissimilar texts apart in vector space.
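A minimal sketch of one common contrastive objective, an in-batch InfoNCE-style loss, in plain Python. The vectors are hand-written stand-ins for encoder outputs, and the temperature of 0.05 is an illustrative assumption, not a prescribed value.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch InfoNCE: positives[i] is the match for anchors[i]; every other
    positives[j] in the batch serves as a negative for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]  # each positive is close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong anchors
print(info_nce_loss(anchors, matched) < info_nce_loss(anchors, swapped))  # True
```

Minimizing this loss raises the similarity of each true pair relative to every other pair in the batch, which is the "pull together, push apart" behavior described above.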

🔁

Coreference Resolution

A task that determines whether different expressions in text refer to the same entity or event.

🏁

Cross-Encoder Reranking

A second-stage retrieval model that jointly encodes query and candidate document to estimate relevance more precisely.

D (5 terms)

🛡️

De-identification

The process of masking personal or sensitive information in text to enable safer processing.

📚

Dense Passage Retrieval

A dual-encoder retrieval method that maps queries and passages into dense vectors with separate encoders and retrieves the passages whose vectors score highest against the query vector.

🧲

Dense Retrieval

A retrieval approach that performs semantic matching by representing queries and documents in a dense vector space.
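A minimal sketch of the matching step, assuming the query and document embeddings have already been produced by some encoder (the 3-dimensional vectors here are invented). It scans every document exhaustively; real systems typically use an approximate nearest-neighbor index instead.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_search(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query; return the top-k indices."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical embeddings; a real system would obtain these from a trained encoder.
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.3, 0.1]]
print(dense_search([1.0, 0.2, 0.0], doc_vecs))  # [0, 2]
```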

📄

Document-Level Machine Translation

An approach that improves consistency by translating sentences within broader document context rather than independently.

🏥

Domain-Specific Embeddings

Embeddings adapted to the terminology of specific domains such as law, healthcare, or finance rather than to general language.

E (7 terms)

🔬

Emotion Cause Analysis

A task that goes beyond identifying the emotion expressed in a text to locating the span (the cause) that triggered it.

🎭

Emotion Classification

A task that classifies text into finer-grained emotional categories such as joy, anger, fear, or sadness.

🔗

Entity Linking

The task of matching an entity mention in text to the correct identity or concept in a knowledge base.

🧷

Event Coreference

The task of determining whether event mentions across sentences or documents refer to the same underlying event.

📌

Event Extraction

The task of extracting events, triggers, and participating entities from text in a structured way.

📍

Extractive Question Answering

A question answering approach that selects the answer as a span from a provided passage.
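A minimal sketch of span selection, assuming a reader model (not shown) has already produced a start score and an end score for every passage token, the setup commonly used for span-extraction QA. The scores are invented, and `max_len` is a hypothetical cap on answer length.

```python
def best_span(start_scores, end_scores, max_len=15):
    """Return (start, end), inclusive, maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_len."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
start = [0.1, 0.2, 0.1, 0.0, 0.3, 2.5, 0.0]  # invented model scores
end = [0.0, 0.1, 0.4, 0.0, 0.1, 2.8, 0.2]
s, e = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # Paris
```

Constraining `s <= e` and capping the length rules out spans the model could never mean as an answer, even if their individual scores are high.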

✂️

Extractive Summarization

A content-preserving summarization approach that creates summaries by selecting important sentences from the source text.

F (3 terms)

🧭

Factual Consistency Evaluation

An evaluation dimension that measures how consistent a generated summary or answer is with the facts in the source content.

🧬

FastText Embeddings

An embedding method that represents each word as a bag of character n-grams, which makes it more robust on rare and morphologically derived forms.
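A minimal sketch of the decomposition step: the word is wrapped in `<` and `>` boundary markers and split into character n-grams, plus the whole wrapped word as its own unit. The 3 to 6 range is an assumption matching the range commonly described for fastText; the vector lookup itself is not shown.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word` wrapped in '<' and '>' boundary
    markers, followed by the full wrapped word as its own unit."""
    padded = f"<{word}>"
    grams = [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]
    return grams + [padded]


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

A word's vector is then built from the vectors of these units (summed or averaged), so an unseen form such as "wherever" still shares n-grams like "<wh" and "whe" with known words.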

🧪

Few-Shot Prompting

A prompting technique that adapts model behavior by guiding it with example input-output pairs.

G (2 terms)

💬

Generative Question Answering

A QA approach that generates answers as free text, producing more natural responses at a higher risk of content unsupported by the source (hallucination).

🌍

GloVe

An embedding method that produces dense word vectors using global word co-occurrence statistics.

H (3 terms)

⛏️

Hard Negative Mining

A training strategy that improves retrieval and matching quality by using hard negatives, which are semantically close to the query but not relevant, instead of easy negatives.

🌲

Hierarchical Text Classification

A classification problem in which labels are organized in a parent-child hierarchy rather than a flat list.

⚖️

Hybrid Retrieval

An approach that combines sparse and dense retrieval signals to provide more balanced search quality.
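A minimal sketch of one common way to combine the two signals, reciprocal rank fusion (RRF), which merges ranked lists without needing their raw scores to be on the same scale. Weighted score interpolation is another option not shown here; the document IDs are hypothetical, and k = 60 is a commonly used default rather than a value from this glossary.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: each doc scores sum(1 / (k + rank)) over lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


sparse = ["d3", "d1", "d7"]  # e.g. a BM25 ranking (hypothetical)
dense = ["d1", "d5", "d3"]   # e.g. an embedding-similarity ranking (hypothetical)
print(reciprocal_rank_fusion([sparse, dense]))  # ['d1', 'd3', 'd5', 'd7']
```

Documents ranked well by both retrievers ("d1", "d3") rise to the top, while a document found by only one retriever still gets credit.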

I (4 terms)

🧠

In-Context Learning

The ability of a model to adapt task behavior from examples in context without updating its parameters.

📋

Instruction Following

The ability of a model to reliably follow task instructions expressed in natural language.

📋

Instruction Tuning

A fine-tuning approach that adapts a language model to respond better to natural language task instructions.

🎯

Intent Classification

A task focused on predicting what purpose or action intent a user utterance represents.

K (1 term)

🔑

Keyphrase Extraction

The task of automatically identifying the key terms and phrases that best represent a text.

L (3 terms)

📚

Language Modeling

A foundational NLP problem focused on learning the probability structure of language sequences in order to predict next or missing units.
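For the left-to-right case, the probability structure this entry refers to is usually written with the chain rule, and training minimizes the negative log-likelihood of the observed tokens. Masked variants instead predict hidden positions from context on both sides.

```latex
P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1}),
\qquad
\mathcal{L} = -\sum_{t=1}^{n} \log P(w_t \mid w_1, \dots, w_{t-1})
```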

🧩

Late-Interaction Embeddings

A retrieval approach that matches queries and documents through token-level interaction instead of compressing each into a single vector.
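A minimal sketch of MaxSim-style late interaction (the scoring used by ColBERT-like models): every query token vector is matched against its best document token vector, and the maxima are summed. The 2-dimensional token vectors are hand-made; real systems typically normalize them so that the dot product equals cosine similarity.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction relevance: for every query token vector, take its best
    dot-product match among the document's token vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]  # one vector per query token (hypothetical)
doc_a = [[0.9, 0.1], [0.2, 0.8]]  # a good match for each query token
doc_b = [[0.5, 0.5]]              # a single blurry token
print(round(maxsim_score(query, doc_a), 2), round(maxsim_score(query, doc_b), 2))  # 1.7 1.0
```

Keeping one vector per token preserves fine-grained matches that a single pooled vector would average away, at the cost of storing many more vectors per document.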

📘

Lemmatization

The process of reducing a word to its dictionary base form while considering grammatical information.

M (3 terms)

🪜

Multi-Hop Question Answering

A question answering task that requires combining multiple pieces of information to arrive at an answer.

🗂️

Multi-Label Text Classification

A classification problem in which a text can belong to multiple categories at the same time.

🌐

Multilingual Sentence Embeddings

An approach that represents sentences from different languages in a shared semantic space, enabling cross-lingual matching.

N (4 terms)

🏷️

Named Entity Recognition

The task of recognizing entity spans such as people, organizations, locations, and dates within text.

🧠

Natural Language Inference

A task that determines whether one statement entails, contradicts, or is neutral with respect to another.

🪆

Nested NER

A more complex NER problem in which one entity span can contain another entity span.

🌍

Neural Machine Translation

A translation approach that uses neural networks, typically encoder-decoder models, to generate target-language sequences that preserve the source meaning and read fluently.

O (1 term)

🕸️

Open Information Extraction

An approach that extracts subject-relation-object structures from text without relying on a predefined relation schema.

P (5 terms)

🪞

Paraphrase Detection

The task of determining whether two expressions carry the same or very similar meaning despite surface differences.

⛏️

Paraphrase Mining

The process of automatically discovering sentence pairs with the same or very similar meaning within large text collections.

⚖️

Preference Optimization

An alignment approach that makes model output more useful by optimizing against human or system preference signals.

🗃️

Pretraining Corpus

The large collection of text on which a language model is pretrained to acquire general linguistic and world knowledge.

🪄

Prompt-Based Classification

An approach that solves classification problems directly through natural language instructions and label descriptions.

Q (2 terms)

🔎

Query Expansion

An approach that broadens retrieval coverage by enriching the user query with synonyms, related terms, or rewrites.

🎯

Query-Focused Summarization

A summarization approach that focuses on a specific user query or information need rather than producing a general summary.

R (4 terms)

📖

Reading Comprehension

A family of tasks that measures the ability to read a text and answer meaningful questions based on its content.

🕸️

Relation Extraction

The task of identifying meaningful relation types between entities mentioned in text.

🏁

Reranking

A second-stage method that reorders the candidates returned by first-stage retrieval using a stronger, typically more expensive model.

📚

Retrieval-Augmented Generation

An architectural approach that supports model generation with external knowledge sources to produce more current and grounded answers.

S (17 terms)

⚡

Semantic Caching

A system approach that reduces latency and cost by reusing prior answers for semantically identical or similar queries.
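A minimal sketch of a semantic cache, assuming query embeddings come from some embedding model (the vectors here are hand-made) and that a cosine-similarity threshold of 0.9 counts as "similar enough". Both are illustrative assumptions, and `SemanticCache` is a hypothetical class, not a library API.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical toy cache: reuse a stored answer when a new query's embedding
    is close enough to a cached query's embedding (linear scan over entries)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, answer)

    def get(self, query_vec):
        best = max(self.entries, key=lambda e: cosine(query_vec, e[0]), default=None)
        if best is not None and cosine(query_vec, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the expensive model call
        return None         # cache miss: caller computes and stores a fresh answer

    def put(self, query_vec, answer):
        self.entries.append((query_vec, answer))


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "Paris")
print(cache.get([0.98, 0.05]))  # Paris
print(cache.get([0.0, 1.0]))    # None
```

The threshold is the key design choice: set too low, the cache returns answers to questions that only look similar; set too high, it rarely hits.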

🤝

Semantic Textual Similarity

A task that measures how semantically close two texts are regardless of surface-level overlap.

📍

Sentence Boundary Detection

The task of identifying where sentences begin and end in text, which is harder than splitting on periods because periods also appear in abbreviations, numbers, and URLs.

🧾

Sentence Embeddings

An embedding approach focused on producing semantic representations at the sentence or short-text level.

🧱

SentencePiece

A tokenization framework that can learn subword vocabularies from raw text without relying on whitespace segmentation.

😊

Sentiment Analysis

An NLP task focused on determining the positive, negative, or neutral emotional orientation of a text.

🧾

Slot Filling

An information extraction approach focused on automatically filling predefined information fields from text.

🗂️

Sparse Neural Embeddings

A representation approach that uses neural models to produce semantic signals while preserving sparse-retrieval-style interpretability.

🗂️

Sparse Retrieval

A classical yet still powerful retrieval approach based on exact term matching, typically scored with weighting schemes such as TF-IDF or BM25.
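A minimal sketch of Okapi BM25, one widely used sparse scoring function, assuming queries and documents are already tokenized. The parameters k1 = 1.2 and b = 0.75 are common defaults rather than values from this glossary, and the IDF shown is a non-negative variant that several search libraries use.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))  # docs containing each term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # exact matching: absent terms contribute nothing
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5))
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [["cheap", "flights", "to", "paris"],
        ["paris", "hotel", "deals"],
        ["learn", "python", "fast"]]
print(bm25_scores(["paris", "flights"], docs))
# doc 0 scores highest; doc 2 shares no terms and scores 0.0
```

The rarer term "flights" earns a larger IDF than "paris", which appears in two of the three documents.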

✍️

Spelling Correction

A preprocessing technique that corrects misspelled words to their intended forms to improve downstream NLP quality.

🧭

Stance Detection

A task focused on identifying a text’s stance toward a given claim, topic, or target.

🌱

Stemming

An approach that reduces a word to a shorter root-like form by crudely stripping suffixes.

🚫

Stopword Filtering

A classical preprocessing technique based on removing frequent words that are assumed to have low semantic contribution.

🧾

Structured Output Prompting

A technique that asks the model to produce schema-aligned outputs such as JSON or tables instead of free text.

🧩

Subword Tokenization

An approach that splits rare words into smaller meaningful pieces to balance vocabulary size and coverage.

🧭

Summary Faithfulness

A quality dimension describing how faithfully a generated summary remains grounded in the source text.

🎯

Supervised Fine-Tuning

The process of steering a pretrained model toward more specific behavior using labeled task data.

T (8 terms)

📄

Template-Based Extraction

A controlled extraction approach focused on obtaining structured information from predefined document or expression patterns.

📚

Terminology-Constrained Translation

A controlled machine translation approach that preserves required translation equivalents for specific terms.

🏷️

Text Classification

The task of assigning a text to one or more predefined categories, intents, or labels.

🧽

Text Deduplication

A process that removes identical or near-duplicate text samples from a dataset to improve training and evaluation quality.
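A minimal sketch combining both cases: exact duplicates are caught by hashing whitespace- and case-normalized text, and near-duplicates by Jaccard similarity over word 3-gram "shingles". The shingle size and the 0.8 threshold are illustrative assumptions. The pairwise comparison is quadratic; large corpora usually approximate Jaccard with MinHash and locality-sensitive hashing instead.

```python
import hashlib


def shingles(text, n=3):
    """Set of word n-grams ("shingles") from lowercased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if a or b else 1.0


def deduplicate(texts, threshold=0.8):
    """Keep a text only if it is neither an exact duplicate (after normalization)
    nor a near-duplicate (Jaccard >= threshold) of an already-kept text."""
    kept, kept_shingles, seen_hashes = [], [], set()
    for t in texts:
        h = hashlib.sha1(" ".join(t.lower().split()).encode("utf-8")).hexdigest()
        if h in seen_hashes:
            continue  # exact duplicate
        s = shingles(t)
        if any(jaccard(s, other) >= threshold for other in kept_shingles):
            continue  # near-duplicate
        seen_hashes.add(h)
        kept.append(t)
        kept_shingles.append(s)
    return kept


texts = [
    "The quick brown fox jumps over the lazy dog",
    "the quick  brown fox jumps over the lazy dog",        # same text after normalization
    "The quick brown fox jumps over the lazy dog today",  # 7 of 8 shingles shared
    "An entirely different sentence about NLP",
]
print(deduplicate(texts))
# ['The quick brown fox jumps over the lazy dog', 'An entirely different sentence about NLP']
```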

🧹

Text Normalization

The process of standardizing raw text at the spelling, formatting, and character levels to make it more consistent and processable.

🔗

Token Alignment

The problem of preserving the mapping between subword tokens and original word or span structures.

✂️

Tokenization

The core language processing step that splits text into units that a model can process.

🚨

Toxicity Detection

A safety-focused NLP task aimed at identifying insults, aggression, hate speech, or other harmful language use.

U (2 terms)

🔤

Unicode Normalization

The process of converting equivalent character sequences that are encoded differently (for example, a precomposed "é" versus "e" plus a combining accent) into a single standard form.
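A short example using Python's standard `unicodedata` module. NFC (composed) is a common default for NLP pipelines; the compatibility forms NFKC/NFKD also fold look-alike variants such as ligatures, but they are more aggressive (they map a superscript "²" to "2", for instance), so whether to use them is a design choice.

```python
import unicodedata


def normalize(text, form="NFC"):
    """Return `text` in the given Unicode normalization form (NFC, NFD, NFKC, or NFKD)."""
    return unicodedata.normalize(form, text)


composed = "\u00e9"     # 'é' as one precomposed code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
print(composed == decomposed)             # False: different code points
print(normalize(decomposed) == composed)  # True after NFC
print(normalize("\ufb01le", "NFKC"))      # file (NFKC unfolds the 'ﬁ' ligature)
```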

🎲

Unigram Language Model Tokenization

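A subword tokenization method that assigns a probability to each vocabulary piece, prunes a large candidate vocabulary down to the pieces that best explain the training data, and segments text into its most probable sequence of pieces.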
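A minimal sketch of the segmentation step only: given per-piece probabilities, a Viterbi search finds the split with the highest total log-probability. The vocabulary-learning stage (EM re-estimation and pruning) is omitted, and the piece probabilities below are invented for illustration.

```python
import math


def viterbi_segment(text, piece_logprobs):
    """Split text into vocabulary pieces with maximum total log-probability
    under a unigram model (each piece is scored independently)."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in piece_logprobs and best[start] + piece_logprobs[piece] > best[end]:
                best[end] = best[start] + piece_logprobs[piece]
                back[end] = start
    if best[n] == -math.inf:
        return None  # the text cannot be segmented with this vocabulary
    pieces, end = [], n
    while end > 0:  # follow back-pointers from the end of the text
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]


# Invented piece probabilities for illustration.
vocab = {"un": math.log(0.1), "related": math.log(0.05), "rel": math.log(0.02),
         "ated": math.log(0.02), "u": math.log(0.01), "n": math.log(0.01)}
print(viterbi_segment("unrelated", vocab))  # ['un', 'related']
```

Here "un" + "related" (probability 0.005) beats "un" + "rel" + "ated" (0.00004), so the model prefers fewer, more probable pieces.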

W (2 terms)

📐

Word2Vec

An influential early neural embedding method that learns dense word vectors by predicting a word from its surrounding context (CBOW) or the surrounding context from a word (skip-gram).

🧠

WordPiece

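A widely used subword tokenization method (for example, in BERT) whose vocabulary is built by choosing merges that most increase the likelihood of the training data; at inference, words are split greedily into the longest matching vocabulary pieces.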
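A minimal sketch of inference-time WordPiece splitting in the greedy longest-match-first style used by BERT's tokenizer, where non-initial pieces carry a "##" prefix. Vocabulary training is not shown, and the tiny vocabulary is a toy assumption.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word; non-initial pieces carry '##'."""
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:  # try the longest remaining substring first
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation-piece marker
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        tokens.append(match)
        start = end
    return tokens


vocab = {"un", "##aff", "##able", "play", "##ing"}  # toy vocabulary
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", vocab))    # ['play', '##ing']
```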

Z (1 term)

🪄

Zero-Shot Text Classification

An approach in which a model classifies texts for new labels without additional training by using natural language label descriptions.
