Corporate Program
🧪

AI Evaluation & Quality Engineer

AI QA / Eval Specialist

Be the QA discipline that measures AI hallucination and catches regression.

AI evaluation is a new profession, and dedicated AI testing QA roles are nearly unknown in Türkiye. Many companies ship to production and hope for the best. This program teaches a comprehensive QA practice: golden dataset design, LLM-as-judge methodology, the RAGAS, DeepEval, Promptfoo and Phoenix tools, regression testing for prompts, and continuous evaluation pipelines.

Quick Facts

Duration
6 weeks
Level
Intermediate
Micro-Trainings
12
Total Hours
48

Why This Program for Your Company

Talent Development

Grow your in-house teams; reduce vendor and outsourcing dependency

Fast Time-to-Value

Built for a 90-day pilot-to-production trajectory

Measurable ROI

Before/after capability report + KPI dashboard with tangible outcomes

AI Culture

AI adoption across all levels — from executive to engineer

Delivery Models

Choose the delivery format that fits your team

On-site

At your company location, closed group

Hybrid

Online + periodic in-person intensives

Fully Remote

Live remote + recordings + lab notebooks

Train-the-Trainer

Build in-house trainers — long-term scaling

Tailored to Your Company

Content is customized to your industry, regulatory framework, existing tech stack and target use cases. Labs run on your existing systems or sample datasets.

Lab Environment

Hands-on labs run on your company data (under NDA), in an isolated sandbox, or on a sample dataset

Post-Training Support

30 days of async support (Slack/Teams/Discord) + optional monthly follow-up sessions + code review support

Why Now? — Türkiye's Empty Market

AI-specific QA discipline is nearly unknown in Türkiye. As SaaS AI products grow, this role becomes critical.

About the Program

Target Teams

  • QA engineers transitioning to AI
  • AI engineers needing eval practice
  • Test automation leads
  • Data scientists focused on evaluation

Your Team's Outcomes

  • Design and maintain golden datasets
  • Set up pairwise and rubric eval with LLM-as-judge
  • End-to-end eval pipeline with RAGAS, DeepEval, Phoenix, Promptfoo
  • Conduct bias and fairness testing
  • Integrate continuous evaluation into CI/CD
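
To make the pairwise LLM-as-judge outcome concrete, here is a minimal sketch, not the program's own lab code. It assumes a hypothetical `judge` callable that returns "first" or "second"; in practice that callable would wrap an LLM API call. The stand-in judges below are purely illustrative. The sketch asks the judge in both orders and treats a flipped verdict as a tie, which is one common way to reduce position bias.

```python
from typing import Callable

# Hypothetical judge signature: (question, first_answer, second_answer) -> "first" | "second".
# A real judge would call an LLM; these stand-ins only demonstrate the control flow.
Judge = Callable[[str, str, str], str]


def pairwise_verdict(judge: Judge, question: str, answer_a: str, answer_b: str) -> str:
    """Return "A", "B", or "tie" after asking the judge in both presentation orders."""
    forward = judge(question, answer_a, answer_b)   # A is shown first
    backward = judge(question, answer_b, answer_a)  # B is shown first
    a_wins_forward = forward == "first"
    a_wins_backward = backward == "second"
    if a_wins_forward and a_wins_backward:
        return "A"
    if not a_wins_forward and not a_wins_backward:
        return "B"
    return "tie"  # the verdict flipped with the order, so it is not trusted


def longer_answer_judge(question: str, first: str, second: str) -> str:
    """Illustrative stand-in: prefers the longer answer regardless of position."""
    return "first" if len(first) >= len(second) else "second"


def first_position_judge(question: str, first: str, second: str) -> str:
    """Illustrative stand-in with pure position bias: always picks the first answer."""
    return "first"


if __name__ == "__main__":
    print(pairwise_verdict(longer_answer_judge, "q", "a detailed answer", "short"))
    print(pairwise_verdict(first_position_judge, "q", "a detailed answer", "short"))
```

A judge that always favors whichever answer comes first ends up with a tie here instead of a spurious win, which is the point of the order swap.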

Prerequisites

  • Intermediate Python
  • Basic LLM API experience
  • QA fundamentals (advantage)

Trainings in this Program

12 modules / micro-trainings

  1. AI Evaluation Paradigm
  2. Golden Dataset Design
  3. LLM-as-Judge Methodology
  4. Pairwise & Rubric-Based Evaluation
  5. RAGAS, DeepEval, Phoenix, Promptfoo Tools
  6. Regression Testing for Prompts
  7. Human Evaluation Operations
  8. Bias & Fairness Testing
  9. Online Evaluation (A/B, Shadow)
  10. AI Product Test Pyramid
  11. Continuous Evaluation Pipeline
  12. Capstone: Build Eval Suite for an Existing AI Product

Capstone Project

Set up a golden dataset, LLM-as-judge eval, regression testing, bias testing and a CI/CD-integrated continuous eval pipeline for a real AI product.
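
As a rough illustration of how a golden dataset and a regression check can fit together in CI, here is a minimal sketch under stated assumptions. The `GoldenCase` structure, the substring-based pass criterion, the `tolerance` value and the two stand-in model functions are all hypothetical; a real suite would typically score outputs with tools such as RAGAS, DeepEval or Promptfoo instead of a substring match.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    """One golden example: a prompt and a string the answer is expected to contain."""
    prompt: str
    must_contain: str


def pass_rate(generate: Callable[[str], str], cases: Sequence[GoldenCase]) -> float:
    """Fraction of golden cases whose output contains the expected string (case-insensitive)."""
    passed = sum(
        1 for case in cases
        if case.must_contain.lower() in generate(case.prompt).lower()
    )
    return passed / len(cases)


def regression_gate(baseline_rate: float, candidate_rate: float, tolerance: float = 0.02) -> bool:
    """Pass only if the candidate is no worse than the baseline minus the tolerance."""
    return candidate_rate >= baseline_rate - tolerance


# Hypothetical golden set and stand-in "models" (no real LLM calls are made).
GOLDEN = [
    GoldenCase("Capital of France?", "Paris"),
    GoldenCase("2 + 2?", "4"),
]


def baseline_model(prompt: str) -> str:
    return {"Capital of France?": "Paris is the capital.", "2 + 2?": "The answer is 4."}[prompt]


def candidate_model(prompt: str) -> str:
    return {"Capital of France?": "It is Paris.", "2 + 2?": "five"}[prompt]


if __name__ == "__main__":
    base = pass_rate(baseline_model, GOLDEN)
    cand = pass_rate(candidate_model, GOLDEN)
    # A CI job could fail the build when the gate returns False.
    print(base, cand, regression_gate(base, cand))
```

In a pipeline, the gate result would typically decide the exit code of the CI step, so a prompt change that lowers the golden-set pass rate blocks the merge.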

How We Work

From discovery to delivery and post-training follow-up

  1. Discovery

    Free 30-minute call: team capability map, use case discovery, goal setting

  2. Design

    Custom curriculum, lab scenarios and delivery timeline for your use cases

  3. Delivery

    Live training + hands-on labs + capstone project + completion certificate

  4. Follow-up

    Capability report + 30-day support + optional monthly check-in sessions

Career Path

Positions you can target after this program

  • AI QA / Eval Specialist
  • QA engineers transitioning to AI
  • AI engineers needing eval practice
  • Test automation leads

Tech Stack & Topics

evaluation • qa • testing • ragas • deepeval • promptfoo

Frequently Asked Questions

How do enrollment and participant selection work?

In the discovery call we map your team capability and define the right participant profile (role, level, prior knowledge). Standard packages serve 5-15 participants, corporate packages 15-40; larger groups run as multi-cohort schedules.

How is pricing structured?

Pricing depends on participant count, duration, customization depth, delivery model (on-site / hybrid / remote) and post-training support scope. A custom quote is provided after discovery. Multi-year partnership discounts are available.

Can the curriculum be customized for our use cases?

Yes. After discovery every program is tailored to your industry, regulatory framework (KVKK, BDDK, EU AI Act etc.), data structure, tech stack and target use cases. Labs can run on your existing systems or company data under NDA.

On-site or remote?

Both. Choose in-person (at your location — Istanbul, Ankara, Izmir, Bursa, Antalya and other cities), fully online, or hybrid (online + condensed in-person).

Is post-training support included?

The standard package includes 30 days of async support (Slack/Teams/Discord channel). Extended options: monthly follow-up sessions, code review support, a mentorship package and a quarterly business review.

Are certificates provided?

Yes — each participant receives a verifiable URL certificate, and the company gets a before/after capability report and training ROI dossier.

Who is this program for?

QA engineers transitioning to AI • AI engineers needing eval practice • Test automation leads • Data scientists focused on evaluation

What will I learn?

Design and maintain golden datasets • Set up pairwise and rubric eval with LLM-as-judge • End-to-end eval pipeline with RAGAS, DeepEval, Phoenix, Promptfoo • Conduct bias and fairness testing • Integrate continuous evaluation into CI/CD

What is the duration and format?

6 weeks · 48 hours · Self-paced + cohort

What are the prerequisites?

Intermediate Python • Basic LLM API experience • QA fundamentals (advantage)

Which positions does this program prepare me for?

AI QA / Eval Specialist — Design and maintain golden datasets • Set up pairwise and rubric eval with LLM-as-judge • End-to-end eval pipeline with RAGAS, DeepEval, Phoenix, Promptfoo

Why is this program needed in Türkiye?

AI-specific QA discipline is nearly unknown in Türkiye. As SaaS AI products grow, this role becomes critical.

Bring This Program to Your Team

In a free 30-minute discovery call we map your team's capability, explore your target use cases and prepare a custom quote for your company. No commitment.