
Advanced Level · 3 Days

RAG Training with LlamaIndex and Vector DBs (Pinecone, Chroma, Weaviate, Qdrant)

A 3-day advanced program for AI engineers who want to build enterprise knowledge bases and production-grade RAG systems. Comparatively addresses LlamaIndex's data-first paradigm and 5 main vector DBs, extending from Knowledge Graph + Property Graph indices to multi-modal RAG. Includes LlamaParse, advanced retrieval, and RAGAS/TruLens eval.

About This Course

This training is designed for AI engineers, data engineers, ML engineers, knowledge management architects, and platform developers who want to build production-grade enterprise knowledge bases and RAG systems using LlamaIndex with its data-first paradigm. At the heart of the program is the following approach: learning LlamaIndex is not simply 'putting a PDF into a vector DB and running top-k retrieval.' Real engineering value comes from production-grade document parsing with LlamaParse, the right chunk size and metadata enrichment, sharding/replication/scaling decisions in vector DB selection, precision/recall optimization with advanced retrieval patterns (hybrid, recursive, auto-retrieval, reranking), structured retrieval with Knowledge Graph + Property Graph Index, event-driven Workflows architecture for agentic RAG, image-text alignment for multi-modal RAG, and KVKK-compliant multi-tenant production deployment.



The LlamaIndex ecosystem has matured rapidly over the past three years and as of 2026 is structured around six main products: LlamaIndex Core (index taxonomy, retriever, query engine, response synthesizer), LlamaCloud (managed parsing + indexing), LlamaParse (production-grade document parsing), LlamaHub (100+ connectors), LlamaIndex Workflows (event-driven multi-step pipelines), and LlamaIndex Agents (FunctionAgent, ReActAgent, AgentWorkflow). This training addresses these products not as separate silos but as an integrated data engineering + RAG framework. Comprehensive LlamaIndex-specific training in Turkey is virtually nonexistent; LangChain courses exist, but distinguishing features like LlamaIndex's data-first paradigm, LlamaParse production parsing, Knowledge Graph Index, and LlamaIndex Workflows are generally not covered. This program is designed to fill that gap as Turkey's most comprehensive LlamaIndex + Vector DB reference training.



A strategic dimension of the program is positioning LlamaIndex within the agentic AI ecosystem by comparing it with other frameworks and approaches. The comparison with LangChain is particularly important: while LangChain is a general-purpose, application-first framework, LlamaIndex is a data-first framework optimized for very large corpora (1M+ documents). The index taxonomy (VectorStoreIndex, SummaryIndex, TreeIndex, KnowledgeGraphIndex, DocumentSummaryIndex, CompositeIndex) does not exist in LangChain; this is LlamaIndex's unique architectural differentiator. Comparisons with Haystack and raw LLM SDKs are also made; the strengths and weaknesses of each approach, and which project type calls for which, are analyzed in detail.



The backbone of the program is the vector DB comparison module. The 5 leading vector DBs of 2026 — Pinecone (serverless managed), Chroma (embedded + client-server), Weaviate (open-source + hybrid + native multi-modal), Qdrant (open-source + performance focused), pgvector (Postgres native) — are compared head-to-head. Indexing algorithms like HNSW, IVF, and ScaNN; sharding, replication, and scaling characteristics; managed vs self-hosted cost trade-offs; KVKK-compliant self-hosted vs cloud decisions for Turkey; namespace and metadata-filtering structures are addressed in detail. Alternatives like Milvus, LanceDB, and MongoDB Atlas Vector Search are also included in the decision matrix. This is the only training in Turkey that performs vector DB comparison at this depth.
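As a point of reference for the comparison above, the sketch below shows exact (brute-force) cosine search with a metadata filter. This is the baseline that approximate indexes such as HNSW and IVF speed up at some cost in recall. It is a minimal illustration in plain Python and not any vendor's API; `Record`, `search`, and the `tenant` metadata key are hypothetical names chosen for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored vector with its metadata (hypothetical structure)."""
    doc_id: str
    vector: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(records: list[Record], query: list[float], top_k: int,
           filters: dict[str, str]) -> list[tuple[str, float]]:
    """Exact top-k search over records whose metadata matches every filter."""
    candidates = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in filters.items())
    ]
    scored = [(r.doc_id, cosine(r.vector, query)) for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


records = [
    Record("a1", [1.0, 0.0], {"tenant": "acme"}),
    Record("a2", [0.6, 0.8], {"tenant": "acme"}),
    Record("b1", [1.0, 0.0], {"tenant": "globex"}),
]
hits = search(records, [1.0, 0.0], top_k=2, filters={"tenant": "acme"})
```

The filter is applied before scoring, so `b1` can never appear in results for the `acme` tenant. Real vector DBs differ in whether they filter before, during, or after the approximate search, which is one of the trade-offs worth comparing.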



The LlamaParse module is one of the strongest differentiators of the LlamaIndex ecosystem. It is compared head-to-head with alternatives like PyPDF, Unstructured.io, and AWS Textract; LlamaParse's superiority in complex table extraction, image OCR, formula parsing, and markdown output is shown through hands-on exercises. In production scenarios — insurance policies, financial reports, academic papers, engineering documents — the right parameter selection and cost-effective usage of LlamaParse are addressed in detail. The nuances of Turkish document parsing, morphology-aware chunking strategies, and token counting are also part of this module.
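To make the chunking discussion concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes whitespace-separated words as a rough stand-in for tokens, which is a crude approximation (especially for an agglutinative language like Turkish, where a real tokenizer would produce different counts). The function name `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of `chunk_size` words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    stride = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary is still partly visible from both sides.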



The index taxonomy module reveals LlamaIndex's unique architectural distinction. VectorStoreIndex (classic semantic search), SummaryIndex (full-document iteration), TreeIndex (hierarchical recursive summarization), KeywordTableIndex (keyword-based filtering), DocumentSummaryIndex (hybrid retrieval), and CompositeIndex (multi-index orchestration) are addressed in detail. Dynamic index selection via Router Query Engine, the multi-document Doc Agent pattern, the Storage Context (DocStore, IndexStore, VectorStore) layers, and persistent-index + incremental-update strategies are covered comprehensively. The discipline of orchestrating multiple indices provides architectural maturity beyond the classic single-vector-DB approach.



In the retriever and query engine module, top-k retrieval, hybrid (sparse + dense), reciprocal rank fusion with QueryFusionRetriever, Auto Merging Retriever, query decomposition with SubQuestion QueryEngine, and Router Query Engine are addressed. The Response Synthesizer's refine, compact, tree_summarize, and accumulate modes; streaming response; and production topics like token-by-token output are shown hands-on.



In the following advanced retrieval module, cross-encoder rerankers (bge-reranker-v2, Cohere Rerank, Voyage Rerank), LLM-as-reranker, recursive retrieval (chunk → parent document), auto-retrieval (LLM-generated metadata filters), and sentence-window and parent-document patterns are addressed. This discipline improves the retrieval quality of basic RAG by 30–50%.
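The reciprocal rank fusion mentioned above can be sketched in a few lines. This is the generic textbook formula (each list contributes 1 / (k + rank) per document, with the commonly used constant k = 60) and not necessarily how QueryFusionRetriever implements it; the document IDs are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_ranking = ["d1", "d2", "d3"]   # e.g. from an embedding retriever
sparse_ranking = ["d3", "d1", "d4"]  # e.g. from a keyword retriever
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Because only ranks are used, the dense and sparse scores never need to be put on a common scale, which is the main reason this fusion is popular for hybrid retrieval.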



Perhaps the most distinguishing module of the program is dedicated to Knowledge Graph + Property Graph Index. LlamaIndex is one of the few mainstream RAG frameworks with built-in automatic entity-relation extraction from documents and schema-aware knowledge-base construction. This training covers in detail: automatic entity and relation extraction; schema-free vs schema-aware KG setup; hybrid KG + vector retrieval with KGTableRetriever; rich schema-aware knowledge bases with the Property Graph Index; the GraphRAG approach and comparison with Microsoft GraphRAG; and graph backend integrations like Neo4j / FalkorDB / Nebula. This discipline dramatically improves RAG quality, especially in structured-data-rich sectors like finance, healthcare, and law.
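The idea behind graph-based retrieval can be illustrated with a toy triple store: entities found in the query are expanded to their neighboring (subject, relation, object) triples, which are then passed to the LLM as context. The sketch below is plain Python under simplifying assumptions (entity matching by substring, outgoing edges only); `TripleStore`, `entities_in_query`, and the sample data are hypothetical and do not reflect LlamaIndex's actual classes.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, relation, object)


class TripleStore:
    """Minimal in-memory graph keyed by subject."""

    def __init__(self) -> None:
        self._by_subject: dict[str, list[Triple]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self._by_subject[subject].append((subject, relation, obj))

    def neighbors(self, entity: str, depth: int = 1) -> list[Triple]:
        """Collect triples reachable from `entity` within `depth` hops."""
        seen: set[str] = set()
        frontier = [entity]
        found: list[Triple] = []
        for _ in range(depth):
            next_frontier: list[str] = []
            for node in frontier:
                if node in seen:
                    continue  # do not expand the same node twice
                seen.add(node)
                for triple in self._by_subject.get(node, []):
                    found.append(triple)
                    next_frontier.append(triple[2])
            frontier = next_frontier
        return found


def entities_in_query(query: str, known: list[str]) -> list[str]:
    """Naive entity linking: case-insensitive substring match."""
    lowered = query.lower()
    return [entity for entity in known if entity.lower() in lowered]


store = TripleStore()
store.add("Acme Bank", "ISSUED", "Policy 42")
store.add("Policy 42", "COVERS", "Flood damage")

matched = entities_in_query("What does Acme Bank cover?", ["Acme Bank", "Policy 42"])
context = store.neighbors(matched[0], depth=2)
```

A two-hop expansion surfaces the coverage fact even though the query never mentions the policy, which is the kind of multi-hop link that pure vector similarity tends to miss.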



The LlamaIndex Workflows module teaches the framework's agentic RAG paradigm. Event-driven pipeline design with the @step decorator, context / branch / join / loop patterns, streaming workflows, and mid-flight observability are addressed. In agent types, FunctionAgent (structured tool use), ReActAgent (thought-action-observation loop), and AgentWorkflow (multi-agent orchestration) are addressed in detail. As agentic RAG patterns, query router agent, dynamic index selection, self-correcting RAG, and replan-on-failure mechanics are shown.
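The event-driven style described above can be approximated in plain Python: each step consumes one event type and returns the next event, and a small loop dispatches events until a stop event appears. The sketch below only mimics the idea. The `step` decorator, event classes, and `run` function here are hypothetical stand-ins written for this example, and the real LlamaIndex Workflows API differs in its details.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StartEvent:
    query: str


@dataclass
class RetrievedEvent:
    query: str
    passages: list[str]


@dataclass
class StopEvent:
    answer: str


# Registry mapping an event type to the step that handles it.
_HANDLERS: dict[type, Callable] = {}


def step(event_type: type) -> Callable:
    """Hypothetical decorator: register a function as the handler for an event."""
    def register(fn: Callable) -> Callable:
        _HANDLERS[event_type] = fn
        return fn
    return register


CORPUS = [
    "Refund requests are accepted within 14 days.",
    "Shipping takes 3 business days.",
]


@step(StartEvent)
def retrieve(event: StartEvent) -> RetrievedEvent:
    """Toy keyword retrieval over a tiny in-memory corpus."""
    words = event.query.lower().split()
    passages = [p for p in CORPUS if any(w in p.lower() for w in words)]
    return RetrievedEvent(query=event.query, passages=passages)


@step(RetrievedEvent)
def synthesize(event: RetrievedEvent) -> StopEvent:
    """Branch: answer from the first passage, or report that nothing matched."""
    if not event.passages:
        return StopEvent(answer="No supporting passage found.")
    count = len(event.passages)
    return StopEvent(answer=f"Based on {count} passage(s): {event.passages[0]}")


def run(event: object, max_steps: int = 10) -> str:
    """Dispatch events to their handlers until a StopEvent is produced."""
    for _ in range(max_steps):
        if isinstance(event, StopEvent):
            return event.answer
        event = _HANDLERS[type(event)](event)
    raise RuntimeError("workflow did not finish within max_steps")


answer = run(StartEvent(query="refund policy"))
```

Because steps are connected only through event types, adding a reranking or self-correction step means introducing a new event and handler rather than rewriting the existing ones.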



The multi-modal RAG module addresses LlamaIndex's mature multi-modal capabilities as of 2026. MultiModalVectorStoreIndex and image-text alignment, CLIP / ImageBind / Voyage multi-modal embedding models, the comparison of GPT-5 Vision / Claude Opus 4.7 Vision / Gemini 2.5 Pro Vision, recursive node parsing with LlamaParse table extraction, nested tables, the table-as-image fallback approach, Whisper transcript + LlamaIndex pipeline, scene detection, and timestamp-aware retrieval topics are addressed hands-on. Multi-modal RAG is directly applicable to real production scenarios like insurance claims, e-commerce product catalogs, engineering drawings, and medical imaging.



The evaluation module represents the production-discipline dimension of the training. RAGAS framework metrics (faithfulness, answer relevancy, context recall, context precision); synthetic test sets with RAGAS testset.synthesize; TruLens feedback functions and trace anatomy; the TruLens dashboard and A/B comparison; LlamaIndex's native CorrectnessEvaluator, FaithfulnessEvaluator, RelevancyEvaluator; and regression-test pipelines with BatchEvalRunner are addressed in detail.



In the production deployment module, FastAPI + LlamaIndex endpoints, LlamaCloud managed deployment, comparison of Vercel / AWS / GCP / Kubernetes, tenant-aware indexing with namespace isolation, tenant access control via vector DB metadata filtering, KVKK-compliant multi-tenant strategy, embedding caching, query cache, semantic cache, and model routing (GPT-5 + Sonnet + Haiku + Local hybrid) are covered end to end.
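As a simplified illustration of retrieval evaluation, the sketch below computes ID-based precision and recall for a retrieved list against a hand-labeled set of relevant documents. These are deliberately simpler than the RAGAS metrics named above, which rely on LLM judgments and are defined differently; the function names, document IDs, and the 0.4 threshold are assumptions made for the example.

```python
def precision_at_k(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved documents that are labeled relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for doc_id in retrieved if doc_id in relevant) / len(retrieved)


def recall_at_k(retrieved: list[str], relevant: set[str]) -> float:
    """Share of relevant documents that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def passes_regression(retrieved: list[str], relevant: set[str],
                      min_precision: float, min_recall: float) -> bool:
    """Gate for a regression test: both metrics must meet their thresholds."""
    return (precision_at_k(retrieved, relevant) >= min_precision
            and recall_at_k(retrieved, relevant) >= min_recall)


retrieved_ids = ["d1", "d7", "d3", "d9"]
relevant_ids = {"d1", "d3", "d5"}

precision = precision_at_k(retrieved_ids, relevant_ids)  # 2 of 4 retrieved are relevant
recall = recall_at_k(retrieved_ids, relevant_ids)        # 2 of 3 relevant were retrieved
ok = passes_regression(retrieved_ids, relevant_ids, min_precision=0.4, min_recall=0.4)
```

Running a check like `passes_regression` over a fixed test set on every pipeline change is one simple way to catch retrieval regressions before they reach production.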



In the capstone project, each participant designs an end-to-end production-grade enterprise knowledge base and RAG system for their own company: a LlamaParse → chunking → embedding → vector DB → retrieval → eval → deployment pipeline; vector DB selection and multi-tenant deployment topology; a Knowledge Graph + vector hybrid architecture; an eval report and cost projection. By the end of the training, participants reach a level of technical and architectural competence to manage the LlamaIndex ecosystem in an integrated way within the data-first paradigm, make architectural decisions among the 5 main vector DBs, perform production-grade parsing with LlamaParse, optimize precision/recall with advanced retrieval patterns, build structured RAG with Knowledge Graph + Property Graph Index, design agentic RAG architecture with LlamaIndex Workflows, perform multi-modal RAG, set up production evaluation with RAGAS/TruLens, and perform KVKK-compliant multi-tenant deployment. The training consists of 3 days, 12 modules, and over 80 hands-on lessons.

Training Methodology

Turkey's only comprehensive LlamaIndex training that addresses the LlamaIndex Core, LlamaCloud, LlamaParse, LlamaHub, and LlamaIndex Workflows ecosystem in an integrated way

Comprehensive head-to-head comparison of the 5 main vector DBs (Pinecone, Chroma, Weaviate, Qdrant, pgvector) covering HNSW/IVF/ScaNN indexing, sharding, and cost trade-offs

A module that demonstrates LlamaParse's clear superiority over PyPDF / Unstructured / AWS Textract in production-grade table extraction, OCR, and formula parsing

A unique focus on structured RAG with Knowledge Graph + Property Graph Index, including Neo4j / FalkorDB / Nebula backend integrations

Techniques to improve basic RAG by 30–50% via advanced retrieval patterns (reranking, recursive, auto-retrieval, sentence-window, parent-document)

Production discipline with event-driven agentic RAG via LlamaIndex Workflows, multi-modal RAG, and the RAGAS/TruLens evaluation framework

Who Is This For?

AI engineers and data engineers who want to build production-grade enterprise knowledge bases and RAG systems
Developers who know LangChain basics but want to deepen their expertise in LlamaIndex's data-first paradigm
Platform engineering and ML platform teams who need to make architectural decisions among Pinecone, Chroma, Weaviate, Qdrant, and pgvector
Technical teams in structured-data-rich sectors like finance, healthcare, and law who want to build a Knowledge Graph + RAG hybrid architecture
Teams who want to query insurance, e-commerce, and engineering documents with multi-modal RAG (image, table, video)
Startup CTOs and technical founders looking to build a KVKK-compliant multi-tenant enterprise RAG SaaS

Why This Course?

1. Positioned as the sole reference program with its data-first paradigm and enterprise knowledge-base focus, while comprehensive LlamaIndex-specific training is virtually nonexistent in Turkey.
2. Provides architectural-decision maturity by comparing the 5 main vector DBs (Pinecone, Chroma, Weaviate, Qdrant, pgvector) head-to-head across HNSW/IVF, sharding, cost, and KVKK dimensions.
3. Positions production-grade parsing with LlamaParse as a clear advantage over PyPDF / Unstructured / AWS Textract and demonstrates it hands-on.
4. Structured RAG with Knowledge Graph + Property Graph Index, LlamaIndex's unique differentiator, is addressed end to end on a topic that has almost no Turkish-language resources.
5. Imparts techniques that dramatically improve basic RAG through advanced retrieval patterns (reranking, recursive, auto-retrieval, sentence-window).
6. Establishes a production-discipline framework comprehensively with LlamaIndex Workflows, multi-modal RAG, and the RAGAS/TruLens evaluation framework.

Learning Outcomes

Manage the LlamaIndex ecosystem (Core, Cloud, Parse, Hub, Workflows) in an integrated way.
Make architectural decisions among Pinecone, Chroma, Weaviate, Qdrant, pgvector.
Perform production-grade complex PDF, DOCX, and XLSX parsing with LlamaParse.
Use index types like VectorStoreIndex, SummaryIndex, TreeIndex, KG Index correctly per scenario.
Optimize precision/recall with hybrid retrieval, reranking, recursive, and auto-retrieval.
Build structured RAG systems with Knowledge Graph + Property Graph Index.
Design event-driven agentic RAG architecture with LlamaIndex Workflows.
Query images, tables, videos, and PDFs with multi-modal RAG.
Set up production-grade RAG evaluation with RAGAS, TruLens, and native evaluators.

Requirements

Active Python experience (intermediate to advanced), use of async/await and type hints
Basic experience with a vector DB or search engine (recommended)
Basic knowledge of REST APIs and JSON Schema
Git, terminal, and modern IDE experience
Access to OpenAI, Anthropic, or a self-hosted model before the training
Basic knowledge of cloud, container, or FastAPI deployment (recommended)

Course Curriculum

12 Modules, 94 Lessons

Module 1: Strategic Introduction to the LlamaIndex Ecosystem and the Data-First Paradigm (8 Lessons)
Module 2: Document Ingestion and Production Parsing with LlamaParse (9 Lessons)
Module 3: Vector DB Architecture and Comparison: Pinecone, Chroma, Weaviate, Qdrant, pgvector (9 Lessons)
Module 4: Index Types and Composability: VectorStoreIndex, SummaryIndex, TreeIndex (9 Lessons)
Module 5: Retriever and Query Engine Strategies (8 Lessons)
Module 6: Advanced Retrieval Patterns: Reranking, Recursive, Auto-Retrieval (8 Lessons)
Module 7: Structured Retrieval with Knowledge Graph + Property Graph Index (8 Lessons)
Module 8: LlamaIndex Workflows and Agentic RAG (8 Lessons)
Module 9: Multi-Modal RAG: Image, Table, Video, and PDF (8 Lessons)
Module 10: Evaluation: RAGAS, TruLens, and LlamaIndex Eval (6 Lessons)
Module 11: Production Deployment, Multi-Tenant Architecture, and Cost Optimization (9 Lessons)
Module 12: Capstone: Enterprise Knowledge Base + RAG System (4 Lessons)

Instructor

Şükrü Yusuf KAYA

AI Architect | Enterprise AI & LLM Training | Stanford University | Software & Technology Consultant

Şükrü Yusuf KAYA is an internationally experienced AI Consultant and Technology Strategist leading the integration of artificial intelligence technologies into the global business landscape. With operations spanning 6 different countries, he bridges the gap between the theoretical boundaries of technology and practical business needs, overseeing end-to-end AI projects in data-critical sectors such as banking, e-commerce, retail, and logistics. Deepening his technical expertise particularly in Generative AI and Large Language Models (LLMs), KAYA ensures that organizations build architectures that shape the future rather than relying on short-term solutions. His visionary approach to transforming complex algorithms and advanced systems into tangible business value aligned with corporate growth targets has positioned him as a sought-after solution partner in the industry. Distinguished by his role as an instructor alongside his consulting and project management career, Şükrü Yusuf KAYA is driven by the motto of "Making AI accessible and applicable for everyone." Through comprehensive training programs designed for a wide spectrum of professionals—from technical teams to C-level executives—he prioritizes increasing organizational AI literacy and establishing a sustainable culture of technological transformation.

Frequently Asked Questions