Course Description
LLMs are everywhere! Many businesses are building custom RAG-based LLM applications to improve customer service.
But how are engineers testing them? Unlike traditional software, AI-based systems need a dedicated evaluation methodology.
This course starts from the ground up, explaining how AI systems (LLMs) work behind the scenes.
It then dives deep into LLM evaluation metrics.
Through scripted examples, it shows you how to use the RAGAS library to score an LLM against these metrics.
You can then use Pytest assertions to check metric scores against benchmarks and design a robust LLM test and evaluation automation framework.
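As a rough illustration of the idea of asserting on a metric score, here is a minimal sketch in Python. It does not use the RAGAS API: `context_precision` is a simplified, hand-written stand-in for a retrieval metric, and the `THRESHOLD` value and the sample relevance flags are hypothetical. In the course, the score would come from a RAGAS metric rather than from this toy function.

```python
# Toy stand-in for a RAG retrieval metric; this is NOT the RAGAS API.
from typing import List

THRESHOLD = 0.7  # hypothetical benchmark score chosen for illustration


def context_precision(relevance: List[bool]) -> float:
    """Average precision@k over the ranks that hold a relevant chunk.

    relevance[i] is True when the chunk retrieved at rank i+1 is relevant.
    """
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    # No relevant chunk retrieved at all scores 0.0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def test_context_precision_meets_threshold():
    # Hypothetical retrieval result: ranks 1 and 3 are relevant, rank 2 is not.
    score = context_precision([True, False, True])
    assert score >= THRESHOLD, f"context precision {score:.2f} is below {THRESHOLD}"
```

Saved in a file such as `test_metrics.py` (a hypothetical name), the test function would be collected by Pytest like any other test. Asserting against a threshold instead of an exact value is what lets the check tolerate the run-to-run variation of LLM output.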
What will you learn from the course?
- High-level overview of Large Language Models (LLMs)
- Understand how custom LLMs are built using Retrieval Augmented Generation (RAG) architecture
- Common benchmarks/metrics used in evaluating RAG-based LLMs
- Introduction to the RAGAS evaluation framework for evaluating/testing LLMs
Course Curriculum
Section 1: AI Testing Fundamentals: Thinking Beyond Traditional QA
- Lecture 2: What exactly is Testing AI? How is it different from Traditional testing? (5:20)
- Lecture 3: How to validate non-deterministic output of AI Systems - Shift in testing mindset (4:30)
- Lecture 4: Introduction to DeepEval and its core capabilities for evaluating AI Systems (10:51)
- Quiz 1: Quiz - Test your knowledge
Section 2: DeepEval Setup & Building Your First AI Testing Project
- Lecture 5: Download Dev and Testing Project codebase discussed in this course
- Lecture 6: Install PyCharm and configure the AI Agents Project with the necessary packages (9:42)
- Lecture 7: Understand the Demo of one of the AI Agents Projects used for Testing (14:57)
- Lecture 8: Setup the LLM Keys to make Agents work - Claude & OpenAI (5:37)
Section 3: Evaluating AI Agents with DeepEval Metrics & Golden Datasets
- Lecture 9: DeepEval's TaskCompletion Metric to evaluate the AI Agent - Blackbox Testing (16:05)
- Lecture 10: Understand how to read the result reports from Confident AI - DeepEval Integration (11:32)
- Lecture 11: Important Notes
- Lecture 12: Agents Component Testing overview - Understand internal workflow (12:52)
- Lecture 13: Goldens - Data sets to evaluate AI Systems and importance of evals_Iterator (14:30)
- Lecture 14: Implement Tracing within the Test file for smart Agent calling & tracking workflow (10:51)
Section 4: Testing AI Agent Internals: Traces, Components & Root Cause Analysis
Section 5: ⚙️ Building Custom AI Evaluation Metrics with G-Eval
- Lecture 18: How to build Custom Metrics with DeepEval - Intro to GEval Class (12:13)
- Lecture 19: Merge multiple Metrics into Single Test file with end to end Agent Testing (7:31)
- Lecture 20: Demo of building custom Faithfulness Metric using GEval to evaluate the Agents (10:26)
- Quiz 2: Quiz - Check your knowledge
Section 6: Testing Multi-Turn Chatbots & Conversational AI Systems
- Lecture 21: How to validate Multi Turn AI conversations such as Chatbots? - Overview (5:53)
- Lecture 22: Build Turns list object to track conversations and evaluate the Chatbot Metrics (14:42)
- Lecture 23: End to end demo of Multi conversational Agents validation with DeepEval (5:43)
- Lecture 24: Explore other Standard DeepEval Metrics to test Chatbot and similar AI Apps (7:43)
- Lecture 25: Build Custom Metrics to validate Chatbot AI Systems using GEval - Example demo (9:21)
- Quiz 3: Quiz - Test your knowledge
Section 7: Testing RAG Applications: Retrieval Quality & Response Accuracy
- Lecture 26: What are RAG Agents? How are they different from Traditional AI Agents? (16:31)
- Lecture 27: Get demo of RAG App used for testing & compare it with AI Agent demo app (11:28)
- Lecture 28: RAG Metrics - Contextual Precision Metric to validate RAG Agent output quality (12:46)
- Lecture 29: Demo example of validating RAG Agents with standard DeepEval Metric methods (15:42)
- Lecture 30: RAG Metrics - Contextual Recall Metric to validate RAG Agent retrieval quality (13:09)
- Quiz 4: Quiz - Test your knowledge
Section 8: Synthetic Data Generation & AI Safety Testing
- Lecture 31: What is Synthetic Data Generation and how it helps to generate goldens (6:37)
- Lecture 32: Demo example of Data Generation technique implementation in DeepEval Tests (11:09)
- Lecture 33: DeepEval Safety Metrics demonstration with data generation capability (11:25)
- Lecture 34: Wrap up - What did we learn from this course? Next steps (2:23)
- Lecture 35: Resume skills you can add from this course
Section 9: Final Exam - Assess your knowledge
Section 10: Optional - Learn Python Fundamentals with examples
- Lecture 36: Python hello world Program with Basics (8:35)
- Lecture 37: Datatypes in python and how to get the Type at run time (5:17)
- Lecture 38: List Datatype and its operations to manipulate (12:47)
- Lecture 39: Tuple and Dictionary Data types in Python with examples (8:28)
- Lecture 40: If else condition in python with working examples (3:10)
- Lecture 41: How to Create Dictionaries at run time and add data into it (7:55)
- Lecture 42: How loops work in Python and importance of code indentation (8:58)
- Lecture 43: Programming examples using for loop - 1 (4:17)
- Lecture 44: Programming examples using While loop - 2 (10:28)
- Lecture 45: What are functions? How to use them in Python (10:46)
- Lecture 46: OOPS Principles : Classes and objects in Python (7:38)
- Lecture 47: What is Constructor and its role in Object oriented programming (13:38)
- Lecture 48: Inheritance concepts with examples in Python (12:12)
- Lecture 49: Strings and its functions in python (9:53)
Section 11: Bonus Lecture
Section 3: Getting started with Practice LLMs and the approach to evaluate/test them
- Lecture 9: Course resources download
- Lecture 10: Demo of Practice RAG LLMs to evaluate and write test automation scripts (6:51)
- Lecture 11: Understanding the implementation of practice RAG LLMs to understand context (8:36)
- Lecture 12: Understand conversational LLM scenarios and how they are applied to RAG Architecture (5:47)
- Lecture 13: Understand the Metric benchmarks for Document Retrieval system in LLM (8:12)
Section 4: Setup Python & Pytest Environment with RAGAS LLM Evaluation Package Libraries
Section 5: Programmatic solution to evaluate LLM Metrics with Langchain and RAGAS Libraries
- Lecture 18: Making connection with OpenAI using Langchain Framework for RAGAS (15:49)
- Lecture 19: End to end - Evaluate LLM for ContextPrecision metric with SingleTurn Test data (20:38)
- Lecture 20: Metrics document download
- Lecture 21: Communicate with LLMs using API Post call to dynamically get responses (9:51)
- Lecture 22: Evaluate LLM for Context Recall Metric with RAGAS Pytest Test example (13:22)
Section 6: Optimize LLM Evaluation tests with Pytest Fixtures & Parameterization techniques
Section 7: Evaluate LLM Core Metrics and importance of EvalDataSet in RAGAS Framework
- Lecture 26: Understand LLM's Faithfulness and Response relevance metrics conceptually (4:56)
- Lecture 27: Build LLM Evaluation script to test Faithfulness benchmarks using RAGAS (9:42)
- Lecture 28: Reading Test data from external json file to LLM evaluation scripts (9:58)
- Lecture 29: Understand how Metrics are used at different places of RAG LLM Architecture (10:34)
- Lecture 30: Factual Correctness - Build a single Test to evaluate multiple LLM metrics (12:02)
Section 8: Upload LLM Evaluation results & Test LLM for Multi Conversational Chat History
- Lecture 31: Understand EvaluationDataSet and how it helps in evaluating Multiple metrics (9:41)
- Lecture 32: Upload the LLM Metrics evaluation results to the RAGAS dashboard portal for visual review (8:22)
- Lecture 33: How to evaluate RAG LLM with multi conversational history chat (7:59)
- Lecture 34: Build LLM Evaluation Test which can evaluate multi conversation - example (17:42)
Section 9: Create Test Data dynamically to evaluate LLM & Generate Rubrics Evaluation Score
- Lecture 35: How to Create Test Data using RAGAS Framework to evaluate LLM (15:02)
- Lecture 36: Load the external docs into Langchain utils to analyze and extract test data (8:52)
- Lecture 37: Install and configure NLTK package to scan the LLM documents & generate tests (20:11)
- Lecture 38: Generate Rubrics based Criteria Scoring to evaluate the quality of LLM responses (11:46)
Section 10: Conclusion and next steps!
Section 11: Optional - Learn Python Fundamentals with examples
- Lecture 41: Python hello world Program with Basics (8:35)
- Lecture 42: Datatypes in python and how to get the Type at run time (5:17)
- Lecture 43: List Datatype and its operations to manipulate (12:47)
- Lecture 44: Tuple and Dictionary Data types in Python with examples (8:28)
- Lecture 45: If else condition in python with working examples (3:10)
- Lecture 46: How to Create Dictionaries at run time and add data into it (7:55)
- Lecture 47: How loops work in Python and importance of code indentation (8:58)
- Lecture 48: Programming examples using for loop - 1 (4:17)
- Lecture 49: Programming examples using While loop - 2 (10:28)
- Lecture 50: What are functions? How to use them in Python (10:46)
- Lecture 51: OOPS Principles : Classes and objects in Python (7:38)
- Lecture 52: What is Constructor and its role in Object oriented programming (13:38)
- Lecture 53: Inheritance concepts with examples in Python (12:12)
- Lecture 54: Strings and its functions in python (9:53)
Section 12: Optional - Overview of Pytest Framework basics with examples