Testing AI Systems with DeepEval: AI Agents, Chatbots & RAG
Section 1: AI Testing Fundamentals: Thinking Beyond Traditional QA
Lecture 2: What exactly is Testing AI? How is it different from Traditional testing? (5:20)
Lecture 3: How to validate non-deterministic output of AI Systems - Shift in testing mindset (4:30)
Lecture 4: Introduction to DeepEval and its core capabilities for evaluating AI Systems (10:51)
Quiz 1: Quiz - Test your knowledge
Section 2: DeepEval Setup & Building Your First AI Testing Project
Lecture 5: Download Dev and Testing Project codebase discussed in this course
Lecture 6: Install PyCharm and configure AI Agents Project with necessary packages installed (9:42)
Lecture 7: Understand the Demo of one of the AI Agents Project used for Testing (14:57)
Lecture 8: Setup the LLM Keys to make Agents work - Claude & OpenAI (5:37)
Section 3: Evaluating AI Agents with DeepEval Metrics & Golden Datasets
Lecture 9: DeepEval's TaskCompletion Metric to evaluate the AI Agent - Blackbox Testing (16:05)
Lecture 10: Understand reading the result reports from Confident AI - DeepEval Integration (11:32)
Lecture 11: Important Notes
Lecture 12: Agents Component Testing overview - Understand internal workflow (12:52)
Lecture 13: Goldens - Data sets to evaluate AI Systems and Importance to evals_Iterator (14:30)
Lecture 14: Implement Tracing within the Test file for smart Agent calling & to track workflow (10:51)
Section 4: Testing AI Agent Internals: Traces, Components & Root Cause Analysis
Lecture 16: Understand how DeepEval reports the failure reason if AI Agents are not up to quality (25:03)
Lecture 15: When to use Component test Tracing type ? & When to go for LLMTestCase style (6:13)
Lecture 17: Another Metric example in Component/Trace level to test AI Agents internal flow (10:39)
Section 5: ⚙️ Building Custom AI Evaluation Metrics with G-Eval
Lecture 18: How to build Custom Metrics with DeepEval - Intro to GEval Class (12:13)
Lecture 19: Merge multiple Metrics into Single Test file with end to end Agent Testing (7:31)
Lecture 20: Demo of building custom Faithfulness Metric using GEval to evaluate the Agents (10:26)
Quiz 2: Quiz - Check your knowledge
Section 6: Testing Multi-Turn Chatbots & Conversational AI Systems
Lecture 21: How to validate Multi Turn AI conversations such as Chatbot etc? - Overview (5:53)
Lecture 22: Build Turns list object to track conversations and evaluate the Chatbot Metrics (14:42)
Lecture 23: End to end demo of Multi conversational Agents validation with DeepEval (5:43)
Lecture 24: Explore other Standard DeepEval Metrics to test Chatbot and similar AI Apps (7:43)
Lecture 25: Build Custom Metrics to validate Chatbot AI Systems using GEval - Example demo (9:21)
Quiz 3: Quiz - Test your knowledge
Section 7: Testing RAG Applications: Retrieval Quality & Response Accuracy
Lecture 22: What are RAG Agents? How different they are from Traditional AI Agents? (16:31)
Lecture 27: Get demo of RAG App used for testing & compare it with AI Agent demo app (11:28)
Lecture 28: RAG Metrics - Contextual Precision Metric to validate RAG Agent output quality (12:46)
Lecture 29: Demo example of validating RAG Agents with standard DeepEval Metric methods (15:42)
Lecture 30: RAG Metrics - Contextual Recall Metric to validate RAG Agent retrieval quality (13:09)
Quiz 4: Quiz - Test your knowledge
Section 8: Synthetic Data Generation & AI Safety Testing
Lecture 32: Demo example of Data Generation technique implementation in DeepEval Tests (11:09)
Lecture 31: What is Synthetic Data Generation and how it helps to generate goldens (6:37)
Lecture 33: DeepEval Safety Metrics demonstration with data generation capability (11:25)
Lecture 34: Wrap up - What did we learn from this course? Next steps (2:23)
Lecture 35: Resume skills you can add from this course
Section 9: Final Exam - Assess your knowledge
Quiz 5: Quiz - MCQ's
Section 10: Optional - Learn Python Fundamentals with examples
Lecture 36: Python hello world Program with Basics (8:35)
Lecture 37: Datatypes in python and how to get the Type at run time (5:17)
Lecture 38: List Datatype and its operations to manipulate (12:47)
Lecture 39: Tuple and Dictionary Data types in Python with examples (8:28)
Lecture 40: If else condition in python with working examples (3:10)
Lecture 41: How to Create Dictionaries at run time and add data into it (7:55)
Lecture 42: How loops work in Python and importance of code indentation (8:58)
Lecture 43: Programming examples using for loop - 1 (4:17)
Lecture 44: Programming examples using While loop - 2 (10:28)
Lecture 45: What are functions? How to use them in Python (10:46)
Lecture 46: OOPS Principles : Classes and objects in Python (7:38)
Lecture 47: What is Constructor and its role in Object oriented programming (13:38)
Lecture 48: Inheritance concepts with examples in Python (12:12)
Lecture 49: Strings and its functions in python (9:53)
Section 11: Bonus Lecture
Lecture 50: Bonus Lecture
Section 3: Getting started with Practice LLM's and the approach to evaluate /Test
Lecture 9: Course resources download
Lecture 10: Demo of Practice RAG LLM's to evaluate and write test automation scripts (6:51)
Lecture 11: Understanding implementation part of practice RAG LLM's to understand context (8:36)
Lecture 12: Understand conversational LLM scenarios and how they are applied to RAG Arch (5:47)
Lecture 13: Understand the Metric benchmarks for Document Retrieval system in LLM (8:12)
Section 4: Setup Python & Pytest Environment with RAGAS LLM Evaluation Package Libraries
Lecture 14: Install and set the path of Python in windows OS (10:16)
Lecture 15: Install and set the path of Python in MAC OS (10:26)
Lecture 16: Install RAGAS Framework packages and setup the LLM Test project (9:35)
Lecture 17: Python & Pytest Basics - Where to find them in the tutorial?
Section 5: Programmatic solution to evaluate LLM Metrics with Langchain and RAGAS Libraries
Lecture 18: Making connection with OpenAI using Langchain Framework for RAGAS (15:49)
Lecture 19: End to end -Evaluate LLM for ContextPrecision metric with SingleTurn Test data (20:38)
Lecture 20: Metrics document download
Lecture 21: Communicate with LLM's using API Post call to dynamically get responses (9:51)
Lecture 22: Evaluate LLM for Context Recall Metric with RAGAS Pytest Test example (13:22)
Section 6: Optimize LLM Evaluation tests with Pytest Fixtures & Parameterization techniques
Lecture 23: Build Pytest fixtures to isolate OpenAI and LLM Wrapper common utils from test (7:56)
Lecture 24: Introduction to Pytest Parameterization fixtures to drive test data externally (10:13)
Lecture 25: Reusable utils to isolate API calls of LLM and have test only on Metric logic (13:19)
Section 7: Evaluate LLM Core Metrics and importance of EvalDataSet in RAGAS Framework
Lecture 26: Understand LLM's Faithfulness and Response relevance metrics conceptually (4:56)
Lecture 27: Build LLM Evaluation script to test Faithfulness benchmarks using RAGAS (9:42)
Lecture 28: Reading Test data from external json file to LLM evaluation scripts (9:58)
Lecture 29: Understand how Metrics are used at different places of RAG LLM Architecture (10:34)
Lecture 30: Factual Correctness - Build a single Test to evaluate multiple LLM metrics (12:02)
Section 8: Upload LLM Evaluation results & Test LLM for Multi Conversational Chat History
Lecture 31: Understand EvaluationDataSet and how it helps in evaluating Multiple metrics (9:41)
Lecture 32: Upload the LLM Metrics evaluation results into RAGAS dashboard portal visually (8:22)
Lecture 33: How to evaluate RAG LLM with multi conversational history chat (7:59)
Lecture 34: Build LLM Evaluation Test which can evaluate multi conversation - example (17:42)
Section 9: Create Test Data dynamically to evaluate LLM & Generate Rubrics Evaluation Score
Lecture 35: How to Create Test Data using RAGAS Framework to evaluate LLM (15:02)
Lecture 36: Load the external docs into Langchain utils to analyze and extract test data (8:52)
Lecture 37: Install and configure NLTK package to scan the LLM documents & generate tests (20:11)
Lecture 38: Generate Rubrics based Criteria Scoring to evaluate the quality of LLM responses (11:46)
Section 10: Conclusion and next steps!
Lecture 39: 1 slide Recap of concepts learned from the course (4:29)
Lecture 40: Bonus Lecture
Section 11: Optional - Learn Python Fundamentals with examples
Lecture 41: Python hello world Program with Basics (8:35)
Lecture 42: Datatypes in python and how to get the Type at run time (5:17)
Lecture 43: List Datatype and its operations to manipulate (12:47)
Lecture 44: Tuple and Dictionary Data types in Python with examples (8:28)
Lecture 45: If else condition in python with working examples (3:10)
Lecture 46: How to Create Dictionaries at run time and add data into it (7:55)
Lecture 47: How loops work in Python and importance of code indentation (8:58)
Lecture 48: Programming examples using for loop - 1 (4:17)
Lecture 51: OOPS Principles : Classes and objects in Python (7:38)
Lecture 49: Programming examples using While loop - 2 (10:28)
Lecture 50: What are functions? How to use them in Python (10:46)
Lecture 52: What is Constructor and its role in Object oriented programming (13:38)
Lecture 53: Inheritance concepts with examples in Python (12:12)
Lecture 54: Strings and its functions in python (9:53)
Section 12: Optional - Overview of Pytest Framework basics with examples
Lecture 55: What are pytest fixtures and how they help in enhancing tests (10:29)
Lecture 56: Understand scopes in Pytest fixtures with examples (11:59)
Lecture 57: Setup and teardown using Pytest fixtures with yield keyword (9:04)
Lecture 42: How loops work in Python and importance of code indentation