Kaggle x Google 5-Day AI Agents Intensive Course - Complete Guide

Kaggle x Google AI Agents

📖 Overview

This is the complete, comprehensive guide to Kaggle’s 5-Day AI Agents Intensive Course hosted by Google. This guide combines all course materials from Day 1 through Day 5, covering everything from foundational agent architecture to production deployment.

The course takes learners from fundamental concepts to advanced production implementations, covering everything from basic agent architecture to enterprise-scale deployment strategies. Each day builds progressively on previous knowledge, culminating in production-ready agent systems.

Course Homepage: https://www.kaggle.com/learn-guide/5-day-agents Community: Join Discord at http://discord.gg/kaggle

🗺️ Table of Contents

Course Structure

  1. Day 1: Introduction to Agents - Architecture, taxonomy, and paradigm shift
  2. Day 2: Agent Tools & Interoperability with MCP - Tool design and Model Context Protocol
  3. Day 3: Context Engineering: Sessions & Memory - Short-term and long-term memory systems
  4. Day 4: Agent Quality - Observability, evaluation, and production metrics
  5. Day 5: Prototype to Production - Deployment, scaling, and AgentOps

Additional Resources


Day 1: Introduction to Agents

Watch Day 1

Video URL: https://www.youtube.com/watch?v=ZaUcqznlhv8 Duration: ~2 hours

📖 Description

This is the first day of Kaggle’s 5-day intensive course on AI agents, hosted by Google. The live Q&A session features industry experts discussing the fundamentals of AI agents, agentic architectures, and Google’s Agent Development Kit (ADK). The course is designed to take learners from basic concepts to advanced agent implementations, covering topics like memory, evaluation, and production deployment.

🎯 Learning Objectives

By the end of this video, you will understand:

  • What defines an AI agent and its core architecture (model, tools, orchestration)
  • The fundamental think-act-observe loop in agentic systems
  • The taxonomy of agentic systems from level 0 (pure reasoning) to level 4 (self-evolving)
  • How multi-agent systems collaborate to solve complex workflows
  • Key design principles behind Google’s Agent Development Kit (ADK)
  • The shift from traditional coding to directing autonomous agents
  • Real-world applications of agents in enterprise settings (DoorDash, pharmaceutical research)
  • The importance of interoperability, MCP protocols, and A2A communication
  • How agents are transforming developer workflows and creating citizen developers
  • Evaluation strategies and metrics for agentic systems

📋 Curriculum/Contents

Introduction & Overview

  • Welcome and course structure (Kalpana Parlola & Anand Nalgria)
  • Course design: basics to advanced over 5 days
  • Community engagement and moderator introduction

White Paper Overview: “Introduction to Agents and Agentic Architectures”

  • Evolution from first agents white paper (1 year ago)
  • Core architecture: Model (brain) + Tools (hands) + Orchestration (nervous system)
  • Think-act-observe loop explained
  • Taxonomy of agentic systems (Level 0-4)
  • Multi-agent systems and collaboration patterns
  • Interoperability: A2A protocol, agent-to-human communication
  • Security: Agent identity and governance

Industry Expert Panel Q&A

Paradigm Shift: From Brick Layers to Directors (Mike Clark - Google Cloud)

  • Traditional code vs. outcome-focused agent design
  • Embracing autonomy and probabilistic decision-making
  • Supporting citizen developers
  • Building safe and responsible agents

Long-term Vision for Enterprise Workflows (Michael Gishin Har & Antonio G)

  • Evolution of production readiness (March 2024 → Feb 2025)
  • From tab completion → function generation → vibe coding
  • Creating new classes of engineers (10x → 100x productivity)
  • Citizen developers: salespeople, product managers as engineers
  • Three transformation types:
    • Individual productivity (deep research agents)
    • Process transformation (clinical trials → FDA submissions)
    • Self-improving agents with critic patterns

Multimodal Agents & Computer Use (Alan, Mike, Michael)

  • Voice interactions: DoorDash driver use case
  • Live audio for hands-free policy lookup
  • Computer use for automating UI interactions
  • Preparing data for real-time access (RAG, memory banks, prompt caching)
  • Start simple, layer complexity incrementally

ADK Design Principles (Mike Clark & Alan)

  • Interoperability: Open source, connects with LangChain, LangGraph, CrewAI
  • Works with MCP servers, APIs, enterprise wrappers
  • Betting on language models getting better
  • “Better together” Google platform while remaining open
  • Supports Python, Java, Go
  • Easy to start, full control for customization

Self-Organizing Agent Architectures (Antonio G)

  • Agents optimizing their own prompts automatically
  • Agent brokers: hiring, firing, merging agents
  • Dynamic topology evolution (creating critic agents on-demand)
  • Importance of continuous evaluation and drift monitoring
  • Research direction: Multi-agent design patterns
  • Recommended reading: “Agentic Design Pattern” article

Key Technologies & Concepts

  • Google Vertex AI Agent Builder
  • Agent Development Kit (ADK)
  • Model Context Protocol (MCP)
  • Agent-to-Agent (A2A) protocol
  • Retrieval-Augmented Generation (RAG) for fast data access
  • Memory banks for synthetic data storage
  • Prompt caching near accelerators

📝 Notes & Key Takeaways

Main Insights

  1. “It’s not a year of agents, it’s a decade of agents” (Andrej Karpathy quote) - The next AI revolution centers on autonomous agents

  2. Three major transformations happening simultaneously:
    • Creating new classes of engineers (citizen developers)
    • Individual productivity gains (deep research, rapid prototyping)
    • Enterprise process transformation (clinical trials, compliance workflows)
  3. The coding evolution in 12 months:
    • June 2024: 5 minutes of code in 70ms (autocomplete)
    • October 2024: Whole functions with conversation
    • February 2025: “Vibe coding” - fork off agents while you do other work
  4. ADK design philosophy: Bet on the model
    • Don’t constrain models with deterministic workflows
    • Maximize model utility and capability
    • Set the stage for future growth as models improve
  5. Interoperability is paramount:
    • Open source, works with LangChain, LangGraph, CrewAI
    • MCP for tool connections, A2A for agent communication
    • Break down silos between agent systems
  6. Self-improving agents are here:
    • Agents optimizing their own prompts
    • Critic agents providing feedback loops
    • Agent brokers dynamically organizing teams
    • Continuous evaluation and drift monitoring essential
  7. Real-world multimodal wins:
    • DoorDash: Voice agents for hands-free policy lookup while holding packages
    • Pharma: Database lock → FDA abstract generation fully automated
    • Key enabler: Sub-second data access via RAG + memory banks + prompt caching
  8. Start simple, grow complexity:
    • Debug in simplified text scenarios first
    • Layer on voice, computer use, new tools incrementally
    • Best agents require minimal interaction

Actionable Points

  • For Beginners: Start with the white paper, listen to the podcast first, then deep dive into the paper
  • For Developers: Use ADK to build interoperable agents that work with existing tools (LangChain, MCP servers)
  • Design Pattern: Implement critic agents to evaluate and improve main agent performance
  • Data Preparation: Use RAG for fast retrieval, memory banks for common scenarios, prompt caching for latency
  • Evaluation Strategy: Set metrics from day one, monitor performance over time to detect drift
  • Course Structure: Follow the 5-day progression: Intro → Memory → Evaluation → Prototyping → Production
  • Community Engagement: Join Discord for discussions with moderators who have real-world experience

Cross-References


Day 2: Agent Tools & Interoperability with MCP

Watch Day 2

Video URL: https://www.youtube.com/watch?v=Cr4NA6rxHAM Duration: ~20 minutes

📖 Description

A comprehensive deep dive into how AI agents interact with the real world through tools and the Model Context Protocol (MCP). Part of Google and Kaggle’s 5-Day AI Agents Intensive course, this whitepaper companion podcast explores tool design, MCP architecture, security challenges, and enterprise integration patterns for building production-ready AI agents.

🎯 Learning Objectives

By the end of this video, you will understand:

  • What tools are in the AI agent context and why they’re essential for transforming LLMs from “thinking” to “doing”
  • The three main tool types: function tools, built-in tools, and agent tools
  • Critical best practices for designing effective agent tools (documentation, task-focused design, concise outputs, error handling)
  • The Model Context Protocol (MCP) architecture and how it solves the n×m integration problem
  • MCP’s client-server model inspired by Language Server Protocol (LSP)
  • How MCP defines tools using JSON schemas and handles structured/unstructured results
  • Strategic benefits of standardization: reusable ecosystems, dynamic capabilities, architectural flexibility
  • Key scaling challenges: context window bloat and tool retrieval strategies (RAG for tools)
  • Critical security concerns: the “confused deputy” problem and mitigation through enterprise API gateways
  • Why external security layers (authentication, authorization, governance) are essential for safe MCP adoption

📋 Curriculum/Contents

Part 1: Understanding Agent Tools (0:00-7:30)

  • The Core Problem: Foundation models stuck in training data
    • LLMs can’t perceive current state or execute actions natively
    • Tools as “eyes and hands” for agents
  • Historical Integration Challenge: The n×m problem
    • Custom connectors for every model-tool pair
    • Fragmented, unscalable approach
  • Three Tool Categories:
    1. Function Tools: Developer-defined external functions with docstrings
    2. Built-in Tools: Platform-provided capabilities (search grounding, code execution)
    3. Agent Tools: Invoking sub-agents as delegation (hierarchical, not handoff)
  • Broader Taxonomy: Information retrieval, action execution, system APIs, human-in-the-loop

Part 2: Tool Design Best Practices (7:30-12:00)

  • Rule 1: Documentation is Paramount
    • Clear, descriptive names (e.g., “create_critical_bug_with_priority” vs “update_jira”)
    • Documentation becomes the instruction manual for the LLM
  • Rule 2: Describe Action, Not Implementation
    • Tell the model WHAT to accomplish, not HOW to call the function
    • Let the LLM reason, let the tool act
  • Rule 3: Publish Tasks, Not Raw APIs
    • Abstract away complexity (e.g., “book_meeting_room” vs raw calendar API with 15 parameters)
    • Single, clear, high-level task per tool
  • Rule 4: Design for Concise Output
    • Avoid context window bloat from massive data dumps
    • Return summaries, confirmations, or URIs (references to external storage)
    • Use artifact services (e.g., Google ADK’s artifact service)
  • Rule 5: Instructive Error Messages
    • Schema validation for inputs/outputs
    • Descriptive errors with recovery guidance (e.g., “API rate limit exceeded. Wait 15 seconds.”)

Part 3: Model Context Protocol (MCP) Architecture (12:00-16:30)

  • Core Components:
    1. MCP Host: Orchestrates agent reasoning, enforces safety guardrails
    2. MCP Client: Manages server connections, sends commands
    3. MCP Server: Advertises tools, executes commands, returns results
  • Communication Layer: JSON-RPC 2.0
  • Transport Layers:
    • STDIO: Local development, child processes (fast, efficient)
    • Streamable HTTP: Remote connections, server-sent events (SSE), flexible/stateless
  • Tool Definition: Standardized JSON schema
    • Required fields: name, description, input schema
    • Optional: output schema
    • Example: get_stock_price with symbol/date inputs and price/timestamp output
  • Result Types:
    • Structured: JSON objects conforming to output schema
    • Unstructured: Raw text, audio, images, URIs
  • Error Handling:
    • Protocol errors (invalid method, malformed parameters)
    • Execution errors (set isError: true flag with descriptive message)

Part 4: Strategic Benefits & Challenges (16:30-19:00)

  • Benefits:
    • Accelerated development & reusable ecosystem (public MCP registries)
    • Dynamic capabilities (runtime tool discovery)
    • Architectural flexibility (agentic AI mesh networks)
  • Scaling Challenge: Context Window Bloat
    • Problem: 1000 tools × detailed schemas = infeasible context size
    • Solution: RAG for Tools (tool retrieval)
      • Semantic search over indexed tools
      • Load only top 3-5 relevant tool definitions
      • Filter before loading
  • Security Challenge: Confused Deputy Problem
    • Low-privilege user tricks AI model → AI requests high-privilege action → MCP server executes without user authorization check
    • Prompt injection → privilege escalation
    • Solution: External security layers
      • Enterprise API gateways (e.g., Apigee)
      • Authentication, authorization, rate limiting, logging, input filtering
      • Security is AROUND MCP, not IN MCP

Part 5: Final Thoughts & Open Questions (19:00-20:00)

  • Practical Takeaway: Tool design best practices apply universally (even outside MCP)
  • Deep Question: How do we design interfaces/guardrails to ensure agents act on authorized intent vs. blindly following commands?
  • Accountability Challenge: Audit trails must capture difference between “what was asked” and “what was allowed”

📝 Notes & Key Takeaways

Main Insights

  1. Tools Transform LLMs from Thinking to Doing: Foundation models are brilliant pattern-matching machines but completely isolated from current data and real-world actions. Tools are the bridge that enables agents to perceive (fetch data) and act (execute tasks).

  2. MCP Solves the n×m Problem: Before MCP, integrating n models with m tools required n×m custom connectors. MCP provides an open standard that decouples agents (brains) from tools (hands), enabling plug-and-play modularity.

  3. Tool Design is Critical: The only way an LLM knows what a tool does is through documentation. Clear names, task-focused descriptions, concise outputs, and instructive error messages are non-negotiable for reliable agent performance.

  4. Context Window Bloat is the Main Scaling Challenge: Loading thousands of tool definitions into the LLM’s context is infeasible. The solution is RAG for tools—semantic search to retrieve only the top 3-5 relevant tools for the current task.

  5. Security Must Wrap MCP, Not Be Built Into It: MCP was designed for decentralized innovation, not enterprise security. The “confused deputy” problem (prompt injection → privilege escalation) requires external layers: API gateways, authentication, authorization, rate limiting, and logging.

Actionable Points

For Tool Developers:

  • Write comprehensive docstrings/descriptions for every tool
  • Design tools around high-level tasks, not raw API calls
  • Return summaries or URIs instead of dumping raw data
  • Provide instructive error messages with recovery paths

For Agent Builders:

  • Implement tool retrieval (RAG) to handle large tool catalogs
  • Use schema validation rigorously
  • Separate agent reasoning (LLM) from tool execution (MCP server)

For Enterprise Adopters:

  • Deploy API gateways (Apigee, etc.) in front of MCP servers
  • Implement robust authentication and authorization
  • Design audit trails capturing authorized intent vs. executed action
  • Never expose MCP servers directly without security layers

Cross-References

External Resources:

  • Whitepaper: https://www.kaggle.com/whitepaper-agent-tools-and-interoperability-with-mcp
  • 5-Day AI Agents Intensive: https://rsvp.withgoogle.com/events/google-ai-agents-intensive_2025
  • Google ADK Docs: https://google.github.io/adk-docs/
  • Kaggle Discord: http://discord.gg/kaggle

Day 3: Context Engineering: Sessions & Memory

Watch Day 3

Video URL: https://www.youtube.com/watch?v=8o-GXj8A3nE Duration: ~60 minutes

📖 Description

This is Day 3 of Kaggle’s 5-Day AI Agents Intensive Course, focusing on Context Engineering: Sessions & Memory. The livestream features expert speakers from Google (Stephen Johnson from NotebookLM, Julia Visinger, Kimberly from Agent Engine Memory Bank) and Jay Alamar from Cohere, covering advanced techniques for building stateful AI agents with persistent memory.

Key Topics Covered:

  • Short-term memory: Sessions and conversation management
  • Long-term memory: Persistent knowledge storage and retrieval
  • Context compaction strategies (summarization, truncation)
  • Active memory ETL pipeline (extract, consolidate, merge)
  • Memory systems: Declarative vs procedural memory
  • RAG vs memory: Dynamic, user-specific knowledge
  • Production considerations: Non-blocking operations, PII redaction

🎯 Learning Objectives

By the end of this video, you will understand:

  • How to build stateful agents that remember across conversations
  • The difference between sessions (short-term) and memory (long-term)
  • Context compaction strategies to prevent context window overflow
  • Memory ETL pipeline: extraction, consolidation, and deduplication
  • Declarative vs procedural memory patterns
  • How to implement sessions and memory in ADK (Agent Development Kit)
  • Production best practices: async memory generation, PII handling
  • Hybrid retrieval approaches: vector DB + knowledge graphs + re-ranking
  • Context caching for cost and latency optimization

📋 Curriculum/Contents

Part 1: Introduction & Context (0:00-15:00)

  • Course overview and community highlights
  • Day 3 focus: Context engineering, sessions, and memory
  • White paper walkthrough by Kimberly

Part 2: Expert Panel Q&A (15:00-45:00)

Speakers:

  • Stephen Johnson - NotebookLM founder, Editorial Director
  • Julia Visinger - PM at Google, ADK ecosystem
  • Kimberly Milm - Tech Lead, Agent Engine Memory Bank
  • Jay Alamar - Co-author “Hands-On Large Language Models”, Director at Cohere

Key Topics:

  • NotebookLM’s context management - Full 1M token window, RAG with alternate queries
  • ADK’s memory features - Turn instructions, context caching, static vs dynamic context
  • Hybrid memory systems - Vector DB + knowledge graphs + re-ranking
  • Memory quality control - Provenance, strict definitions, prompt injection defense
  • Cost vs context tradeoffs - Truncation vs summarization vs long-term memory
  • Narrative structure for state - Organizing experiences chronologically

Part 3: Code Labs Walkthrough (45:00-60:00)

  • Notebook 1: Sessions - Building stateful agents, in-memory vs database session services
  • Notebook 2: Memory - Long-term memory implementation, ETL pipeline
  • Practical implementation in ADK

📝 Notes & Key Takeaways

Main Insights

1. Sessions vs Memory - The Two-Tier System

  • Sessions = Short-term workbench (immediate conversation history)
  • Memory = Long-term filing cabinet (persistent knowledge across sessions)
  • LLMs are stateless by default - memory requires active context engineering

2. Context Rot Prevention

  • Problem: Finite context windows + growing conversation history = overflow
  • Solutions:
    • Recursive summarization (condense history with LLM)
    • Token-based truncation (discard older turns)
    • Context caching (store + inject, faster + cheaper)
    • Long-term memory extraction (save facts, discard noise)

3. Memory ETL Pipeline (Extract-Transform-Load)

  • Extract: Pull meaningful facts from noisy dialogue using LLM
  • Consolidate: Resolve conflicts, merge duplicates, update existing knowledge
  • Store: Persist in structured format (vector DB, knowledge graph, hybrid)

4. ADK’s Three-Layer Context Model

  • Static Instruction: Core identity, system prompts, safety policies (cached)
  • User Message: End user’s input (kept clean for logging/evals)
  • Turn Instruction: Dynamic per-turn steering from application backend

5. Memory Quality Controls

  • Provenance: Defer to high-trust data sources (CRM, human agents > inferred preferences)
  • Strict Definitions: Define exactly what to save (customization in Memory Bank)
  • Prompt Injection Defense: Use Model Armor to detect malicious inputs
  • Uncertainty Acknowledgment: Instruct LLM that memories are inferred, use with caution

6. Hybrid Retrieval > Pure Vector Search

  • Vector DB alone is insufficient
  • Add keyword search + re-rankers + knowledge graphs
  • Graph databases excel at relationship-based retrieval (useful for memory consolidation)

7. Production Best Practices

  • Memory generation = non-blocking background operation (avoid latency)
  • Redact PII rigorously (maintain user trust)
  • Memory as a tool (agent decides when to retrieve/store, not always-on)

8. NotebookLM’s Agentic Future

  • Context as UI: Source selection = explicit context control
  • Agent suggests tools: “You should create an audio overview on this topic”
  • Featured notebook = expert on how to use NotebookLM (meta-agent pattern)

Actionable Points

For Builders:

  1. ✅ Start with in-memory sessions for prototyping, migrate to database sessions for production
  2. ✅ Implement memory as a non-blocking background process (don’t block user requests)
  3. ✅ Use ADK’s turn instructions for dynamic context injection
  4. ✅ Test context caching to reduce costs (reuse static instructions)
  5. ✅ Define strict memory schemas (what to save, what to ignore)
  6. ✅ Build hybrid retrieval: vector + keyword + re-ranking
  7. ✅ Load white papers into NotebookLM for mind maps and video overviews

For Learners:

  1. ✅ Complete Day 3 codelabs:
    • https://www.kaggle.com/code/kaggle5daysofai/day-3a-agent-sessions
    • https://www.kaggle.com/code/kaggle5daysofai/day-3b-agent-memory
  2. ✅ Read white paper: https://www.kaggle.com/whitepaper-context-engineering-sessions-and-memory
  3. ✅ Listen to podcast: https://www.youtube.com/watch?v=FMcExVE15a4
  4. ✅ Explore NotebookLM’s mind map feature for visualizing white papers
  5. ✅ Join Kaggle Discord for community support: http://discord.gg/kaggle

For Capstone Projects:

  1. ✅ Experiment with graph RAG for human domain knowledge
  2. ✅ Build narrative-based state management (chronological organization)
  3. ✅ Test different compaction strategies (truncation vs summarization vs memory)
  4. ✅ Implement memory customization (define strict schemas)

Cross-References

NotebookLM Features Highlighted:

  • Mind maps for white paper visualization
  • Video overviews (new feature)
  • Audio overviews (AI podcast generation)
  • Source selection for focused context
  • Full 1M token context window
  • RAG with alternate query generation

ADK Features Highlighted:

  • In-memory session service (prototyping)
  • Database session service (production)
  • Vertex AI Agent Engine (enterprise)
  • Static/turn/user instruction layers
  • Context caching
  • Memory Bank integration
  • Tool-based memory (agent-controlled retrieval)

Day 4: Agent Quality

Watch Day 4

Video URL: https://www.youtube.com/watch?v=JW1Yybfxyr4 Duration: ~1 hour Uploaded: 2025-10-30

📹 Video Overview

Day 4: Agent Quality - A comprehensive livestream covering the critical topic of AI agent quality, including observability, evaluation frameworks, and production-ready practices from Google’s Kaggle 5-Day AI Agents Intensive Course.

🎯 Description

All course info at: https://www.kaggle.com/learn-guide/5-day-agents

Day 4 Focus: Agent Quality

Complete Unit 4:

  1. Summary podcast episode: https://www.youtube.com/watch?v=LFQRy-Ci-lk
  2. Agent Quality whitepaper: https://www.kaggle.com/whitepaper-agent-quality
  3. Codelabs on Kaggle:
    • Day 4a: Agent observability - https://www.kaggle.com/code/kaggle5daysofai/day-4a-agent-observability
    • Day 4b: Agent evaluation - https://www.kaggle.com/code/kaggle5daysofai/day-4b-agent-evaluation
    • Troubleshooting guide - https://www.kaggle.com/code/kaggle5daysofai/day-0-troubleshooting-and-faqs

Bonus: “Building GPU-Accelerated Data Science Agents” live on Kaggle: https://www.kaggle.com/code/jiweiliu/gpu-accelerated-data-science-agent

Community: Join Discord - http://discord.gg/kaggle

🎓 Learning Objectives

By the end of this session, you will understand:

  • Four Pillars of Agent Quality: Effectiveness, Efficiency, Robustness, Safety
  • Strategic Evaluation Hierarchy: Blackbox (outside-in) vs Glassbox (inside-out) views
  • Deep Observability Trinity: Structured logs, end-to-end traces, aggregated metrics
  • LLM as Judge: Scalable automated evaluation with human-in-the-loop
  • Agent Quality Flywheel: Turning production interactions into continuous improvement data
  • Multi-Agent Evaluation: Unique challenges in evaluating multi-agent systems
  • Production Observability: Implementing observability with ADK plugins and callbacks
  • GPU-Accelerated Data Science Agents: Using NVIDIA cuDF for fast data processing

📚 Curriculum & Key Topics

Part 1: Agent Quality Framework (Whitepaper Overview)

Core Framework - 4 Pillars:

  1. Effectiveness - Did it achieve its goal?
  2. Efficiency - Cost and speed optimization
  3. Robustness - Graceful error handling
  4. Safety - Adherence to ethical guidelines

Strategic Hierarchy:

  • Outside-in (Blackbox): Validate final outcomes
  • Inside-out (Glassbox): Debug reasoning trajectory
  • Why: Agent can get the right answer for wrong reasons - journey matters as much as outcome

Deep Observability Trinity:

  1. Structured logs - Raw facts
  2. End-to-end traces - Chain of thought
  3. Aggregated metrics - Health trends

Part 2: Expert Panel Q&A Highlights

LLM as Judge - Four Critical Biases

  1. Preference Bias - Models prefer their own generations
  2. Verbosity Bias - Favor long, confident answers
  3. Sycophancy - Agents agree with each other’s pushback
  4. Score Bias - Models hedge bets (always score 5/10)

Solution: Evaluate your evaluators with test sets and human correlation

Multi-Agent System Evaluation

Key Challenge: Best components ≠ best system (soccer team analogy)

Best Practices:

  • Evaluate interactions & orchestration (not just individual agents)
  • Build evaluation from Day Zero
  • Use multi-layered approach: Metrics + LLM as Judge + Human-in-the-Loop
  • Error compounds rapidly (10% per agent)

Tools: Google ADK, Vertex AI eval service, OpenTelemetry

Part 3: GPU-Accelerated Data Science Agents (Dr. Jay - NVIDIA)

NVIDIA cuDF: Zero-Code GPU Acceleration

import cudf.pandas
cudf.pandas.install()
# All pandas code below now GPU-accelerated!

Performance: 10 seconds (CPU) → 1 second (GPU) on 2-5GB datasets

Multi-Agent Architecture:

  1. Planner - Break down tasks
  2. Coder - Write/execute code, generate visualizations
  3. Vision - Interpret visualizations
  4. Writer - Combine insights, write report

Demos:

  • Interactive data exploration (42M rows, live on Kaggle GPU)
  • Full research report generation (NVIDIA NeMo Super 49B)

Part 4: Code Labs - Agent Observability

Three Pillars:

  1. Logs - What happened at specific time
  2. Traces - Connect logs into cohesive story
  3. Metrics - Average latency, failure rate

Demo: Research Paper Finder Agent

  • Debugging with ADK Web UI
  • Root cause analysis via trace inspection
  • Production implementation with plugins & callbacks

💡 Main Insights & Key Takeaways

  1. Agent Quality is Multi-Dimensional - 4 Pillars + 2 Views + 3 Observability Components
  2. LLM as Judge Has Biases - Mitigation: Rubrics, evaluator evaluation, human correlation
  3. Multi-Agent Unique - Orchestration critical, error compounds, Day Zero evaluation
  4. Production Infrastructure - OpenTelemetry, seamless experimentation → production
  5. GPU Transforms Data Science - cuDF 10x speedup, quantized models, multi-agent workflows
  6. Evaluation is Continuous - Not perfection, understanding + improvement
  7. Mindset Shift - Agentic ≠ traditional development, nondeterministic, probabilistic

🎯 Actionable Next Steps

  • Complete Day 4a codelab: Agent observability
  • Complete Day 4b codelab: Agent evaluation
  • Read Agent Quality whitepaper
  • Try GPU-accelerated data science agent notebook
  • Implement rubrics for LLM as judge
  • Design observability from Day Zero in new projects

Cross-References

🔗 Resources

Course Materials:

  • Course guide: https://www.kaggle.com/learn-guide/5-day-agents
  • Day 4 podcast: https://www.youtube.com/watch?v=LFQRy-Ci-lk
  • Whitepaper: https://www.kaggle.com/whitepaper-agent-quality

Code Labs:

  • Day 4a Observability: https://www.kaggle.com/code/kaggle5daysofai/day-4a-agent-observability
  • Day 4b Evaluation: https://www.kaggle.com/code/kaggle5daysofai/day-4b-agent-evaluation
  • GPU Demo: https://www.kaggle.com/code/jiweiliu/gpu-accelerated-data-science-agent

Tools:

  • Google ADK (Agent Development Kit)
  • NVIDIA NeMo Agent Toolkit
  • NVIDIA cuDF (GPU pandas)
  • OpenTelemetry

Day 5: Prototype to Production

Watch Day 5

Video URL: https://www.youtube.com/watch?v=8Wyt9l7ge-g Duration: 19 minutes

📖 Description

This whitepaper provides a comprehensive technical guide to the operational life cycle of AI agents, focusing on deployment, scaling, and productionizing. Building on Day 4’s coverage of evaluation and observability, this guide emphasizes how to build the necessary trust to move agents into production through robust CI/CD pipelines and scalable infrastructure. It explores the challenges of transitioning agent-based systems from prototypes to enterprise-grade solutions, with special attention to Agent2Agent (A2A) interoperability. This guide offers practical insights for AI/ML engineers, DevOps professionals, and system architects.

🎯 Learning Objectives

By the end of this video, you will understand:

  • The operational life cycle of AI agents from prototype to production (“AgentOps”)
  • Why 80% of development effort is infrastructure, security, and validation (not core AI)
  • The three foundational pillars: automated evaluation, CI/CD pipelines, and observability
  • How to implement evaluation-gated deployment with progressive funnel approach
  • Security challenges unique to AI agents (prompt injection, memory poisoning, data leakage)
  • The three-layer defense system: policy definition, guardrails, continuous assurance
  • Operational control strategies: decoupled state, caching, retries, cost management
  • Agent-to-Agent (A2A) interoperability vs Model Context Protocol (MCP)
  • Building collaborative AI agent ecosystems at enterprise scale

📋 Curriculum/Contents

Part 1: The Production Gap Challenge (0:00-3:00)

  • The “last mile” problem: demo to production
  • Why traditional MLOps doesn’t work for autonomous agents
  • Dynamic tool orchestration challenges
  • State management at scale
  • Unpredictable cost and latency issues

Part 2: People & Process First (3:00-6:00)

  • Team structure for generative AI
  • New specialized roles: Prompt Engineers and AI Engineers
  • Cross-team coordination requirements
  • Governance and responsibility frameworks

Part 3: Pre-Production Pipeline (6:00-10:00)

  • Evaluation-gated deployment principle
  • Manual pre-validation vs automated in-pipeline gates
  • The three-phase progressive funnel:
    • Phase 1: Pre-merge integration (CI)
    • Phase 2: Post-merge validation & staging (CD)
    • Phase 3: Gated production deployment
  • Safe rollout strategies: Canary, Blue-Green, A/B testing
  • Version control: code, prompts, schemas, memory structure

Part 4: Security Framework (10:00-12:00)

  • Google’s Secure AI Agents approach (SIF-based)
  • Layer 1: Policy definition (agent constitution)
  • Layer 2: Guardrails and filtering (input/output, HITL)
  • Layer 3: Continuous assurance (red teaming, safety testing)
  • Common threats: prompt injection, data leakage, memory poisoning

Part 5: Operational Loop (12:00-16:00)

  • Observe: Logs, traces, metrics (the three pillars)
  • Act: Decoupled state, caching, retries, cost optimization
  • Evolve: Production-driven improvement cycle
  • Security response playbook: containment, triage, resolution
  • Rapid iteration through automated CI/CD

Part 6: Agent Interoperability (16:00-19:00)

  • Breaking down agent silos
  • Model Context Protocol (MCP): stateless tool interaction
  • Agent-to-Agent (A2A): stateful goal-oriented collaboration
  • Agent discovery: agent cards and registries
  • Distributed tracing across multi-agent systems
  • Building collaborative AI ecosystems

📝 Notes & Key Takeaways

Main Insights

  1. 80% of effort is infrastructure, not AI: The shocking reality is that building the core AI model is just 20% of the work. The remaining 80% is infrastructure, security, monitoring, and validation systems that make it production-ready.

  2. Evaluation-gated deployment is non-negotiable: No agent should touch real users until it passes rigorous automated checks. This requires building golden datasets and automated evaluation suites that run in the CI/CD pipeline.

  3. Agents need different MLOps than models: Traditional ML models are predictable (Input X → Output Y). Agents are dynamic, stateful, and autonomous, requiring new approaches to testing, versioning, and monitoring.

  4. Three-layer security defense: (1) Policy definition through system instructions, (2) Hard guardrails with input/output filtering and HITL escalation, (3) Continuous assurance through red teaming and safety testing.

  5. Decouple logic from state for scalability: Store memory and session data externally (Firestore, Cloud SQL) so agent logic can scale horizontally without state bottlenecks.

  6. Observability requires logs + traces + metrics: Logs give you the diary, traces give you the narrative (causal chain), metrics give you the report card. You need all three for effective agent monitoring.

  7. MCP vs A2A serve different purposes: MCP is for stateless tool interactions (“fetch the weather”). A2A is for stateful agent collaboration (“analyze customer churn and suggest strategies”). Often used together.

  8. Velocity is the ultimate prize: Good AgentOps means deploying meaningful improvements in hours or days, not weeks or months. Continuous evolution is the competitive advantage.

Actionable Points

  • Start with fundamentals: Build a solid evaluation dataset and basic CI/CD pipeline with automated evaluation gates before anything else.

  • Implement progressive funnel: Use the three-phase approach (pre-merge CI, staging validation, gated production) to catch errors early and cheaply.

  • Use safe rollout strategies: Never deploy to 100% of users at once. Use canary releases (1% traffic), blue-green deployments (instant rollback), or A/B testing.

  • Version everything: Code, prompts, tool schemas, memory structure all need version numbers for instant rollback capability.

  • Turn production failures into tests: When something breaks in production, immediately add it to your golden evaluation dataset so it becomes part of future testing.

  • Build security response playbook: Pre-define containment procedures (circuit breakers, feature flags), triage routing (HITL queues), and rapid patch deployment workflows.

  • Implement distributed tracing: When agents collaborate, you need unique IDs that follow requests across services to understand the full causal chain.

  • Design idempotent tools: Tools involved in state changes must be safely retryable. “Get weather” can retry; “charge credit card” cannot without careful design.

Cross-References

🔗 Further Resources

Related Searches:

  • “AI agent CI/CD pipelines”
  • “Agent-to-Agent (A2A) interoperability”
  • “Model Context Protocol (MCP) implementation”
  • “AI agent security guardrails”
  • “AgentOps vs MLOps differences”
  • “Distributed tracing for multi-agent systems”
  • “Vertex AI safety filters”
  • “Red teaming AI agents”

Further Resources:


Summary & Key Takeaways

Complete Course Arc

This 5-day intensive course takes you through a comprehensive journey from understanding what AI agents are to deploying them at enterprise scale:

  1. Day 1 establishes the foundation: agent architecture (model + tools + orchestration), taxonomy levels, and the paradigm shift from coding to directing agents
  2. Day 2 connects agents to the real world through MCP tools and interoperability, covering tool design best practices
  3. Day 3 builds stateful agents through sessions and memory, enabling continuity across conversations
  4. Day 4 ensures quality through observability (logs/traces/metrics) and evaluation frameworks
  5. Day 5 brings it all together with production deployment, security, and AgentOps

Universal Principles Across All Days

The Agent Trinity

  • Model (Brain): LLM for reasoning and decision-making
  • Tools (Hands): MCP-enabled actions and information retrieval
  • Orchestration (Nervous System): Think-act-observe loop with memory

The Quality Pillars

  1. Effectiveness: Does it achieve the goal?
  2. Efficiency: Cost and speed optimization
  3. Robustness: Graceful error handling
  4. Safety: Ethical guidelines and security

The Production Essentials

  1. Evaluation: Automated golden datasets + LLM as Judge + Human-in-the-Loop
  2. Observability: Logs + Traces + Metrics
  3. Security: Policy definition + Guardrails + Continuous assurance
  4. Interoperability: MCP for tools, A2A for agents

Critical Mindset Shifts

From Traditional Development:

  • Deterministic → Probabilistic
  • Coding → Directing
  • Perfect → Good enough with continuous improvement
  • Static → Dynamic and self-evolving

From Traditional MLOps:

  • Model-centric → System-centric
  • Batch predictions → Real-time interactions
  • Input-output pairs → Multi-step reasoning
  • Versioning models → Versioning everything (code, prompts, schemas, memory)

The Agent Development Lifecycle

Research → Prototype → Evaluate → Deploy → Observe → Improve
    ↑                                                     ↓
    └─────────────── Continuous Evolution ───────────────┘
  1. Research Phase: Understand domain, design agent architecture, choose tools
  2. Prototype Phase: Build with ADK, implement sessions/memory, integrate MCP tools
  3. Evaluate Phase: Create golden datasets, implement automated evaluation, test robustness
  4. Deploy Phase: CI/CD with evaluation gates, progressive rollout, security layers
  5. Observe Phase: Monitor logs/traces/metrics, detect drift, capture edge cases
  6. Improve Phase: Turn failures into tests, optimize prompts, evolve capabilities

Technology Stack Summary

Core Platforms:

  • Google Agent Development Kit (ADK)
  • Vertex AI Agent Builder
  • Vertex AI Agent Engine

Memory & Context:

  • In-memory sessions (prototyping)
  • Database sessions (production)
  • Memory Bank (long-term storage)
  • Context caching (cost optimization)
  • RAG systems (retrieval)

Tool Integration:

  • Model Context Protocol (MCP)
  • Agent-to-Agent (A2A) protocol
  • LangChain/LangGraph/CrewAI compatibility
  • Enterprise API gateways

Quality & Operations:

  • OpenTelemetry (observability)
  • Vertex AI eval service
  • CI/CD pipelines
  • Blue-Green/Canary deployments

Acceleration:

  • NVIDIA cuDF (GPU pandas)
  • NVIDIA NeMo Agent Toolkit
  • Prompt caching near accelerators

Real-World Use Cases Highlighted

  1. DoorDash: Voice agents for hands-free policy lookup while holding packages
  2. Pharmaceutical: Database lock → FDA abstract generation automation
  3. Data Science: 42M row datasets analyzed with GPU-accelerated multi-agent systems
  4. Research: Deep research agents for comprehensive literature reviews
  5. Enterprise Workflows: Clinical trials → FDA submissions fully automated

Common Pitfalls & Solutions

Pitfall: Context window overflow from growing conversation history Solution: Recursive summarization + context caching + long-term memory extraction

Pitfall: LLM as Judge biases (verbosity, preference, sycophancy) Solution: Use rubrics, evaluate evaluators, maintain human correlation

Pitfall: Multi-agent error compounding (10% per agent) Solution: Evaluate orchestration, not just individual agents

Pitfall: Confused deputy security problem (privilege escalation via prompt injection) Solution: External API gateways with authentication/authorization/rate limiting

Pitfall: Tool context bloat (1000s of tool definitions) Solution: RAG for tools—semantic search for top 3-5 relevant tools

Pitfall: 80% of effort on infrastructure vs 20% on AI Solution: Use ADK, leverage existing platforms, don’t rebuild from scratch

What Makes a Great Agent

  1. Clear Purpose: Well-defined tasks with measurable success criteria
  2. Reliable Tools: Task-focused, well-documented, concise output, instructive errors
  3. Robust Memory: Appropriate retention (sessions for short-term, memory for long-term)
  4. Continuous Evaluation: Golden datasets, automated testing, drift monitoring
  5. Graceful Failures: Uncertainty acknowledgment, HITL escalation, retry logic
  6. Security First: Policy definition, guardrails, continuous red teaming
  7. Observable Behavior: Structured logs, end-to-end traces, aggregated metrics
  8. Rapid Evolution: CI/CD automation, version control, production feedback loops

Complete Resource List

📚 Course Materials

Main Course Hub:

  • Course Guide: https://www.kaggle.com/learn-guide/5-day-agents
  • Discord Community: http://discord.gg/kaggle

Day 1: Introduction to Agents:

  • Video: https://www.youtube.com/watch?v=ZaUcqznlhv8
  • White Paper: https://www.kaggle.com/whitepaper-introduction-to-agents

Day 2: Agent Tools & MCP:

  • Video: https://www.youtube.com/watch?v=Cr4NA6rxHAM
  • White Paper: https://www.kaggle.com/whitepaper-agent-tools-and-interoperability-with-mcp

Day 3: Context Engineering:

  • Video: https://www.youtube.com/watch?v=8o-GXj8A3nE
  • Podcast: https://www.youtube.com/watch?v=FMcExVE15a4
  • White Paper: https://www.kaggle.com/whitepaper-context-engineering-sessions-and-memory
  • Codelab 3a Sessions: https://www.kaggle.com/code/kaggle5daysofai/day-3a-agent-sessions
  • Codelab 3b Memory: https://www.kaggle.com/code/kaggle5daysofai/day-3b-agent-memory

Day 4: Agent Quality:

  • Video: https://www.youtube.com/watch?v=JW1Yybfxyr4
  • Podcast: https://www.youtube.com/watch?v=LFQRy-Ci-lk
  • White Paper: https://www.kaggle.com/whitepaper-agent-quality
  • Codelab 4a Observability: https://www.kaggle.com/code/kaggle5daysofai/day-4a-agent-observability
  • Codelab 4b Evaluation: https://www.kaggle.com/code/kaggle5daysofai/day-4b-agent-evaluation
  • GPU Demo: https://www.kaggle.com/code/jiweiliu/gpu-accelerated-data-science-agent

Day 5: Prototype to Production:

  • Video: https://www.youtube.com/watch?v=8Wyt9l7ge-g
  • White Paper: https://www.kaggle.com/whitepaper-prototype-to-production

Troubleshooting:

  • Day 0 FAQ: https://www.kaggle.com/code/kaggle5daysofai/day-0-troubleshooting-and-faqs

🛠️ Tools & Platforms

Google/Kaggle:

  • Google ADK Documentation: https://google.github.io/adk-docs/
  • Vertex AI Agent Builder: https://cloud.google.com/vertex-ai/agents
  • NotebookLM: https://notebooklm.google.com/

NVIDIA:

  • NVIDIA NeMo Agent Toolkit: https://www.nvidia.com/en-us/ai-data-science/products/nemo/
  • NVIDIA cuDF (GPU pandas): https://docs.rapids.ai/api/cudf/stable/

Frameworks & Protocols:

  • Model Context Protocol (MCP): https://modelcontextprotocol.io/
  • LangChain: https://www.langchain.com/
  • LangGraph: https://github.com/langchain-ai/langgraph
  • CrewAI: https://www.crewai.com/
  • OpenTelemetry: https://opentelemetry.io/

Books:

  • “Hands-On Large Language Models” by Jay Alamar (O’Reilly)
  • “Illustrated Guide to AI Agents” by Jay Alamar (upcoming)
  • “Agentic Design Pattern” by Antonio G

Articles & Papers:

  • “Multi-agent design” (Google article)
  • Original Google Agents White Paper by Julia and Patrick
  • SIF-based Secure AI Agents approach

For Absolute Beginners:

  1. Day 1 video → White paper → Podcast
  2. Experiment with NotebookLM (load white papers, create audio overviews)
  3. Day 2 podcast → Understand tool design principles
  4. Day 3 video → Complete both codelabs (sessions + memory)
  5. Day 4 video → Complete observability codelab
  6. Day 5 podcast → Learn production deployment

For Developers:

  1. Day 1 video (focus on ADK design principles)
  2. Day 2 → Implement a simple MCP tool
  3. Day 3 codelabs → Build a stateful agent
  4. Day 4 codelabs → Add observability and evaluation
  5. Day 5 → Deploy with CI/CD

For Enterprise Architects:

  1. All Day 1 content (strategy and vision)
  2. Day 2 (MCP security considerations)
  3. Day 3 expert panel (memory at scale)
  4. Day 4 expert panel (multi-agent evaluation)
  5. Day 5 (AgentOps and production deployment)

For Data Scientists:

  1. Day 1 video
  2. Day 4 GPU demo (NVIDIA cuDF)
  3. Day 3 memory systems (RAG patterns)
  4. Day 4 evaluation frameworks
  5. Build a data science agent with multi-agent architecture

🔍 Key Search Terms

Technical Concepts:

  • Agent Development Kit (ADK)
  • Model Context Protocol (MCP)
  • Agent-to-Agent (A2A) protocol
  • Think-act-observe loop
  • Retrieval-Augmented Generation (RAG)
  • Context engineering
  • Memory ETL pipeline
  • LLM as Judge
  • Confused deputy problem
  • Tool retrieval
  • AgentOps vs MLOps

Practical Topics:

  • Agentic design patterns
  • Multi-agent orchestration
  • Context window management
  • Prompt caching strategies
  • Evaluation-gated deployment
  • Progressive funnel approach
  • Canary/Blue-Green deployments
  • Distributed tracing
  • Memory consolidation
  • Hybrid retrieval systems

Security & Governance:

  • Prompt injection defense
  • Memory poisoning
  • PII redaction
  • Agent constitution
  • Guardrails and filtering
  • Red teaming AI agents
  • Enterprise API gateways
  • Audit trails for agents

Week 1: Foundations

  • Day 1-2: Watch Day 1 video, read white paper, listen to podcast
  • Day 3-4: Explore NotebookLM, create audio overview of white papers
  • Day 5: Join Discord, introduce yourself, review community projects

Week 2: Building Stateful Agents

  • Day 1-2: Watch Day 3 video, read white paper
  • Day 3: Complete Day 3a codelab (sessions)
  • Day 4: Complete Day 3b codelab (memory)
  • Day 5: Build your own simple agent with memory

Week 3: Quality & Evaluation

  • Day 1-2: Watch Day 4 video, read white paper
  • Day 3: Complete Day 4a codelab (observability)
  • Day 4: Complete Day 4b codelab (evaluation)
  • Day 5: Add evaluation to your Week 2 agent

Week 4: Tools & Integration

  • Day 1-2: Watch MCP whitepaper podcast, read full paper
  • Day 3-4: Design and implement a custom MCP tool
  • Day 5: Integrate your tool with your agent

Week 5: Production Deployment

  • Day 1-2: Watch Prototype to Production podcast, read full paper
  • Day 3-4: Set up CI/CD pipeline for your agent
  • Day 5: Deploy with evaluation gates and observability

Week 6: Capstone Project

  • Day 1-5: Build a complete production-ready agent:
    • Multi-agent architecture
    • MCP tool integration
    • Sessions + long-term memory
    • Automated evaluation
    • CI/CD deployment
    • Full observability (logs/traces/metrics)
    • Security guardrails

🏷️ Unified Tag Analysis

Content Type:

  • video - Multiple YouTube videos and livestreams
  • comprehensive-guide - Complete course compilation

Topics:

  • AI - Core focus on artificial intelligence agents
  • learning - Educational course structure
  • development - Building agent systems with ADK
  • agents - Primary subject matter throughout
  • knowledge-management - Memory systems, context engineering
  • data-science - GPU-accelerated data science agents
  • tools - MCP, tool design, integration patterns
  • automation - Agent automation and orchestration
  • productivity - Velocity and efficiency gains

Complexity:

  • tutorial - Step-by-step learning curriculum
  • technical - Architecture, protocols, implementation
  • deep-dive - Comprehensive exploration of topics

Metadata:

  • inbox - Newly compiled comprehensive guide
  • high priority - Essential knowledge for modern AI agent development

Suggested Bases Filters:

type = comprehensive-guide AND tags contains "AI"
priority = high AND tags contains "agents"
tags contains "learning" AND tags contains "technical"
tags contains "AI" AND tags contains "development" AND tags contains "tutorial"

Compiled: 2025-11-17 Original Course Dates: 2025-11-11 to 2025-11-17 Source: Kaggle x Google 5-Day AI Agents Intensive Course Course Homepage: https://www.kaggle.com/learn-guide/5-day-agents

Connection to Other Notes:

  • Foundation for all AI agent development projects
  • Links to ADK framework documentation and implementations
  • Connects to MCP protocol specifications and tools
  • Relevant for MLOps, DevOps, and production system architecture
  • Essential reading for understanding modern agentic AI systems