Kaggle x Google 5-Day AI Agents Intensive Course - Complete Guide

Kaggle x Google AI Agents

📖 Overview

This is the complete, comprehensive guide to Kaggle’s 5-Day AI Agents Intensive Course hosted by Google. This guide combines all course materials from Day 1 through Day 5, covering everything from foundational agent architecture to production deployment.

The course takes learners from fundamental concepts to advanced production implementations, covering everything from basic agent architecture to enterprise-scale deployment strategies. Each day builds progressively on previous knowledge, culminating in production-ready agent systems.

Course Homepage: https://www.kaggle.com/learn-guide/5-day-agents Community: Join Discord at http://discord.gg/kaggle

🗺️ Table of Contents

Course Structure

Day 1: Introduction to Agents - Architecture, taxonomy, and paradigm shift
Day 2: Agent Tools & Interoperability with MCP - Tool design and Model Context Protocol
Day 3: Context Engineering: Sessions & Memory - Short-term and long-term memory systems
Day 4: Agent Quality - Observability, evaluation, and production metrics
Day 5: Prototype to Production - Deployment, scaling, and AgentOps

Day 1: Introduction to Agents

Video URL: https://www.youtube.com/watch?v=ZaUcqznlhv8 Duration: ~2 hours

📖 Description

This is the first day of Kaggle’s 5-day intensive course on AI agents, hosted by Google. The live Q&A session features industry experts discussing the fundamentals of AI agents, agentic architectures, and Google’s Agent Development Kit (ADK). The course is designed to take learners from basic concepts to advanced agent implementations, covering topics like memory, evaluation, and production deployment.

🎯 Learning Objectives

By the end of this video, you will understand:

📋 Curriculum/Contents

Introduction & Overview

Welcome and course structure (Kalpana Parlola & Anand Nalgria)
Course design: basics to advanced over 5 days
Community engagement and moderator introduction

White Paper Overview: “Introduction to Agents and Agentic Architectures”

Evolution from first agents white paper (1 year ago)
Core architecture: Model (brain) + Tools (hands) + Orchestration (nervous system)
Think-act-observe loop explained
Taxonomy of agentic systems (Level 0-4)
Multi-agent systems and collaboration patterns
Interoperability: A2A protocol, agent-to-human communication
Security: Agent identity and governance

Industry Expert Panel Q&A

Paradigm Shift: From Brick Layers to Directors (Mike Clark - Google Cloud)

Traditional code vs. outcome-focused agent design
Embracing autonomy and probabilistic decision-making
Supporting citizen developers
Building safe and responsible agents

Long-term Vision for Enterprise Workflows (Michael Gishin Har & Antonio G)

Evolution of production readiness (March 2024 → Feb 2025)
From tab completion → function generation → vibe coding
Creating new classes of engineers (10x → 100x productivity)
Citizen developers: salespeople, product managers as engineers
Three transformation types:
- Individual productivity (deep research agents)
- Process transformation (clinical trials → FDA submissions)
- Self-improving agents with critic patterns

Multimodal Agents & Computer Use (Alan, Mike, Michael)

Voice interactions: DoorDash driver use case
Live audio for hands-free policy lookup
Computer use for automating UI interactions
Preparing data for real-time access (RAG, memory banks, prompt caching)
Start simple, layer complexity incrementally

ADK Design Principles (Mike Clark & Alan)

Interoperability: Open source, connects with LangChain, LangGraph, CrewAI
Works with MCP servers, APIs, enterprise wrappers
Betting on language models getting better
“Better together” Google platform while remaining open
Supports Python, Java, Go
Easy to start, full control for customization

Self-Organizing Agent Architectures (Antonio G)

Agents optimizing their own prompts automatically
Agent brokers: hiring, firing, merging agents
Dynamic topology evolution (creating critic agents on-demand)
Importance of continuous evaluation and drift monitoring
Research direction: Multi-agent design patterns
Recommended reading: “Agentic Design Pattern” article

Key Technologies & Concepts

Google Vertex AI Agent Builder
Agent Development Kit (ADK)
Model Context Protocol (MCP)
Agent-to-Agent (A2A) protocol
Retrieval-Augmented Generation (RAG) for fast data access
Memory banks for synthetic data storage
Prompt caching near accelerators

📝 Notes & Key Takeaways

Main Insights

“It’s not a year of agents, it’s a decade of agents” (Andrej Karpathy quote) - The next AI revolution centers on autonomous agents
Three major transformations happening simultaneously:
- Creating new classes of engineers (citizen developers)
- Individual productivity gains (deep research, rapid prototyping)
- Enterprise process transformation (clinical trials, compliance workflows)
The coding evolution in 12 months:
- June 2024: 5 minutes of code in 70ms (autocomplete)
- October 2024: Whole functions with conversation
- February 2025: “Vibe coding” - fork off agents while you do other work
ADK design philosophy: Bet on the model
- Don’t constrain models with deterministic workflows
- Maximize model utility and capability
- Set the stage for future growth as models improve
Interoperability is paramount:
- Open source, works with LangChain, LangGraph, CrewAI
- MCP for tool connections, A2A for agent communication
- Break down silos between agent systems
Self-improving agents are here:
- Agents optimizing their own prompts
- Critic agents providing feedback loops
- Agent brokers dynamically organizing teams
- Continuous evaluation and drift monitoring essential
Real-world multimodal wins:
- DoorDash: Voice agents for hands-free policy lookup while holding packages
- Pharma: Database lock → FDA abstract generation fully automated
- Key enabler: Sub-second data access via RAG + memory banks + prompt caching
Start simple, grow complexity:
- Debug in simplified text scenarios first
- Layer on voice, computer use, new tools incrementally
- Best agents require minimal interaction

Actionable Points

For Beginners: Start with the white paper, listen to the podcast first, then deep dive into the paper
For Developers: Use ADK to build interoperable agents that work with existing tools (LangChain, MCP servers)
Design Pattern: Implement critic agents to evaluate and improve main agent performance
Data Preparation: Use RAG for fast retrieval, memory banks for common scenarios, prompt caching for latency
Evaluation Strategy: Set metrics from day one, monitor performance over time to detect drift
Course Structure: Follow the 5-day progression: Intro → Memory → Evaluation → Prototyping → Production
Community Engagement: Join Discord for discussions with moderators who have real-world experience

Cross-References

Connect to Day 2: Agent Tools & MCP for tool integration details
Related to Day 3: Memory Systems for implementing agent memory
See Day 4: Agent Quality for evaluation and observability
Build on concepts in Day 5: Production for deployment

Day 2: Agent Tools & Interoperability with MCP

Video URL: https://www.youtube.com/watch?v=Cr4NA6rxHAM Duration: ~20 minutes

📖 Description

A comprehensive deep dive into how AI agents interact with the real world through tools and the Model Context Protocol (MCP). Part of Google and Kaggle’s 5-Day AI Agents Intensive course, this whitepaper companion podcast explores tool design, MCP architecture, security challenges, and enterprise integration patterns for building production-ready AI agents.

🎯 Learning Objectives

By the end of this video, you will understand:

📋 Curriculum/Contents

Part 1: Understanding Agent Tools (0:00-7:30)

The Core Problem: Foundation models stuck in training data
- LLMs can’t perceive current state or execute actions natively
- Tools as “eyes and hands” for agents
Historical Integration Challenge: The n×m problem
- Custom connectors for every model-tool pair
- Fragmented, unscalable approach
Three Tool Categories:
1. Function Tools: Developer-defined external functions with docstrings
2. Built-in Tools: Platform-provided capabilities (search grounding, code execution)
3. Agent Tools: Invoking sub-agents as delegation (hierarchical, not handoff)
Broader Taxonomy: Information retrieval, action execution, system APIs, human-in-the-loop

Part 2: Tool Design Best Practices (7:30-12:00)

Rule 1: Documentation is Paramount
- Clear, descriptive names (e.g., “create_critical_bug_with_priority” vs “update_jira”)
- Documentation becomes the instruction manual for the LLM
Rule 2: Describe Action, Not Implementation
- Tell the model WHAT to accomplish, not HOW to call the function
- Let the LLM reason, let the tool act
Rule 3: Publish Tasks, Not Raw APIs
- Abstract away complexity (e.g., “book_meeting_room” vs raw calendar API with 15 parameters)
- Single, clear, high-level task per tool
Rule 4: Design for Concise Output
- Avoid context window bloat from massive data dumps
- Return summaries, confirmations, or URIs (references to external storage)
- Use artifact services (e.g., Google ADK’s artifact service)
Rule 5: Instructive Error Messages
- Schema validation for inputs/outputs
- Descriptive errors with recovery guidance (e.g., “API rate limit exceeded. Wait 15 seconds.”)

Part 3: Model Context Protocol (MCP) Architecture (12:00-16:30)

Core Components:
1. MCP Host: Orchestrates agent reasoning, enforces safety guardrails
2. MCP Client: Manages server connections, sends commands
3. MCP Server: Advertises tools, executes commands, returns results
Communication Layer: JSON-RPC 2.0
Transport Layers:
- STDIO: Local development, child processes (fast, efficient)
- Streamable HTTP: Remote connections, server-sent events (SSE), flexible/stateless
Tool Definition: Standardized JSON schema
- Required fields: name, description, input schema
- Optional: output schema
- Example: get_stock_price with symbol/date inputs and price/timestamp output
Result Types:
- Structured: JSON objects conforming to output schema
- Unstructured: Raw text, audio, images, URIs
Error Handling:
- Protocol errors (invalid method, malformed parameters)
- Execution errors (set isError: true flag with descriptive message)

Part 4: Strategic Benefits & Challenges (16:30-19:00)

Benefits:
- Accelerated development & reusable ecosystem (public MCP registries)
- Dynamic capabilities (runtime tool discovery)
- Architectural flexibility (agentic AI mesh networks)
Scaling Challenge: Context Window Bloat
- Problem: 1000 tools × detailed schemas = infeasible context size
- Solution: RAG for Tools (tool retrieval)
  - Semantic search over indexed tools
  - Load only top 3-5 relevant tool definitions
  - Filter before loading
Security Challenge: Confused Deputy Problem
- Low-privilege user tricks AI model → AI requests high-privilege action → MCP server executes without user authorization check
- Prompt injection → privilege escalation
- Solution: External security layers
  - Enterprise API gateways (e.g., Apigee)
  - Authentication, authorization, rate limiting, logging, input filtering
  - Security is AROUND MCP, not IN MCP

Part 5: Final Thoughts & Open Questions (19:00-20:00)

Practical Takeaway: Tool design best practices apply universally (even outside MCP)
Deep Question: How do we design interfaces/guardrails to ensure agents act on authorized intent vs. blindly following commands?
Accountability Challenge: Audit trails must capture difference between “what was asked” and “what was allowed”

📝 Notes & Key Takeaways

Main Insights

Tools Transform LLMs from Thinking to Doing: Foundation models are brilliant pattern-matching machines but completely isolated from current data and real-world actions. Tools are the bridge that enables agents to perceive (fetch data) and act (execute tasks).
MCP Solves the n×m Problem: Before MCP, integrating n models with m tools required n×m custom connectors. MCP provides an open standard that decouples agents (brains) from tools (hands), enabling plug-and-play modularity.
Tool Design is Critical: The only way an LLM knows what a tool does is through documentation. Clear names, task-focused descriptions, concise outputs, and instructive error messages are non-negotiable for reliable agent performance.
Context Window Bloat is the Main Scaling Challenge: Loading thousands of tool definitions into the LLM’s context is infeasible. The solution is RAG for tools—semantic search to retrieve only the top 3-5 relevant tools for the current task.
Security Must Wrap MCP, Not Be Built Into It: MCP was designed for decentralized innovation, not enterprise security. The “confused deputy” problem (prompt injection → privilege escalation) requires external layers: API gateways, authentication, authorization, rate limiting, and logging.

Actionable Points

For Tool Developers:

Write comprehensive docstrings/descriptions for every tool
Design tools around high-level tasks, not raw API calls
Return summaries or URIs instead of dumping raw data
Provide instructive error messages with recovery paths

For Agent Builders:

Implement tool retrieval (RAG) to handle large tool catalogs
Use schema validation rigorously
Separate agent reasoning (LLM) from tool execution (MCP server)

For Enterprise Adopters:

Deploy API gateways (Apigee, etc.) in front of MCP servers
Implement robust authentication and authorization
Design audit trails capturing authorized intent vs. executed action
Never expose MCP servers directly without security layers

Cross-References

MCP tools enable Day 1: Agent Architecture think-act-observe loop
Tool design prepares for Day 3: Memory RAG patterns
Tool quality measured in Day 4: Agent Quality observability
Production MCP deployment covered in Day 5: Production

External Resources:

Whitepaper: https://www.kaggle.com/whitepaper-agent-tools-and-interoperability-with-mcp
5-Day AI Agents Intensive: https://rsvp.withgoogle.com/events/google-ai-agents-intensive_2025
Google ADK Docs: https://google.github.io/adk-docs/
Kaggle Discord: http://discord.gg/kaggle

Day 3: Context Engineering: Sessions & Memory

Video URL: https://www.youtube.com/watch?v=8o-GXj8A3nE Duration: ~60 minutes

📖 Description

This is Day 3 of Kaggle’s 5-Day AI Agents Intensive Course, focusing on Context Engineering: Sessions & Memory. The livestream features expert speakers from Google (Stephen Johnson from NotebookLM, Julia Visinger, Kimberly from Agent Engine Memory Bank) and Jay Alamar from Cohere, covering advanced techniques for building stateful AI agents with persistent memory.

Key Topics Covered:

Short-term memory: Sessions and conversation management
Long-term memory: Persistent knowledge storage and retrieval
Context compaction strategies (summarization, truncation)
Active memory ETL pipeline (extract, consolidate, merge)
Memory systems: Declarative vs procedural memory
RAG vs memory: Dynamic, user-specific knowledge
Production considerations: Non-blocking operations, PII redaction

🎯 Learning Objectives

By the end of this video, you will understand:

How to build stateful agents that remember across conversations
The difference between sessions (short-term) and memory (long-term)
Context compaction strategies to prevent context window overflow
Memory ETL pipeline: extraction, consolidation, and deduplication
Declarative vs procedural memory patterns
How to implement sessions and memory in ADK (Agent Development Kit)
Production best practices: async memory generation, PII handling
Hybrid retrieval approaches: vector DB + knowledge graphs + re-ranking
Context caching for cost and latency optimization

📋 Curriculum/Contents

Part 1: Introduction & Context (0:00-15:00)

Course overview and community highlights
Day 3 focus: Context engineering, sessions, and memory
White paper walkthrough by Kimberly

Part 2: Expert Panel Q&A (15:00-45:00)

Speakers:

Stephen Johnson - NotebookLM founder, Editorial Director
Julia Visinger - PM at Google, ADK ecosystem
Kimberly Milm - Tech Lead, Agent Engine Memory Bank
Jay Alamar - Co-author “Hands-On Large Language Models”, Director at Cohere

Key Topics:

NotebookLM’s context management - Full 1M token window, RAG with alternate queries
ADK’s memory features - Turn instructions, context caching, static vs dynamic context
Hybrid memory systems - Vector DB + knowledge graphs + re-ranking
Memory quality control - Provenance, strict definitions, prompt injection defense
Cost vs context tradeoffs - Truncation vs summarization vs long-term memory
Narrative structure for state - Organizing experiences chronologically

Part 3: Code Labs Walkthrough (45:00-60:00)

Notebook 1: Sessions - Building stateful agents, in-memory vs database session services
Notebook 2: Memory - Long-term memory implementation, ETL pipeline
Practical implementation in ADK

📝 Notes & Key Takeaways

Main Insights

1. Sessions vs Memory - The Two-Tier System

Sessions = Short-term workbench (immediate conversation history)
Memory = Long-term filing cabinet (persistent knowledge across sessions)
LLMs are stateless by default - memory requires active context engineering

2. Context Rot Prevention

Problem: Finite context windows + growing conversation history = overflow
Solutions:
- Recursive summarization (condense history with LLM)
- Token-based truncation (discard older turns)
- Context caching (store + inject, faster + cheaper)
- Long-term memory extraction (save facts, discard noise)

3. Memory ETL Pipeline (Extract-Transform-Load)

Extract: Pull meaningful facts from noisy dialogue using LLM
Consolidate: Resolve conflicts, merge duplicates, update existing knowledge
Store: Persist in structured format (vector DB, knowledge graph, hybrid)

4. ADK’s Three-Layer Context Model

Static Instruction: Core identity, system prompts, safety policies (cached)
User Message: End user’s input (kept clean for logging/evals)
Turn Instruction: Dynamic per-turn steering from application backend

5. Memory Quality Controls

Provenance: Defer to high-trust data sources (CRM, human agents > inferred preferences)
Strict Definitions: Define exactly what to save (customization in Memory Bank)
Prompt Injection Defense: Use Model Armor to detect malicious inputs
Uncertainty Acknowledgment: Instruct LLM that memories are inferred, use with caution

6. Hybrid Retrieval > Pure Vector Search

Vector DB alone is insufficient
Add keyword search + re-rankers + knowledge graphs
Graph databases excel at relationship-based retrieval (useful for memory consolidation)

7. Production Best Practices

Memory generation = non-blocking background operation (avoid latency)
Redact PII rigorously (maintain user trust)
Memory as a tool (agent decides when to retrieve/store, not always-on)

8. NotebookLM’s Agentic Future

Context as UI: Source selection = explicit context control
Agent suggests tools: “You should create an audio overview on this topic”
Featured notebook = expert on how to use NotebookLM (meta-agent pattern)

Actionable Points

For Builders:

✅ Start with in-memory sessions for prototyping, migrate to database sessions for production
✅ Implement memory as a non-blocking background process (don’t block user requests)
✅ Use ADK’s turn instructions for dynamic context injection
✅ Test context caching to reduce costs (reuse static instructions)
✅ Define strict memory schemas (what to save, what to ignore)
✅ Build hybrid retrieval: vector + keyword + re-ranking
✅ Load white papers into NotebookLM for mind maps and video overviews

For Learners:

✅ Complete Day 3 codelabs:
- https://www.kaggle.com/code/kaggle5daysofai/day-3a-agent-sessions
- https://www.kaggle.com/code/kaggle5daysofai/day-3b-agent-memory
✅ Read white paper: https://www.kaggle.com/whitepaper-context-engineering-sessions-and-memory
✅ Listen to podcast: https://www.youtube.com/watch?v=FMcExVE15a4
✅ Explore NotebookLM’s mind map feature for visualizing white papers
✅ Join Kaggle Discord for community support: http://discord.gg/kaggle

For Capstone Projects:

✅ Experiment with graph RAG for human domain knowledge
✅ Build narrative-based state management (chronological organization)
✅ Test different compaction strategies (truncation vs summarization vs memory)
✅ Implement memory customization (define strict schemas)

Cross-References

Builds on Day 1: Agent Architecture concepts
Memory connects to Day 2: MCP Tools for retrieval patterns
Context management essential for Day 4: Agent Quality evaluation
Production memory systems detailed in Day 5: Production

NotebookLM Features Highlighted:

Mind maps for white paper visualization
Video overviews (new feature)
Audio overviews (AI podcast generation)
Source selection for focused context
Full 1M token context window
RAG with alternate query generation

ADK Features Highlighted:

In-memory session service (prototyping)
Database session service (production)
Vertex AI Agent Engine (enterprise)
Static/turn/user instruction layers
Context caching
Memory Bank integration
Tool-based memory (agent-controlled retrieval)

Day 4: Agent Quality

Video URL: https://www.youtube.com/watch?v=JW1Yybfxyr4 Duration: ~1 hour Uploaded: 2025-10-30

📹 Video Overview

Day 4: Agent Quality - A comprehensive livestream covering the critical topic of AI agent quality, including observability, evaluation frameworks, and production-ready practices from Google’s Kaggle 5-Day AI Agents Intensive Course.

🎯 Description

All course info at: https://www.kaggle.com/learn-guide/5-day-agents

Day 4 Focus: Agent Quality

Complete Unit 4:

Summary podcast episode: https://www.youtube.com/watch?v=LFQRy-Ci-lk
Agent Quality whitepaper: https://www.kaggle.com/whitepaper-agent-quality
Codelabs on Kaggle:
- Day 4a: Agent observability - https://www.kaggle.com/code/kaggle5daysofai/day-4a-agent-observability
- Day 4b: Agent evaluation - https://www.kaggle.com/code/kaggle5daysofai/day-4b-agent-evaluation
- Troubleshooting guide - https://www.kaggle.com/code/kaggle5daysofai/day-0-troubleshooting-and-faqs

Bonus: “Building GPU-Accelerated Data Science Agents” live on Kaggle: https://www.kaggle.com/code/jiweiliu/gpu-accelerated-data-science-agent

Community: Join Discord - http://discord.gg/kaggle

🎓 Learning Objectives

By the end of this session, you will understand:

Four Pillars of Agent Quality: Effectiveness, Efficiency, Robustness, Safety
Strategic Evaluation Hierarchy: Blackbox (outside-in) vs Glassbox (inside-out) views
Deep Observability Trinity: Structured logs, end-to-end traces, aggregated metrics
LLM as Judge: Scalable automated evaluation with human-in-the-loop
Agent Quality Flywheel: Turning production interactions into continuous improvement data
Multi-Agent Evaluation: Unique challenges in evaluating multi-agent systems
Production Observability: Implementing observability with ADK plugins and callbacks
GPU-Accelerated Data Science Agents: Using NVIDIA cuDF for fast data processing

📚 Curriculum & Key Topics

Part 1: Agent Quality Framework (Whitepaper Overview)

Core Framework - 4 Pillars:

Effectiveness - Did it achieve its goal?
Efficiency - Cost and speed optimization
Robustness - Graceful error handling
Safety - Adherence to ethical guidelines

Strategic Hierarchy:

Outside-in (Blackbox): Validate final outcomes
Inside-out (Glassbox): Debug reasoning trajectory
Why: Agent can get the right answer for wrong reasons - journey matters as much as outcome

Deep Observability Trinity:

Structured logs - Raw facts
End-to-end traces - Chain of thought
Aggregated metrics - Health trends

Part 2: Expert Panel Q&A Highlights

LLM as Judge - Four Critical Biases

Preference Bias - Models prefer their own generations
Verbosity Bias - Favor long, confident answers
Sycophancy - Agents agree with each other’s pushback
Score Bias - Models hedge bets (always score 5/10)

Solution: Evaluate your evaluators with test sets and human correlation

Multi-Agent System Evaluation

Key Challenge: Best components ≠ best system (soccer team analogy)

Best Practices:

Evaluate interactions & orchestration (not just individual agents)
Build evaluation from Day Zero
Use multi-layered approach: Metrics + LLM as Judge + Human-in-the-Loop
Error compounds rapidly (10% per agent)

Tools: Google ADK, Vertex AI eval service, OpenTelemetry

Part 3: GPU-Accelerated Data Science Agents (Dr. Jay - NVIDIA)

NVIDIA cuDF: Zero-Code GPU Acceleration

import cudf.pandas
cudf.pandas.install()
# All pandas code below now GPU-accelerated!

Performance: 10 seconds (CPU) → 1 second (GPU) on 2-5GB datasets

Multi-Agent Architecture:

Planner - Break down tasks
Coder - Write/execute code, generate visualizations
Vision - Interpret visualizations
Writer - Combine insights, write report

Demos:

Interactive data exploration (42M rows, live on Kaggle GPU)
Full research report generation (NVIDIA NeMo Super 49B)

Part 4: Code Labs - Agent Observability

Three Pillars:

Logs - What happened at specific time
Traces - Connect logs into cohesive story
Metrics - Average latency, failure rate

Demo: Research Paper Finder Agent

Debugging with ADK Web UI
Root cause analysis via trace inspection
Production implementation with plugins & callbacks

💡 Main Insights & Key Takeaways

Agent Quality is Multi-Dimensional - 4 Pillars + 2 Views + 3 Observability Components
LLM as Judge Has Biases - Mitigation: Rubrics, evaluator evaluation, human correlation
Multi-Agent Unique - Orchestration critical, error compounds, Day Zero evaluation
Production Infrastructure - OpenTelemetry, seamless experimentation → production
GPU Transforms Data Science - cuDF 10x speedup, quantized models, multi-agent workflows
Evaluation is Continuous - Not perfection, understanding + improvement
Mindset Shift - Agentic ≠ traditional development, nondeterministic, probabilistic

🎯 Actionable Next Steps

Complete Day 4a codelab: Agent observability
Complete Day 4b codelab: Agent evaluation
Read Agent Quality whitepaper
Try GPU-accelerated data science agent notebook
Implement rubrics for LLM as judge
Design observability from Day Zero in new projects

Cross-References

Quality framework builds on Day 1: Agent Architecture
Tool validation requires Day 2: MCP understanding
Observability requires Day 3: Memory for state tracking
Production quality detailed in Day 5: Production

🔗 Resources

Course Materials:

Course guide: https://www.kaggle.com/learn-guide/5-day-agents
Day 4 podcast: https://www.youtube.com/watch?v=LFQRy-Ci-lk
Whitepaper: https://www.kaggle.com/whitepaper-agent-quality

Code Labs:

Day 4a Observability: https://www.kaggle.com/code/kaggle5daysofai/day-4a-agent-observability
Day 4b Evaluation: https://www.kaggle.com/code/kaggle5daysofai/day-4b-agent-evaluation
GPU Demo: https://www.kaggle.com/code/jiweiliu/gpu-accelerated-data-science-agent

Tools:

Google ADK (Agent Development Kit)
NVIDIA NeMo Agent Toolkit
NVIDIA cuDF (GPU pandas)
OpenTelemetry

Day 5: Prototype to Production

Video URL: https://www.youtube.com/watch?v=8Wyt9l7ge-g Duration: 19 minutes

📖 Description

This whitepaper provides a comprehensive technical guide to the operational life cycle of AI agents, focusing on deployment, scaling, and productionizing. Building on Day 4’s coverage of evaluation and observability, this guide emphasizes how to build the necessary trust to move agents into production through robust CI/CD pipelines and scalable infrastructure. It explores the challenges of transitioning agent-based systems from prototypes to enterprise-grade solutions, with special attention to Agent2Agent (A2A) interoperability. This guide offers practical insights for AI/ML engineers, DevOps professionals, and system architects.

🎯 Learning Objectives

By the end of this video, you will understand:

The operational life cycle of AI agents from prototype to production (“AgentOps”)
Why 80% of development effort is infrastructure, security, and validation (not core AI)
The three foundational pillars: automated evaluation, CI/CD pipelines, and observability
How to implement evaluation-gated deployment with progressive funnel approach
Security challenges unique to AI agents (prompt injection, memory poisoning, data leakage)
The three-layer defense system: policy definition, guardrails, continuous assurance
Operational control strategies: decoupled state, caching, retries, cost management
Agent-to-Agent (A2A) interoperability vs Model Context Protocol (MCP)
Building collaborative AI agent ecosystems at enterprise scale

📋 Curriculum/Contents

Part 1: The Production Gap Challenge (0:00-3:00)

The “last mile” problem: demo to production
Why traditional MLOps doesn’t work for autonomous agents
Dynamic tool orchestration challenges
State management at scale
Unpredictable cost and latency issues

Part 2: People & Process First (3:00-6:00)

Team structure for generative AI
New specialized roles: Prompt Engineers and AI Engineers
Cross-team coordination requirements
Governance and responsibility frameworks

Part 3: Pre-Production Pipeline (6:00-10:00)

Evaluation-gated deployment principle
Manual pre-validation vs automated in-pipeline gates
The three-phase progressive funnel:
- Phase 1: Pre-merge integration (CI)
- Phase 2: Post-merge validation & staging (CD)
- Phase 3: Gated production deployment
Safe rollout strategies: Canary, Blue-Green, A/B testing
Version control: code, prompts, schemas, memory structure

Part 4: Security Framework (10:00-12:00)

Google’s Secure AI Agents approach (SIF-based)
Layer 1: Policy definition (agent constitution)
Layer 2: Guardrails and filtering (input/output, HITL)
Layer 3: Continuous assurance (red teaming, safety testing)
Common threats: prompt injection, data leakage, memory poisoning

Part 5: Operational Loop (12:00-16:00)

Observe: Logs, traces, metrics (the three pillars)
Act: Decoupled state, caching, retries, cost optimization
Evolve: Production-driven improvement cycle
Security response playbook: containment, triage, resolution
Rapid iteration through automated CI/CD

Part 6: Agent Interoperability (16:00-19:00)

Breaking down agent silos
Model Context Protocol (MCP): stateless tool interaction
Agent-to-Agent (A2A): stateful goal-oriented collaboration
Agent discovery: agent cards and registries
Distributed tracing across multi-agent systems
Building collaborative AI ecosystems

📝 Notes & Key Takeaways

Main Insights

80% of effort is infrastructure, not AI: The shocking reality is that building the core AI model is just 20% of the work. The remaining 80% is infrastructure, security, monitoring, and validation systems that make it production-ready.
Evaluation-gated deployment is non-negotiable: No agent should touch real users until it passes rigorous automated checks. This requires building golden datasets and automated evaluation suites that run in the CI/CD pipeline.
Agents need different MLOps than models: Traditional ML models are predictable (Input X → Output Y). Agents are dynamic, stateful, and autonomous, requiring new approaches to testing, versioning, and monitoring.
Three-layer security defense: (1) Policy definition through system instructions, (2) Hard guardrails with input/output filtering and HITL escalation, (3) Continuous assurance through red teaming and safety testing.
Decouple logic from state for scalability: Store memory and session data externally (Firestore, Cloud SQL) so agent logic can scale horizontally without state bottlenecks.
Observability requires logs + traces + metrics: Logs give you the diary, traces give you the narrative (causal chain), metrics give you the report card. You need all three for effective agent monitoring.
MCP vs A2A serve different purposes: MCP is for stateless tool interactions (“fetch the weather”). A2A is for stateful agent collaboration (“analyze customer churn and suggest strategies”). Often used together.
Velocity is the ultimate prize: Good AgentOps means deploying meaningful improvements in hours or days, not weeks or months. Continuous evolution is the competitive advantage.

Actionable Points

Start with fundamentals: Build a solid evaluation dataset and basic CI/CD pipeline with automated evaluation gates before anything else.
Implement progressive funnel: Use the three-phase approach (pre-merge CI, staging validation, gated production) to catch errors early and cheaply.
Use safe rollout strategies: Never deploy to 100% of users at once. Use canary releases (1% traffic), blue-green deployments (instant rollback), or A/B testing.
Version everything: Code, prompts, tool schemas, memory structure all need version numbers for instant rollback capability.
Turn production failures into tests: When something breaks in production, immediately add it to your golden evaluation dataset so it becomes part of future testing.
Build security response playbook: Pre-define containment procedures (circuit breakers, feature flags), triage routing (HITL queues), and rapid patch deployment workflows.
Implement distributed tracing: When agents collaborate, you need unique IDs that follow requests across services to understand the full causal chain.
Design idempotent tools: Tools involved in state changes must be safely retryable. “Get weather” can retry; “charge credit card” cannot without careful design.

Cross-References

Production deployment of Day 1: Agent Architecture systems
Securing Day 2: MCP tool integrations
Operationalizing Day 3: Memory at scale
Implementing Day 4: Agent Quality observability in production

🔗 Further Resources

Related Searches:

“AI agent CI/CD pipelines”
“Agent-to-Agent (A2A) interoperability”
“Model Context Protocol (MCP) implementation”
“AI agent security guardrails”
“AgentOps vs MLOps differences”
“Distributed tracing for multi-agent systems”
“Vertex AI safety filters”
“Red teaming AI agents”

Further Resources:

Summary & Key Takeaways

Complete Course Arc

This 5-day intensive course takes you through a comprehensive journey from understanding what AI agents are to deploying them at enterprise scale:

Day 1 establishes the foundation: agent architecture (model + tools + orchestration), taxonomy levels, and the paradigm shift from coding to directing agents
Day 2 connects agents to the real world through MCP tools and interoperability, covering tool design best practices
Day 3 builds stateful agents through sessions and memory, enabling continuity across conversations
Day 4 ensures quality through observability (logs/traces/metrics) and evaluation frameworks
Day 5 brings it all together with production deployment, security, and AgentOps

Universal Principles Across All Days

The Agent Trinity

Model (Brain): LLM for reasoning and decision-making
Tools (Hands): MCP-enabled actions and information retrieval
Orchestration (Nervous System): Think-act-observe loop with memory

The Quality Pillars

Effectiveness: Does it achieve the goal?
Efficiency: Cost and speed optimization
Robustness: Graceful error handling
Safety: Ethical guidelines and security

The Production Essentials

Evaluation: Automated golden datasets + LLM as Judge + Human-in-the-Loop
Observability: Logs + Traces + Metrics
Security: Policy definition + Guardrails + Continuous assurance
Interoperability: MCP for tools, A2A for agents

Critical Mindset Shifts

From Traditional Development:

Deterministic → Probabilistic
Coding → Directing
Perfect → Good enough with continuous improvement
Static → Dynamic and self-evolving

From Traditional MLOps:

Model-centric → System-centric
Batch predictions → Real-time interactions
Input-output pairs → Multi-step reasoning
Versioning models → Versioning everything (code, prompts, schemas, memory)

The Agent Development Lifecycle

Research → Prototype → Evaluate → Deploy → Observe → Improve
    ↑                                                     ↓
    └─────────────── Continuous Evolution ───────────────┘

Research Phase: Understand domain, design agent architecture, choose tools
Prototype Phase: Build with ADK, implement sessions/memory, integrate MCP tools
Evaluate Phase: Create golden datasets, implement automated evaluation, test robustness
Deploy Phase: CI/CD with evaluation gates, progressive rollout, security layers
Observe Phase: Monitor logs/traces/metrics, detect drift, capture edge cases
Improve Phase: Turn failures into tests, optimize prompts, evolve capabilities

Technology Stack Summary

Core Platforms:

Google Agent Development Kit (ADK)
Vertex AI Agent Builder
Vertex AI Agent Engine

Memory & Context:

In-memory sessions (prototyping)
Database sessions (production)
Memory Bank (long-term storage)
Context caching (cost optimization)
RAG systems (retrieval)

Tool Integration:

Model Context Protocol (MCP)
Agent-to-Agent (A2A) protocol
LangChain/LangGraph/CrewAI compatibility
Enterprise API gateways

Quality & Operations:

OpenTelemetry (observability)
Vertex AI eval service
CI/CD pipelines
Blue-Green/Canary deployments

Acceleration:

NVIDIA cuDF (GPU pandas)
NVIDIA NeMo Agent Toolkit
Prompt caching near accelerators

Real-World Use Cases Highlighted

DoorDash: Voice agents for hands-free policy lookup while holding packages
Pharmaceutical: Database lock → FDA abstract generation automation
Data Science: 42M row datasets analyzed with GPU-accelerated multi-agent systems
Research: Deep research agents for comprehensive literature reviews
Enterprise Workflows: Clinical trials → FDA submissions fully automated

Common Pitfalls & Solutions

Pitfall: Context window overflow from growing conversation history Solution: Recursive summarization + context caching + long-term memory extraction

Pitfall: LLM as Judge biases (verbosity, preference, sycophancy) Solution: Use rubrics, evaluate evaluators, maintain human correlation

Pitfall: Multi-agent error compounding (10% per agent) Solution: Evaluate orchestration, not just individual agents

Pitfall: Confused deputy security problem (privilege escalation via prompt injection) Solution: External API gateways with authentication/authorization/rate limiting

Pitfall: Tool context bloat (1000s of tool definitions) Solution: RAG for tools—semantic search for top 3-5 relevant tools

Pitfall: 80% of effort on infrastructure vs 20% on AI Solution: Use ADK, leverage existing platforms, don’t rebuild from scratch

What Makes a Great Agent

Clear Purpose: Well-defined tasks with measurable success criteria
Reliable Tools: Task-focused, well-documented, concise output, instructive errors
Robust Memory: Appropriate retention (sessions for short-term, memory for long-term)
Continuous Evaluation: Golden datasets, automated testing, drift monitoring
Graceful Failures: Uncertainty acknowledgment, HITL escalation, retry logic
Security First: Policy definition, guardrails, continuous red teaming
Observable Behavior: Structured logs, end-to-end traces, aggregated metrics
Rapid Evolution: CI/CD automation, version control, production feedback loops

Complete Resource List

📚 Course Materials

Main Course Hub:

Course Guide: https://www.kaggle.com/learn-guide/5-day-agents
Discord Community: http://discord.gg/kaggle

Day 1: Introduction to Agents:

Video: https://www.youtube.com/watch?v=ZaUcqznlhv8
White Paper: https://www.kaggle.com/whitepaper-introduction-to-agents

Day 2: Agent Tools & MCP:

Video: https://www.youtube.com/watch?v=Cr4NA6rxHAM
White Paper: https://www.kaggle.com/whitepaper-agent-tools-and-interoperability-with-mcp

Day 3: Context Engineering:

Video: https://www.youtube.com/watch?v=8o-GXj8A3nE
Podcast: https://www.youtube.com/watch?v=FMcExVE15a4
White Paper: https://www.kaggle.com/whitepaper-context-engineering-sessions-and-memory
Codelab 3a Sessions: https://www.kaggle.com/code/kaggle5daysofai/day-3a-agent-sessions
Codelab 3b Memory: https://www.kaggle.com/code/kaggle5daysofai/day-3b-agent-memory

Day 4: Agent Quality:

Video: https://www.youtube.com/watch?v=JW1Yybfxyr4
Podcast: https://www.youtube.com/watch?v=LFQRy-Ci-lk
White Paper: https://www.kaggle.com/whitepaper-agent-quality
Codelab 4a Observability: https://www.kaggle.com/code/kaggle5daysofai/day-4a-agent-observability
Codelab 4b Evaluation: https://www.kaggle.com/code/kaggle5daysofai/day-4b-agent-evaluation
GPU Demo: https://www.kaggle.com/code/jiweiliu/gpu-accelerated-data-science-agent

Day 5: Prototype to Production:

Video: https://www.youtube.com/watch?v=8Wyt9l7ge-g
White Paper: https://www.kaggle.com/whitepaper-prototype-to-production

Troubleshooting:

Day 0 FAQ: https://www.kaggle.com/code/kaggle5daysofai/day-0-troubleshooting-and-faqs

🛠️ Tools & Platforms

Google/Kaggle:

Google ADK Documentation: https://google.github.io/adk-docs/
Vertex AI Agent Builder: https://cloud.google.com/vertex-ai/agents
NotebookLM: https://notebooklm.google.com/

NVIDIA:

NVIDIA NeMo Agent Toolkit: https://www.nvidia.com/en-us/ai-data-science/products/nemo/
NVIDIA cuDF (GPU pandas): https://docs.rapids.ai/api/cudf/stable/

Frameworks & Protocols:

Model Context Protocol (MCP): https://modelcontextprotocol.io/
LangChain: https://www.langchain.com/
LangGraph: https://github.com/langchain-ai/langgraph
CrewAI: https://www.crewai.com/
OpenTelemetry: https://opentelemetry.io/

📖 Recommended Reading

Books:

“Hands-On Large Language Models” by Jay Alamar (O’Reilly)
“Illustrated Guide to AI Agents” by Jay Alamar (upcoming)
“Agentic Design Pattern” by Antonio G

Articles & Papers:

“Multi-agent design” (Google article)
Original Google Agents White Paper by Julia and Patrick
SIF-based Secure AI Agents approach

🎓 Recommended Learning Path

For Absolute Beginners:

Day 1 video → White paper → Podcast
Experiment with NotebookLM (load white papers, create audio overviews)
Day 2 podcast → Understand tool design principles
Day 3 video → Complete both codelabs (sessions + memory)
Day 4 video → Complete observability codelab
Day 5 podcast → Learn production deployment

For Developers:

Day 1 video (focus on ADK design principles)
Day 2 → Implement a simple MCP tool
Day 3 codelabs → Build a stateful agent
Day 4 codelabs → Add observability and evaluation
Day 5 → Deploy with CI/CD

For Enterprise Architects:

All Day 1 content (strategy and vision)
Day 2 (MCP security considerations)
Day 3 expert panel (memory at scale)
Day 4 expert panel (multi-agent evaluation)
Day 5 (AgentOps and production deployment)

For Data Scientists:

Day 1 video
Day 4 GPU demo (NVIDIA cuDF)
Day 3 memory systems (RAG patterns)
Day 4 evaluation frameworks
Build a data science agent with multi-agent architecture

🔍 Key Search Terms

Technical Concepts:

Agent Development Kit (ADK)
Model Context Protocol (MCP)
Agent-to-Agent (A2A) protocol
Think-act-observe loop
Retrieval-Augmented Generation (RAG)
Context engineering
Memory ETL pipeline
LLM as Judge
Confused deputy problem
Tool retrieval
AgentOps vs MLOps

Practical Topics:

Agentic design patterns
Multi-agent orchestration
Context window management
Prompt caching strategies
Evaluation-gated deployment
Progressive funnel approach
Canary/Blue-Green deployments
Distributed tracing
Memory consolidation
Hybrid retrieval systems

Security & Governance:

Prompt injection defense
Memory poisoning
PII redaction
Agent constitution
Guardrails and filtering
Red teaming AI agents
Enterprise API gateways
Audit trails for agents

Recommended Learning Path

Week 1: Foundations

Day 1-2: Watch Day 1 video, read white paper, listen to podcast
Day 3-4: Explore NotebookLM, create audio overview of white papers
Day 5: Join Discord, introduce yourself, review community projects

Week 2: Building Stateful Agents

Day 1-2: Watch Day 3 video, read white paper
Day 3: Complete Day 3a codelab (sessions)
Day 4: Complete Day 3b codelab (memory)
Day 5: Build your own simple agent with memory

Week 3: Quality & Evaluation

Day 1-2: Watch Day 4 video, read white paper
Day 3: Complete Day 4a codelab (observability)
Day 4: Complete Day 4b codelab (evaluation)
Day 5: Add evaluation to your Week 2 agent

Week 4: Tools & Integration

Day 1-2: Watch MCP whitepaper podcast, read full paper
Day 3-4: Design and implement a custom MCP tool
Day 5: Integrate your tool with your agent

Week 5: Production Deployment

Day 1-2: Watch Prototype to Production podcast, read full paper
Day 3-4: Set up CI/CD pipeline for your agent
Day 5: Deploy with evaluation gates and observability

Week 6: Capstone Project

Day 1-5: Build a complete production-ready agent:
- Multi-agent architecture
- MCP tool integration
- Sessions + long-term memory
- Automated evaluation
- CI/CD deployment
- Full observability (logs/traces/metrics)
- Security guardrails

🏷️ Unified Tag Analysis

Content Type:

video - Multiple YouTube videos and livestreams
comprehensive-guide - Complete course compilation

Topics:

AI - Core focus on artificial intelligence agents
learning - Educational course structure
development - Building agent systems with ADK
agents - Primary subject matter throughout
knowledge-management - Memory systems, context engineering
data-science - GPU-accelerated data science agents
tools - MCP, tool design, integration patterns
automation - Agent automation and orchestration
productivity - Velocity and efficiency gains

Complexity:

tutorial - Step-by-step learning curriculum
technical - Architecture, protocols, implementation
deep-dive - Comprehensive exploration of topics

Metadata:

inbox - Newly compiled comprehensive guide
high priority - Essential knowledge for modern AI agent development

Suggested Bases Filters:

type = comprehensive-guide AND tags contains "AI"
priority = high AND tags contains "agents"
tags contains "learning" AND tags contains "technical"
tags contains "AI" AND tags contains "development" AND tags contains "tutorial"

Compiled: 2025-11-17 Original Course Dates: 2025-11-11 to 2025-11-17 Source: Kaggle x Google 5-Day AI Agents Intensive Course Course Homepage: https://www.kaggle.com/learn-guide/5-day-agents

Connection to Other Notes:

Foundation for all AI agent development projects
Links to ADK framework documentation and implementations
Connects to MCP protocol specifications and tools
Relevant for MLOps, DevOps, and production system architecture
Essential reading for understanding modern agentic AI systems

Kaggle x Google 5-Day AI Agents Intensive Course - Complete Guide

📖 Overview

🗺️ Table of Contents

Course Structure

Additional Resources

Day 1: Introduction to Agents

📖 Description

🎯 Learning Objectives

📋 Curriculum/Contents

Introduction & Overview

White Paper Overview: “Introduction to Agents and Agentic Architectures”

Industry Expert Panel Q&A

Key Technologies & Concepts

📝 Notes & Key Takeaways

Main Insights

Actionable Points

Cross-References

Day 2: Agent Tools & Interoperability with MCP

📖 Description

🎯 Learning Objectives

📋 Curriculum/Contents

Part 1: Understanding Agent Tools (0:00-7:30)

Part 2: Tool Design Best Practices (7:30-12:00)

Part 3: Model Context Protocol (MCP) Architecture (12:00-16:30)

Part 4: Strategic Benefits & Challenges (16:30-19:00)

Part 5: Final Thoughts & Open Questions (19:00-20:00)

📝 Notes & Key Takeaways

Main Insights

Actionable Points

Cross-References

🔗 Related Resources

Day 3: Context Engineering: Sessions & Memory

📖 Description

🎯 Learning Objectives

📋 Curriculum/Contents

Part 1: Introduction & Context (0:00-15:00)

Part 2: Expert Panel Q&A (15:00-45:00)

Part 3: Code Labs Walkthrough (45:00-60:00)

📝 Notes & Key Takeaways

Main Insights

Actionable Points

Cross-References

Related Resources

Day 4: Agent Quality

📹 Video Overview

🎯 Description

🎓 Learning Objectives

📚 Curriculum & Key Topics

Part 1: Agent Quality Framework (Whitepaper Overview)

Part 2: Expert Panel Q&A Highlights

Part 3: GPU-Accelerated Data Science Agents (Dr. Jay - NVIDIA)

Part 4: Code Labs - Agent Observability

💡 Main Insights & Key Takeaways

🎯 Actionable Next Steps

Cross-References

🔗 Resources

Day 5: Prototype to Production

📖 Description

🎯 Learning Objectives

📋 Curriculum/Contents

Part 1: The Production Gap Challenge (0:00-3:00)

Part 2: People & Process First (3:00-6:00)

Part 3: Pre-Production Pipeline (6:00-10:00)

Part 4: Security Framework (10:00-12:00)

Part 5: Operational Loop (12:00-16:00)

Part 6: Agent Interoperability (16:00-19:00)

📝 Notes & Key Takeaways

Main Insights

Actionable Points

Cross-References

🔗 Further Resources

Summary & Key Takeaways

Complete Course Arc

Universal Principles Across All Days

The Agent Trinity

The Quality Pillars

The Production Essentials

Critical Mindset Shifts

The Agent Development Lifecycle

Technology Stack Summary