
Fundamental Concepts of Generative AI Engineering (Kaggle x Google)

Goal: Learn how to use, evaluate, and ship GenAI systems: prompts, embeddings, RAG, agents, domain-specific LLMs, and MLOps on platforms like Vertex AI.


2.1 Foundational Models & GenAI Basics

Concept checklist

  • Difference between:
    • Foundation models vs task/domain-specific models
    • LLMs vs small models vs multi-modal models
  • Evolution of LLMs:
    • Transformers → instruction tuning → RLHF / GRPO
    • Inference acceleration & quantization (high level)
  • Anatomy of a generative AI stack:
    • Model layer (Gemini / Llama / etc.)
    • Data & retrieval layer (vector DB, grounding)
    • Application / orchestration layer (agents, tools, workflows)

2.2 Prompt Engineering

Prompt fundamentals

  • Understand the model as a next-token predictor and why prompting matters
  • How model choice, decoding config (temperature, top-p, top-k), and context window length affect prompt behavior
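To make the decoding knobs concrete, here is a minimal pure-Python sketch of how temperature and top-k reshape the next-token distribution before sampling. This is an illustration of the math, not any particular provider's API; the logits and parameter values are made up.

```python
import math

def sampling_distribution(logits, temperature=1.0, top_k=None):
    """Turn raw logits into a sampling distribution.

    Temperature divides the logits before softmax (lower = sharper,
    more deterministic); top_k zeroes out all but the k highest-logit
    tokens before normalizing.
    """
    if top_k is not None:
        cutoff = sorted(logits, reverse=True)[top_k - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three-token vocabulary: a low temperature and top_k=2 concentrate
# probability mass on the highest-logit token and cut the third entirely.
probs = sampling_distribution([2.0, 1.0, 0.1], temperature=0.5, top_k=2)
```

Top-p (nucleus) sampling works similarly but keeps the smallest set of tokens whose cumulative probability exceeds p instead of a fixed count k.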

Prompt techniques

  • Zero-shot prompting
  • One-shot / few-shot prompting
  • System prompting (global instructions)
  • Role prompting (persona, style)
  • Contextual prompting (background + docs)
  • Step-back prompting (ask model to reframe problem)
  • Chain-of-thought (CoT) prompting
  • Self-consistency (sample multiple CoTs + aggregate)
  • Tree-of-Thoughts prompting
  • ReAct (reason + act) prompting
  • Tool-aware prompts (explicit tool selection, function signatures)
  • Safety-aware prompts (content boundaries, refusal behavior)
  • Documentation template for prompts (goals, constraints, examples)
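As a small illustration of the few-shot pattern above, the sketch below assembles a prompt from global instructions, worked examples, and the new input in the same format. The task, labels, and wording are invented for the example; the structure is the point.

```python
def build_few_shot_prompt(system, examples, query):
    """Assemble a few-shot prompt: system instructions first,
    then worked input/output examples, then the new input in
    the identical format so the model continues the pattern."""
    parts = [system, ""]
    for inp, out in examples:
        parts += [f"Input: {inp}", f"Output: {out}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great service!", "positive"), ("Cold food.", "negative")],
    "Friendly staff, fast delivery.",
)
```

The same template function doubles as the documentation artifact mentioned above: goals live in the system string, constraints in its wording, and examples in the example list.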

Prompt evaluation

  • Structured outputs (JSON/enum schemas)
  • Autoraters / model-based eval
  • Pairwise prompt comparison (A/B)
  • Human-in-the-loop review of prompt changes
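Structured outputs are easiest to evaluate when you validate them at the boundary. A minimal sketch, assuming a hypothetical sentiment enum and a response shaped like `{"label": ..., "confidence": ...}`:

```python
import json

ALLOWED_LABELS = {"positive", "negative", "neutral"}  # hypothetical enum

def parse_and_check(raw):
    """Parse a model response expected to match the schema
    {"label": <enum>, "confidence": <float in [0, 1]>} and
    reject anything that drifts from it before it reaches
    downstream code."""
    obj = json.loads(raw)
    if set(obj) != {"label", "confidence"}:
        raise ValueError(f"unexpected keys: {sorted(obj)}")
    if obj["label"] not in ALLOWED_LABELS:
        raise ValueError(f"label outside enum: {obj['label']!r}")
    if not 0.0 <= obj["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return obj

result = parse_and_check('{"label": "positive", "confidence": 0.92}')
```

Failures of this check are themselves a useful prompt-eval signal: a rising schema-violation rate after a prompt change is a cheap, automatic regression alarm.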

2.3 Embeddings & Vector Databases

Embedding concepts

  • What text embeddings are & why they’re used
  • Vector space geometry: Cosine similarity, Euclidean distance
  • Types of embedding tasks: Semantic search, Clustering, Classification, Recommendation

Vector databases & search

  • Purpose of vector DBs vs classic SQL/NoSQL
  • ANN search: algorithms such as HNSW, and libraries such as FAISS and ScaNN
  • Index structures & tradeoffs (recall vs latency)
  • Metadata filtering & hybrid search (vector + keyword)
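Before reaching for a vector DB, it helps to see what one replaces. The sketch below does exact brute-force search with a metadata filter, scoring by dot product (assuming normalized vectors); a real vector DB swaps the linear scan for an ANN index such as HNSW to trade a little recall for much lower latency. All data here is invented.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, index, top_k=2, where=None):
    """Exact nearest-neighbor search with optional metadata
    filtering. `index` is a list of (doc_id, vector, metadata)."""
    hits = [
        (dot(query, vec), doc_id)
        for doc_id, vec, meta in index
        if where is None or where(meta)
    ]
    hits.sort(reverse=True)
    return [doc_id for _, doc_id in hits[:top_k]]

index = [
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.9, 0.1], {"lang": "de"}),
    ("c", [0.0, 1.0], {"lang": "en"}),
]
# Filter-then-search: "b" is the second-closest vector but is
# excluded by the metadata predicate.
results = search([1.0, 0.0], index, top_k=2, where=lambda m: m["lang"] == "en")
```

Hybrid search layers a keyword score (e.g. BM25) on top of this vector score and combines the two rankings.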

Evaluating embeddings

  • Precision@k, Recall@k, nDCG
  • Qualitative inspection of neighbors
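Precision@k and Recall@k are simple enough to implement directly. A minimal sketch with an invented ranked result list and relevance set:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs found in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

retrieved = ["d1", "d4", "d2", "d5"]   # ranked output of a retriever
relevant = {"d1", "d2", "d3"}          # ground-truth relevant docs
```

At k=2 this gives precision 0.5 (one of the top two is relevant) and recall 1/3 (one of three relevant docs was found). nDCG additionally discounts relevant hits that appear lower in the ranking.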

Hands-on skills

  • Build a simple similarity search app over text docs
  • Implement a small RAG Q&A system with embeddings
  • Train a classifier on sentence embeddings

2.4 RAG & Grounding

RAG pipeline concepts

  • Components: Ingestion (chunking), Indexing (vector DB), Retrieval (top-k), Reranking, Generation
  • Grounding: Using external sources (e.g., Google Search) to reduce hallucinations
  • RAG vs fine-tuning tradeoffs
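The pipeline stages above can be sketched end to end in a few functions. This toy version uses word-overlap scoring as a stand-in for embedding retrieval, skips reranking, and stops at prompt assembly (the generation step would send the prompt to an LLM); the document and question are invented.

```python
def chunk(text, size=40):
    """Ingestion: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks, top_k=1):
    """Retrieval: rank chunks by word overlap with the query
    (a stand-in for embedding similarity) and keep the top-k."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_grounded_prompt(query, contexts):
    """Generation input: instruct the model to answer only from the
    retrieved context -- this restriction is the grounding step."""
    ctx = "\n---\n".join(contexts)
    return (f"Answer using ONLY the context below. If the answer is "
            f"not in the context, say you don't know.\n\n"
            f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:")

doc = ("The warranty lasts two years. Returns are accepted within "
       "30 days of purchase.")
question = "How long is the warranty?"
prompt = build_grounded_prompt(question, retrieve(question, chunk(doc, size=8)))
```

The "only the context" instruction plus an explicit "say you don't know" escape hatch is the basic lever RAG uses against hallucination.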

RAG evaluation

  • Compare grounded vs non-grounded outputs for Factuality, Recall, Clarity
  • Use small test sets for End-to-end QA accuracy and Retrieval quality
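End-to-end QA accuracy on a small test set needs very little machinery. A minimal sketch where `answer_fn` stands in for a full RAG (or plain LLM) pipeline; the questions, answers, and deliberately wrong stub value are made up:

```python
def accuracy(answer_fn, test_set):
    """Exact-match accuracy over (question, expected_answer) pairs.
    `answer_fn` stands in for the pipeline under evaluation."""
    correct = sum(
        1 for question, expected in test_set
        if answer_fn(question).strip().lower() == expected.lower()
    )
    return correct / len(test_set)

test_set = [("capital of France?", "Paris"), ("2 + 2?", "4")]
stub_answers = {"capital of France?": "Paris", "2 + 2?": "5"}  # one wrong
score = accuracy(lambda q: stub_answers[q], test_set)
```

Running the same harness twice, once with retrieval enabled and once without, gives the grounded-vs-non-grounded comparison on the same questions.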

2.5 Domain-Specific LLMs (SecLM, Med-PaLM)

Domain-specific challenges

  • Cybersecurity: Scarcity of labeled data, evolving threats, severe consequences
  • Medicine: Evolving knowledge, context-dependent reasoning, safety validation

Specialized models

  • SecLM: Security-focused training, threat analysis, log triage
  • Med-PaLM: Medical Q&A, multi-stage training, clinical evaluation

General domain-specific skills

  • Distinguish task-specific vs domain-specific models
  • Design a fine-tuning strategy: Data curation, Labeling, Safety review, Evaluation

2.6 AI Agents & Tool Use

Agent fundamentals

  • What is an AI agent: LLM + tools + memory + policy/logic
  • Goal-oriented, multi-step behavior
  • Agents vs plain LLM calls

Agent architecture concepts

  • Core components: Planner, Tool interface, Memory, Environment
  • Types of agents: Task-oriented, Orchestrator, Multi-agent systems

Concrete skills

  • Build an agent that uses function calling to talk to a SQL database
  • Build a simple ordering / workflow agent with LangGraph
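The SQL-agent skill above hinges on function calling: the model emits a structured call, and application code executes it against the database. A minimal sketch using an in-memory SQLite table; the tool name, schema, and data are invented, and in a real setup the tool declaration would be passed to the model's tool-calling API rather than hand-written here.

```python
import json
import sqlite3

# Hypothetical tool declaration the model would be shown.
TOOLS = {
    "count_orders": {
        "description": "Count orders for a customer",
        "parameters": {"customer": "string"},
    }
}

def count_orders(conn, customer):
    # Parameterized query: never interpolate model output into SQL.
    cur = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer = ?", (customer,)
    )
    return cur.fetchone()[0]

def dispatch(conn, call_json):
    """Execute a model-emitted call like
    {"name": "count_orders", "args": {"customer": "alice"}}."""
    call = json.loads(call_json)
    if call["name"] == "count_orders":
        return count_orders(conn, call["args"]["customer"])
    raise ValueError(f"unknown tool: {call['name']}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "alice"), (2, "alice"), (3, "bob")])
result = dispatch(conn, '{"name": "count_orders", "args": {"customer": "alice"}}')
```

The dispatch layer is also the natural place for guardrails: an explicit allowlist of tools and parameterized queries keep the model from executing arbitrary SQL.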

Reasoning patterns in agents

  • ReAct loop (Reason + Act)
  • Planning vs direct response
  • Tool selection based on user goal
  • Observability & logging
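The ReAct loop above can be sketched as a skeleton with a scripted stand-in for the LLM (a real agent would call a model that emits thought/action steps; everything here, including the lookup tool, is invented for illustration):

```python
def react_loop(llm, tools, question, max_steps=5):
    """ReAct skeleton: alternate Reason (model step) and Act (tool
    call), feeding each Observation back into the transcript, until
    the model emits a final answer or the step budget runs out."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)            # Reason: model picks next step
        if step["type"] == "final":
            return step["answer"], transcript
        observation = tools[step["tool"]](step["input"])   # Act
        transcript += (f"\nThought: {step['thought']}"
                       f"\nAction: {step['tool']}({step['input']})"
                       f"\nObservation: {observation}")
    raise RuntimeError("step budget exhausted")

def scripted_llm(transcript):
    """Deterministic stand-in for a model: look something up once,
    then answer from the observation."""
    if "Observation" not in transcript:
        return {"type": "act", "thought": "I should look this up",
                "tool": "lookup", "input": "warranty"}
    return {"type": "final", "answer": "two years"}

tools = {"lookup": lambda q: "The warranty lasts two years."}
answer, trace = react_loop(scripted_llm, tools, "How long is the warranty?")
```

The accumulated transcript doubles as the observability artifact: logging it per run gives you the trace of every thought, tool call, and observation.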

2.7 MLOps for Generative AI & AgentOps

GenAI lifecycle & MLOps

  • Stages: Discovery, Prototyping, Deployment, Monitoring
  • Adapting classic MLOps: Data/prompt versioning, CI/CD for prompts, Rollbacks
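Prompt versioning with rollback needs surprisingly little: content-address each prompt by hash so deployments pin an exact version. A minimal sketch (the registry design and prompt texts are invented; production setups would back this with a real store and CI/CD):

```python
import hashlib

class PromptRegistry:
    """Content-addressed prompt versions plus a deployment history,
    so a bad prompt change can be rolled back to a known-good hash."""

    def __init__(self):
        self.versions = {}   # version hash -> prompt text
        self.history = []    # deployed versions, newest last

    def register(self, prompt):
        h = hashlib.sha256(prompt.encode()).hexdigest()[:12]
        self.versions[h] = prompt
        return h

    def deploy(self, version):
        self.history.append(version)

    def rollback(self):
        """Drop the latest deployment and return the prior prompt."""
        self.history.pop()
        return self.versions[self.history[-1]]

reg = PromptRegistry()
v1 = reg.register("You are a helpful support agent.")
v2 = reg.register("You are a terse support agent.")
reg.deploy(v1)
reg.deploy(v2)
restored = reg.rollback()
```

Hash-pinning also makes CI for prompts tractable: an eval suite can record which prompt hash it certified, and deployment can refuse hashes without a passing run.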

Vertex AI ecosystem

  • Model Garden, Vertex AI Studio, Pipelines, Observability

AgentOps

  • Difference between MLOps and AgentOps
  • Agent lifecycle: Design, Integration, Simulation, Production monitoring
  • Safety and scope control: Limiting tools, Guardrails, Testing

Operational best practices

  • Reliability: Timeouts, retries, circuit breakers
  • Observability: Tracing, Logging prompts/responses
  • Governance: Access control, Privacy, Audit trails
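Of the reliability patterns above, retry with exponential backoff is the simplest to show. A minimal sketch where `flaky` stands in for an unreliable model or tool call (the delays are shrunk for the example; a circuit breaker would additionally stop calling after repeated failures):

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=0.01):
    """Reliability wrapper: retry a flaky call with exponential
    backoff, re-raising once the attempt budget is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

calls = {"n": 0}

def flaky():
    """Stand-in for an upstream model/tool that fails twice."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timeout")
    return "ok"

result = call_with_retries(flaky)
```

For observability, the natural extension is to log each attempt (prompt, response or error, latency) under a trace ID so governance tooling can audit exactly what the system sent and received.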