Fundamental Concepts of Generative AI Engineering (Kaggle x Google)
Goal: Learn how to use, evaluate, and ship GenAI systems: prompts, embeddings, RAG, agents, domain-specific LLMs, and MLOps on platforms like Vertex AI.
2.1 Foundational Models & GenAI Basics
Concept checklist
- Difference between:
  - Foundation models vs task/domain-specific models
  - LLMs vs small models vs multi-modal models
- Evolution of LLMs:
  - Transformers → instruction tuning → RLHF / GRPO
- Inference acceleration & quantization (high level)
- Anatomy of a generative AI stack:
  - Model layer (Gemini / Llama / etc.)
  - Data & retrieval layer (vector DB, grounding)
  - Application / orchestration layer (agents, tools, workflows)
2.2 Prompt Engineering
Prompt fundamentals
- Understand the model as a next-token predictor, and why prompting therefore matters
- How model choice, decoding config (temperature, top-p, top-k), and context window size affect prompt behavior
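The decoding knobs above can be made concrete with a toy sampler. This is a sketch over invented logits, not a real model API: temperature rescales logits before the softmax, and top-k discards all but the k most likely tokens before sampling.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, seed=None):
    """Toy next-token sampler: shows how temperature and top-k
    reshape the distribution the model samples from."""
    rng = random.Random(seed)
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    # Top-k: keep only the k highest-logit tokens before sampling.
    if top_k is not None:
        kept = sorted(scaled, key=scaled.get, reverse=True)[:top_k]
        scaled = {tok: scaled[tok] for tok in kept}
    # Softmax over the surviving logits (max-subtraction for stability).
    m = max(scaled.values())
    exps = {tok: math.exp(l - m) for tok, l in scaled.items()}
    z = sum(exps.values())
    probs = {tok: e / z for tok, e in exps.items()}
    # Draw one token from the resulting distribution.
    r, acc = rng.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok
    return tok

logits = {"Paris": 5.0, "London": 3.0, "pizza": 0.5}
print(sample_next_token(logits, temperature=0.5, top_k=1))  # prints "Paris"
```

Lowering temperature or shrinking top-k makes output more deterministic; raising them trades consistency for diversity.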
Prompt techniques
- Zero-shot prompting
- One-shot / few-shot prompting
- System prompting (global instructions)
- Role prompting (persona, style)
- Contextual prompting (background + docs)
- Step-back prompting (ask the model a more general question first, then use that answer to reframe the problem)
- Chain-of-thought (CoT) prompting
- Self-consistency (sample multiple CoTs + aggregate)
- Tree-of-Thoughts prompting
- ReAct (reason + act) prompting
- Tool-aware prompts (explicit tool selection, function signatures)
- Safety-aware prompts (content boundaries, refusal behavior)
- Documentation template for prompts (goals, constraints, examples)
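The first few techniques in the list can be shown side by side as string templates. This is an illustrative sketch; the exact wording and the sentiment task are invented, not taken from any model's documentation.

```python
# Same classification task phrased three ways: zero-shot, few-shot,
# and role-prompted. The prompts are illustrative examples.

def zero_shot(text):
    return ("Classify the sentiment of this review as positive or negative.\n"
            f"Review: {text}\nSentiment:")

def few_shot(text):
    # A couple of in-context examples steer format and labels.
    examples = (
        "Review: The battery lasts forever. Sentiment: positive\n"
        "Review: Broke after two days. Sentiment: negative\n"
    )
    return ("Classify the sentiment of each review.\n"
            + examples
            + f"Review: {text} Sentiment:")

def role_prompt(text):
    # A persona plus an explicit output constraint.
    return ("You are a meticulous product-review analyst. "
            "Answer with exactly one word: positive or negative.\n"
            f"Review: {text}")

print(few_shot("Great value for the price."))
```

The same document template idea applies: record the goal, constraints, and examples for each prompt variant so changes can be reviewed and compared.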
Prompt evaluation
- Structured outputs (JSON/enum schemas)
- Autoraters / model-based eval
- Pairwise prompt comparison (A/B)
- Human-in-the-loop review of prompt changes
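Structured outputs are only useful if you validate them before trusting them. A minimal sketch, using a hand-rolled schema (the field names and enum values here are invented) rather than a full JSON Schema library:

```python
import json

# Expected shape of the model's structured reply: a label from a
# fixed enum plus a float confidence. Names are illustrative.
LABEL_ENUM = {"positive", "negative", "neutral"}

def parse_structured(raw):
    """Parse a JSON reply and enforce the expected fields; raise on drift."""
    obj = json.loads(raw)
    if set(obj) != {"label", "confidence"}:
        raise ValueError(f"unexpected keys: {sorted(obj)}")
    if obj["label"] not in LABEL_ENUM:
        raise ValueError(f"label outside enum: {obj['label']!r}")
    if not isinstance(obj["confidence"], float):
        raise ValueError("confidence must be a float")
    return obj

reply = '{"label": "positive", "confidence": 0.92}'
print(parse_structured(reply))  # {'label': 'positive', 'confidence': 0.92}
```

Rejecting malformed replies loudly (instead of silently coercing them) is what makes autorater and A/B comparisons trustworthy downstream.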
2.3 Embeddings & Vector Databases
Embedding concepts
- What text embeddings are & why they’re used
- Vector space geometry: Cosine similarity, Euclidean distance
- Types of embedding tasks: Semantic search, Clustering, Classification, Recommendation
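The two distance notions above can be computed directly. A sketch with hand-made 3-dimensional "embeddings" (real embeddings have hundreds of dimensions, but the geometry is the same):

```python
import math

def cosine_similarity(a, b):
    # Angle-based: 1.0 means same direction, 0.0 means orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean(a, b):
    # Straight-line distance between the two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

cat    = [0.9, 0.1, 0.0]   # invented vectors for illustration
kitten = [0.8, 0.2, 0.1]
car    = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))  # high, close to 1
print(cosine_similarity(cat, car))     # low, close to 0
```

Semantically related texts should land near each other under either metric; that single property is what semantic search, clustering, and recommendation all exploit.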
Vector databases & search
- Purpose of vector DBs vs classic SQL/NoSQL
- ANN search: algorithms (HNSW, IVF) and libraries that implement them (FAISS, ScaNN)
- Index structures & tradeoffs (recall vs latency)
- Metadata filtering & hybrid search (vector + keyword)
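Metadata filtering and hybrid scoring can be sketched with a brute-force scan; a real vector DB replaces the exact scan with an ANN index (e.g. HNSW) to trade a little recall for much lower latency. The documents, vectors, and weights below are invented:

```python
import math

DOCS = [
    {"id": 1, "vec": [0.9, 0.1], "text": "resetting your password", "lang": "en"},
    {"id": 2, "vec": [0.8, 0.3], "text": "password policy overview", "lang": "en"},
    {"id": 3, "vec": [0.1, 0.9], "text": "facturation et paiement",  "lang": "fr"},
]

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_search(query_vec, query_terms, lang, k=2, alpha=0.7):
    """Blend vector similarity with keyword overlap; filter on metadata."""
    hits = []
    for d in DOCS:
        if d["lang"] != lang:                      # metadata filter
            continue
        kw = sum(t in d["text"] for t in query_terms) / len(query_terms)
        score = alpha * cos(query_vec, d["vec"]) + (1 - alpha) * kw
        hits.append((score, d["id"]))
    return [doc_id for _, doc_id in sorted(hits, reverse=True)[:k]]

print(hybrid_search([0.9, 0.2], ["password"], lang="en"))  # → [1, 2]
```

The `alpha` weight is the usual tuning knob: push it toward 1 for semantic-heavy queries, toward 0 when exact keyword matches matter (IDs, error codes).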
Evaluating embeddings
- Precision@k, Recall@k, nDCG
- Qualitative inspection of neighbors
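Precision@k and recall@k are easy to compute for a single query once you have a hand-labeled relevant set. A minimal sketch (the ids are invented):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top = retrieved[:k]
    return sum(r in relevant for r in top) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k."""
    top = retrieved[:k]
    return sum(r in relevant for r in top) / len(relevant)

retrieved = ["d3", "d1", "d7", "d2"]   # retriever's ranking
relevant  = {"d1", "d2", "d5"}          # human labels

print(precision_at_k(retrieved, relevant, k=3))  # 1/3
print(recall_at_k(retrieved, relevant, k=3))     # 1/3
```

nDCG additionally rewards putting relevant docs near the top of the ranking rather than anywhere in the top-k; averaging any of these over a query set gives the retrieval score to track.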
Hands-on skills
- Build a simple similarity search app over text docs
- Implement a small RAG Q&A system with embeddings
- Train a classifier on sentence embeddings
2.4 RAG & Grounding
RAG pipeline concepts
- Components: Ingestion (chunking), Indexing (vector DB), Retrieval (top-k), Reranking, Generation
- Grounding: Using external sources (e.g., Google Search) to reduce hallucinations
- RAG vs fine-tuning tradeoffs
RAG evaluation
- Compare grounded vs non-grounded outputs for Factuality, Recall, Clarity
- Use small test sets for End-to-end QA accuracy and Retrieval quality
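The pipeline components above (ingestion → indexing → retrieval → generation) can be sketched end to end. This is a toy: a bag-of-words counter stands in for a real embedding model, naive sentence splitting stands in for chunking, and the final LLM call is omitted — the sketch stops at the grounded prompt.

```python
import math
from collections import Counter

CORPUS = (
    "Vertex AI hosts foundation models. "
    "RAG retrieves documents before generation. "
    "Grounding reduces hallucinations by citing sources."
)

def chunk(text):                      # Ingestion: naive sentence chunking
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def embed(text):                      # Toy "embedding": token counts
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return Counter(words)

def cos(a, b):
    dot = sum(a[t] * b[t] for t in a)
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return dot / denom if denom else 0.0

def retrieve(query, chunks, k=2):     # Retrieval: top-k by similarity
    q = embed(query)
    return sorted(chunks, key=lambda c: cos(q, embed(c)), reverse=True)[:k]

def build_prompt(query, ctx_chunks):  # Generation step sends this to the LLM
    ctx = "\n".join(f"- {c}" for c in ctx_chunks)
    return f"Answer using only these sources:\n{ctx}\nQuestion: {query}"

question = "How does RAG reduce hallucinations?"
top = retrieve(question, chunk(CORPUS))
print(build_prompt(question, top))
```

For evaluation, run the same test questions through this pipeline with and without the retrieved context block and compare factuality and recall, which is exactly the grounded-vs-non-grounded comparison described above.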
2.5 Domain-Specific LLMs (SecLM, Med-PaLM)
Domain-specific challenges
- Cybersecurity: Scarcity of labeled data, evolving threats, severe consequences
- Medicine: Evolving knowledge, context-dependent reasoning, safety validation
Specialized models
- SecLM: Security-focused training, threat analysis, log triage
- Med-PaLM: Medical Q&A, multi-stage training, clinical evaluation
General domain-specific skills
- Distinguish task-specific vs domain-specific models
- Design a fine-tuning strategy: Data curation, Labeling, Safety review, Evaluation
2.6 AI Agents & Tool Use
Agent fundamentals
- What is an AI agent: LLM + tools + memory + policy/logic
- Goal-oriented, multi-step behavior
- Agents vs plain LLM calls
Agent architecture concepts
- Core components: Planner, Tool interface, Memory, Environment
- Types of agents: Task-oriented, Orchestrator, Multi-agent systems
Concrete skills
- Build an agent that uses function calling to talk to a SQL database
- Build a simple ordering / workflow agent with LangGraph
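The SQL function-calling skill can be sketched with the tool layer in isolation. The LLM is mocked here: in a real agent the model emits the function call (a name plus JSON args) and this dispatcher executes it against the database. The table, tool name, and call shape are invented for illustration.

```python
import json
import sqlite3

# A tiny in-memory database the agent can query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "mug", 2), (2, "tee", 1)])

def count_orders(item: str) -> int:
    """Tool exposed to the model via a function declaration.
    Uses a parameterized query, never string-built SQL."""
    row = conn.execute("SELECT SUM(qty) FROM orders WHERE item = ?",
                       (item,)).fetchone()
    return row[0] or 0

TOOLS = {"count_orders": count_orders}

# What a model's function-call response might look like:
model_call = {"name": "count_orders", "args": json.dumps({"item": "mug"})}

def dispatch(call):
    """Route the model's function call to the registered tool."""
    fn = TOOLS[call["name"]]
    return fn(**json.loads(call["args"]))

print(dispatch(model_call))  # 2
```

The dispatch result is then fed back to the model as a tool response so it can compose the final natural-language answer.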
Reasoning patterns in agents
- ReAct loop (Reason + Act)
- Planning vs direct response
- Tool selection based on user goal
- Observability & logging

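The ReAct loop above can be sketched with a scripted "model": each turn the model either calls a tool (Act) after stating a Thought (Reason), or emits a final answer. The scripted thoughts and the population tool stand in for real LLM output and real APIs.

```python
def lookup_population(city):
    """A tool the agent can act with (invented data)."""
    return {"Paris": 2_100_000}.get(city, 0)

# Scripted model turns: what a real LLM would generate step by step.
SCRIPT = [
    {"thought": "I need Paris's population.",
     "action": ("lookup_population", "Paris")},
    {"thought": "I have the number; answer.",
     "final": "Paris has about 2.1M people."},
]

def react_loop(steps, tools, max_turns=5):
    for step in steps[:max_turns]:
        print("Thought:", step["thought"])           # Reason
        if "action" in step:
            name, arg = step["action"]
            observation = tools[name](arg)           # Act
            print("Observation:", observation)       # feed back next turn
        else:
            return step["final"]
    return "max turns reached"

answer = react_loop(SCRIPT, {"lookup_population": lookup_population})
print(answer)
```

Printing each Thought/Action/Observation triple is the simplest form of the observability and logging called for above; a `max_turns` cap is the minimal guard against runaway loops.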
2.7 MLOps for Generative AI & AgentOps
GenAI lifecycle & MLOps
- Stages: Discovery, Prototyping, Deployment, Monitoring
- Adapting classic MLOps: Data/prompt versioning, CI/CD for prompts, Rollbacks
Vertex AI ecosystem
- Model Garden, Vertex AI Studio, Pipelines, Observability
AgentOps
- How AgentOps differs from MLOps (nondeterministic, multi-step, tool-using systems)
- Agent lifecycle: Design, Integration, Simulation, Production monitoring
- Safety and scope control: Limiting tools, Guardrails, Testing
Operational best practices
- Reliability: Timeouts, retries, circuit breakers
- Observability: Tracing, Logging prompts/responses
- Governance: Access control, Privacy, Audit trails
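The retry-with-backoff pattern named above can be sketched as a wrapper around a flaky model endpoint. The `flaky_llm_call` stub (fails twice, then succeeds) stands in for a real endpoint; delays are shortened for the demo.

```python
import time

def flaky_llm_call(prompt, _state={"calls": 0}):
    """Stub endpoint: times out twice, then succeeds.
    (Mutable-default state is a demo trick, not production style.)"""
    _state["calls"] += 1
    if _state["calls"] < 3:
        raise TimeoutError("upstream slow")
    return f"ok: {prompt}"

def call_with_retries(fn, prompt, max_attempts=4, base_delay=0.01):
    """Retry with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(prompt)
        except TimeoutError as exc:
            if attempt == max_attempts:
                raise                  # circuit-breaker / alerting hooks go here
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)

print(call_with_retries(flaky_llm_call, "summarize logs"))  # ok: summarize logs
```

A circuit breaker extends this pattern: after repeated failures it stops calling the endpoint entirely for a cool-down window instead of retrying, protecting both the caller and the upstream service.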