
Fundamental Concepts of Generative AI Engineering (Kaggle x Google)

Goal: Learn how to use, evaluate, and ship GenAI systems: prompts, embeddings, RAG, agents, domain-specific LLMs, and MLOps on platforms like Vertex AI.


2.1 Foundational Models & GenAI Basics

Concept checklist

  • Difference between:
    • Foundation models vs task/domain-specific models
    • LLMs vs small models vs multi-modal models
  • Evolution of LLMs:
    • Transformers → instruction tuning → RLHF / GRPO
    • Inference acceleration & quantization (high level)
  • Anatomy of a generative AI stack:
    • Model layer (Gemini / Llama / etc.)
    • Data & retrieval layer (vector DB, grounding)
    • Application / orchestration layer (agents, tools, workflows)

2.2 Prompt Engineering

Prompt fundamentals

  • Understand the model as a next-token predictor and why prompting matters
  • How model choice, decoding config (temperature, top-p, top-k), and context window length affect prompt behavior
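To make the decoding knobs concrete, here is a minimal pure-Python sketch of how temperature and top-k reshape the next-token distribution before sampling. This is an illustration of the math, not any particular provider's API; the logits and parameter values are made up.

```python
import math

def sampling_distribution(logits, temperature=1.0, top_k=None):
    """Turn raw logits into a sampling distribution.

    Temperature divides the logits before softmax (lower = sharper,
    more deterministic); top_k zeroes out all but the k highest-logit
    tokens before normalizing.
    """
    if top_k is not None:
        cutoff = sorted(logits, reverse=True)[top_k - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three-token vocabulary: a low temperature and top_k=2 concentrate
# probability mass on the highest-logit token and cut the third entirely.
probs = sampling_distribution([2.0, 1.0, 0.1], temperature=0.5, top_k=2)
```

Top-p (nucleus) sampling works similarly but keeps the smallest set of tokens whose cumulative probability exceeds p instead of a fixed count k.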

Prompt techniques

  • Zero-shot prompting
  • One-shot / few-shot prompting
  • System prompting (global instructions)
  • Role prompting (persona, style)
  • Contextual prompting (background + docs)
  • Step-back prompting (ask model to reframe problem)
  • Chain-of-thought (CoT) prompting
  • Self-consistency (sample multiple CoTs + aggregate)
  • Tree-of-Thoughts prompting
  • ReAct (reason + act) prompting
  • Tool-aware prompts (explicit tool selection, function signatures)
  • Safety-aware prompts (content boundaries, refusal behavior)
  • Documentation template for prompts (goals, constraints, examples)
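As a small illustration of the few-shot pattern above, the sketch below assembles a prompt from global instructions, worked examples, and the new input in the same format. The task, labels, and wording are invented for the example; the structure is the point.

```python
def build_few_shot_prompt(system, examples, query):
    """Assemble a few-shot prompt: system instructions first,
    then worked input/output examples, then the new input in
    the identical format so the model continues the pattern."""
    parts = [system, ""]
    for inp, out in examples:
        parts += [f"Input: {inp}", f"Output: {out}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great service!", "positive"), ("Cold food.", "negative")],
    "Friendly staff, fast delivery.",
)
```

The same template function doubles as the documentation artifact mentioned above: goals live in the system string, constraints in its wording, and examples in the example list.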

Prompt evaluation

  • Structured outputs (JSON/enum schemas)
  • Autoraters / model-based eval
  • Pairwise prompt comparison (A/B)
  • Human-in-the-loop review of prompt changes
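Structured outputs are easiest to evaluate when you validate them at the boundary. A minimal sketch, assuming a hypothetical sentiment enum and a response shaped like `{"label": ..., "confidence": ...}`:

```python
import json

ALLOWED_LABELS = {"positive", "negative", "neutral"}  # hypothetical enum

def parse_and_check(raw):
    """Parse a model response expected to match the schema
    {"label": <enum>, "confidence": <float in [0, 1]>} and
    reject anything that drifts from it before it reaches
    downstream code."""
    obj = json.loads(raw)
    if set(obj) != {"label", "confidence"}:
        raise ValueError(f"unexpected keys: {sorted(obj)}")
    if obj["label"] not in ALLOWED_LABELS:
        raise ValueError(f"label outside enum: {obj['label']!r}")
    if not 0.0 <= obj["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return obj

result = parse_and_check('{"label": "positive", "confidence": 0.92}')
```

Failures of this check are themselves a useful prompt-eval signal: a rising schema-violation rate after a prompt change is a cheap, automatic regression alarm.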

2.3 Embeddings & Vector Databases

Embedding concepts

  • What text embeddings are & why they’re used
  • Vector space geometry: Cosine similarity, Euclidean distance
  • Types of embedding tasks: Semantic search, Clustering, Classification, Recommendation

Vector databases & search

  • Purpose of vector DBs vs classic SQL/NoSQL
  • ANN search: algorithms such as HNSW, and libraries such as FAISS and ScaNN
  • Index structures & tradeoffs (recall vs latency)
  • Metadata filtering & hybrid search (vector + keyword)
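Before reaching for a vector DB, it helps to see what one replaces. The sketch below does exact brute-force search with a metadata filter, scoring by dot product (assuming normalized vectors); a real vector DB swaps the linear scan for an ANN index such as HNSW to trade a little recall for much lower latency. All data here is invented.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, index, top_k=2, where=None):
    """Exact nearest-neighbor search with optional metadata
    filtering. `index` is a list of (doc_id, vector, metadata)."""
    hits = [
        (dot(query, vec), doc_id)
        for doc_id, vec, meta in index
        if where is None or where(meta)
    ]
    hits.sort(reverse=True)
    return [doc_id for _, doc_id in hits[:top_k]]

index = [
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.9, 0.1], {"lang": "de"}),
    ("c", [0.0, 1.0], {"lang": "en"}),
]
# Filter-then-search: "b" is the second-closest vector but is
# excluded by the metadata predicate.
results = search([1.0, 0.0], index, top_k=2, where=lambda m: m["lang"] == "en")
```

Hybrid search layers a keyword score (e.g. BM25) on top of this vector score and combines the two rankings.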

Evaluating embeddings

  • Precision@k, Recall@k, nDCG
  • Qualitative inspection of neighbors
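Precision@k and Recall@k are simple enough to implement directly. A minimal sketch with an invented ranked result list and relevance set:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs found in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

retrieved = ["d1", "d4", "d2", "d5"]   # ranked output of a retriever
relevant = {"d1", "d2", "d3"}          # ground-truth relevant docs
```

At k=2 this gives precision 0.5 (one of the top two is relevant) and recall 1/3 (one of three relevant docs was found). nDCG additionally discounts relevant hits that appear lower in the ranking.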

Hands-on skills

  • Build a simple similarity search app over text docs
  • Implement a small RAG Q&A system with embeddings
  • Train a classifier on sentence embeddings

2.4 RAG & Grounding

RAG pipeline concepts

  • Components: Ingestion (chunking), Indexing (vector DB), Retrieval (top-k), Reranking, Generation
  • Grounding: Using external sources (e.g., Google Search) to reduce hallucinations
  • RAG vs fine-tuning tradeoffs
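The pipeline stages above can be sketched end to end in a few functions. This toy version uses word-overlap scoring as a stand-in for embedding retrieval, skips reranking, and stops at prompt assembly (the generation step would send the prompt to an LLM); the document and question are invented.

```python
def chunk(text, size=40):
    """Ingestion: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks, top_k=1):
    """Retrieval: rank chunks by word overlap with the query
    (a stand-in for embedding similarity) and keep the top-k."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_grounded_prompt(query, contexts):
    """Generation input: instruct the model to answer only from the
    retrieved context -- this restriction is the grounding step."""
    ctx = "\n---\n".join(contexts)
    return (f"Answer using ONLY the context below. If the answer is "
            f"not in the context, say you don't know.\n\n"
            f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:")

doc = ("The warranty lasts two years. Returns are accepted within "
       "30 days of purchase.")
question = "How long is the warranty?"
prompt = build_grounded_prompt(question, retrieve(question, chunk(doc, size=8)))
```

The "only the context" instruction plus an explicit "say you don't know" escape hatch is the basic lever RAG uses against hallucination.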

RAG evaluation

  • Compare grounded vs non-grounded outputs for Factuality, Recall, Clarity
  • Use small test sets for End-to-end QA accuracy and Retrieval quality
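End-to-end QA accuracy on a small test set needs very little machinery. A minimal sketch where `answer_fn` stands in for a full RAG (or plain LLM) pipeline; the questions, answers, and deliberately wrong stub value are made up:

```python
def accuracy(answer_fn, test_set):
    """Exact-match accuracy over (question, expected_answer) pairs.
    `answer_fn` stands in for the pipeline under evaluation."""
    correct = sum(
        1 for question, expected in test_set
        if answer_fn(question).strip().lower() == expected.lower()
    )
    return correct / len(test_set)

test_set = [("capital of France?", "Paris"), ("2 + 2?", "4")]
stub_answers = {"capital of France?": "Paris", "2 + 2?": "5"}  # one wrong
score = accuracy(lambda q: stub_answers[q], test_set)
```

Running the same harness twice, once with retrieval enabled and once without, gives the grounded-vs-non-grounded comparison on the same questions.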

2.5 Domain-Specific LLMs (SecLM, Med-PaLM)

Domain-specific challenges

  • Cybersecurity: Scarcity of labeled data, evolving threats, severe consequences
  • Medicine: Evolving knowledge, context-dependent reasoning, safety validation

Specialized models

  • SecLM: Security-focused training, threat analysis, log triage
  • Med-PaLM: Medical Q&A, multi-stage training, clinical evaluation

General domain-specific skills

  • Distinguish task-specific vs domain-specific models
  • Design a fine-tuning strategy: Data curation, Labeling, Safety review, Evaluation

2.6 AI Agents & Tool Use

Agent fundamentals

  • What is an AI agent: LLM + tools + memory + policy/logic
  • Goal-oriented, multi-step behavior
  • Agents vs plain LLM calls

Agent architecture concepts

  • Core components: Planner, Tool interface, Memory, Environment
  • Types of agents: Task-oriented, Orchestrator, Multi-agent systems

Concrete skills

  • Build an agent that uses function calling to talk to a SQL database
  • Build a simple ordering / workflow agent with LangGraph
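The SQL-agent skill above hinges on function calling: the model emits a structured call, and application code executes it against the database. A minimal sketch using an in-memory SQLite table; the tool name, schema, and data are invented, and in a real setup the tool declaration would be passed to the model's tool-calling API rather than hand-written here.

```python
import json
import sqlite3

# Hypothetical tool declaration the model would be shown.
TOOLS = {
    "count_orders": {
        "description": "Count orders for a customer",
        "parameters": {"customer": "string"},
    }
}

def count_orders(conn, customer):
    # Parameterized query: never interpolate model output into SQL.
    cur = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer = ?", (customer,)
    )
    return cur.fetchone()[0]

def dispatch(conn, call_json):
    """Execute a model-emitted call like
    {"name": "count_orders", "args": {"customer": "alice"}}."""
    call = json.loads(call_json)
    if call["name"] == "count_orders":
        return count_orders(conn, call["args"]["customer"])
    raise ValueError(f"unknown tool: {call['name']}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "alice"), (2, "alice"), (3, "bob")])
result = dispatch(conn, '{"name": "count_orders", "args": {"customer": "alice"}}')
```

The dispatch layer is also the natural place for guardrails: an explicit allowlist of tools and parameterized queries keep the model from executing arbitrary SQL.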

Reasoning patterns in agents

  • ReAct loop (Reason + Act)
  • Planning vs direct response
  • Tool selection based on user goal
  • Observability & logging
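The ReAct loop above can be sketched as a skeleton with a scripted stand-in for the LLM (a real agent would call a model that emits thought/action steps; everything here, including the lookup tool, is invented for illustration):

```python
def react_loop(llm, tools, question, max_steps=5):
    """ReAct skeleton: alternate Reason (model step) and Act (tool
    call), feeding each Observation back into the transcript, until
    the model emits a final answer or the step budget runs out."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)            # Reason: model picks next step
        if step["type"] == "final":
            return step["answer"], transcript
        observation = tools[step["tool"]](step["input"])   # Act
        transcript += (f"\nThought: {step['thought']}"
                       f"\nAction: {step['tool']}({step['input']})"
                       f"\nObservation: {observation}")
    raise RuntimeError("step budget exhausted")

def scripted_llm(transcript):
    """Deterministic stand-in for a model: look something up once,
    then answer from the observation."""
    if "Observation" not in transcript:
        return {"type": "act", "thought": "I should look this up",
                "tool": "lookup", "input": "warranty"}
    return {"type": "final", "answer": "two years"}

tools = {"lookup": lambda q: "The warranty lasts two years."}
answer, trace = react_loop(scripted_llm, tools, "How long is the warranty?")
```

The accumulated transcript doubles as the observability artifact: logging it per run gives you the trace of every thought, tool call, and observation.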

2.7 MLOps for Generative AI & AgentOps

GenAI lifecycle & MLOps

  • Stages: Discovery, Prototyping, Deployment, Monitoring
  • Adapting classic MLOps: Data/prompt versioning, CI/CD for prompts, Rollbacks
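Prompt versioning with rollback needs surprisingly little: content-address each prompt by hash so deployments pin an exact version. A minimal sketch (the registry design and prompt texts are invented; production setups would back this with a real store and CI/CD):

```python
import hashlib

class PromptRegistry:
    """Content-addressed prompt versions plus a deployment history,
    so a bad prompt change can be rolled back to a known-good hash."""

    def __init__(self):
        self.versions = {}   # version hash -> prompt text
        self.history = []    # deployed versions, newest last

    def register(self, prompt):
        h = hashlib.sha256(prompt.encode()).hexdigest()[:12]
        self.versions[h] = prompt
        return h

    def deploy(self, version):
        self.history.append(version)

    def rollback(self):
        """Drop the latest deployment and return the prior prompt."""
        self.history.pop()
        return self.versions[self.history[-1]]

reg = PromptRegistry()
v1 = reg.register("You are a helpful support agent.")
v2 = reg.register("You are a terse support agent.")
reg.deploy(v1)
reg.deploy(v2)
restored = reg.rollback()
```

Hash-pinning also makes CI for prompts tractable: an eval suite can record which prompt hash it certified, and deployment can refuse hashes without a passing run.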

Vertex AI ecosystem

  • Model Garden, Vertex AI Studio, Pipelines, Observability

AgentOps

  • Difference between MLOps and AgentOps
  • Agent lifecycle: Design, Integration, Simulation, Production monitoring
  • Safety and scope control: Limiting tools, Guardrails, Testing

Operational best practices

  • Reliability: Timeouts, retries, circuit breakers
  • Observability: Tracing, Logging prompts/responses
  • Governance: Access control, Privacy, Audit trails
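Of the reliability patterns above, retry with exponential backoff is the simplest to show. A minimal sketch where `flaky` stands in for an unreliable model or tool call (the delays are shrunk for the example; a circuit breaker would additionally stop calling after repeated failures):

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=0.01):
    """Reliability wrapper: retry a flaky call with exponential
    backoff, re-raising once the attempt budget is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

calls = {"n": 0}

def flaky():
    """Stand-in for an upstream model/tool that fails twice."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timeout")
    return "ok"

result = call_with_retries(flaky)
```

For observability, the natural extension is to log each attempt (prompt, response or error, latency) under a trace ID so governance tooling can audit exactly what the system sent and received.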