
Forward Deployment Engineering Decisions and Tradeoffs

Granular decision framework for web apps, mobile apps, APIs, backends, ML, LLMs, and agents.

For a Forward-Deployed LLM / AI Engineer.


4.0 Global Decision Framework (Use This For Every Layer)

4.0.1 Clarify the problem

  • User problem
  • Functional requirements
  • Non-functional requirements (Latency, Scale, Availability, Data sensitivity, Consistency, Regulatory)
  • Constraints (Team, Time, Infra)

4.0.2 Enumerate options

  • List at least two realistic options.
  • Describe architecture shape and tech stack for each.

4.0.3 Compare on axes

  • Complexity, Performance, Reliability, Developer velocity, Cost, Risk.

4.0.4 Decide, document, and define guardrails

  • Capture decision in ADR.
  • Define validation metrics and revisit conditions.
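
An ADR capture can be as small as a structured record. A minimal sketch, with illustrative field names (not a standard ADR schema):

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    """Minimal ADR-style record; field names are illustrative."""
    title: str
    options_considered: list   # at least two realistic options
    decision: str
    rationale: str
    validation_metrics: list = field(default_factory=list)
    revisit_if: list = field(default_factory=list)  # conditions that reopen the decision

adr = DecisionRecord(
    title="API style for partner integrations",
    options_considered=["REST", "GraphQL"],
    decision="REST",
    rationale="Broad client compatibility; no complex nested queries yet.",
    validation_metrics=["p95 latency < 300ms", "integration time per partner"],
    revisit_if=["clients routinely over-fetch", "3+ round trips per screen"],
)
```

The `revisit_if` list is what turns a one-off decision into a guardrail: it states, up front, the evidence that would justify reopening it.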

4.1 Frontend & Web Stack Decisions

4.1.1 Language: JS vs TypeScript

  • TypeScript (Default): Long-lived product, multiple engineers, safety.
  • JavaScript: Throwaway prototype, single dev.

4.1.2 React vs other UI frameworks

  • React (Default): standardize on React; its ecosystem, tooling, and hiring pool outweigh marginal framework differences for most teams.

4.1.3 Next.js Rendering Mode

  • CSR: Internal tools, heavy interactivity, SEO irrelevant.
  • SSR: Personalized content, SEO important, dynamic data.
  • SSG: Static content (docs, marketing), max speed.
  • ISR: SSG speed + periodic updates (blogs, listings).
  • Edge: Geo-aware, low latency personalization.

4.1.4 State Management & Data Fetching

  • React Query / SWR: Server state (caching, retries).
  • Context: Simple global state (theme, auth).
  • Redux / Zustand: complex global client state (Redux adds time-travel debugging via DevTools; Zustand is lighter-weight).

4.1.5 Micro-Frontend vs Single Frontend

  • Single Next.js (Default): Modular monolith.
  • Micro-frontends: Only if multiple independent teams need independent deployment.

4.2 Mobile Decisions

4.2.1 Web app vs PWA vs Native

  • Responsive Web: Occasional use, no native features.
  • PWA: Installable, basic offline.
  • Native/Hybrid: Push notifications, sensors, frequent use.

4.2.2 React Native vs Flutter vs Native

  • React Native (Default): Share logic with web, JS/TS skills.
  • Flutter: Pixel-perfect custom UI, willing to learn Dart.
  • Native: Deep OS integration, max performance.

4.3 Backend & API Decisions

4.3.1 Language & Framework

  • Python (FastAPI/Flask): ML/AI integrations, rapid iteration.
  • Node.js: Strong JS team, real-time apps.
  • Go: High performance, simple concurrency.
  • Java/Kotlin: Enterprise ecosystem.

4.3.2 Monolith vs Microservices

  • Monolith/Modular Monolith (Start here): Single deployment, clear boundaries.
  • Microservices: Multiple teams, independent scaling needed.

4.3.3 API Style

  • REST (Default): Resource-centric, broad compatibility.
  • GraphQL: Complex client data needs, avoid over-fetching.
  • gRPC: Internal service-to-service, performance.
  • WebSockets: Real-time updates.
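
The over-fetching tradeoff behind the REST-vs-GraphQL choice can be shown with a toy projection. The `select_fields` helper is hypothetical; real GraphQL servers resolve nested field selections, but the shape of the saving is the same:

```python
# Full resource a REST endpoint would typically return:
user = {
    "id": 1, "name": "Ada", "email": "ada@example.com",
    "address": {"city": "London"}, "order_history": ["..."],
}

def select_fields(resource: dict, fields: list[str]) -> dict:
    """GraphQL-style projection: return only the requested top-level fields."""
    return {f: resource[f] for f in fields if f in resource}

# A mobile screen that only renders a name avoids shipping order_history:
payload = select_fields(user, ["id", "name"])
```

If most clients need the whole resource anyway, this saving evaporates and REST's simplicity wins.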

4.3.4 API Gateway & Intermediaries

  • API Gateway: Central auth, rate limiting, routing.
  • BFF: Tailored endpoints for specific clients.
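
Rate limiting is the classic gateway concern. A minimal token-bucket sketch (toy in-process version; real gateways keep this state in Redis or in the gateway itself):

```python
import time

class TokenBucket:
    """Toy rate limiter: refills `rate` tokens/second, bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=3)
results = [bucket.allow() for _ in range(5)]  # burst of 5 against capacity 3
```

The first three calls pass (the burst allowance); the rest are rejected until tokens refill.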

4.4 Data & Storage Layer Decisions

4.4.1 DB Type

  • Relational (Postgres): Strong consistency, complex relations.
  • Document (Mongo): Flexible schema.
  • KV Store (Redis): Caching, sessions.
  • Search (Elasticsearch): Full-text search, analytics.
  • Time-series: Metrics, logs.

4.4.2 Caching Strategy

  • Cache in layers, closest to the user first: Browser -> CDN -> Reverse Proxy -> App Cache -> DB Cache.
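
The app-cache layer is usually a read-through lookup with a TTL. A minimal in-process sketch (production would use Redis or Memcached, same pattern):

```python
import time

class TTLCache:
    """Minimal application-level cache with per-entry time-to-live."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store: dict = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.store[key]          # lazy eviction on read
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

db_calls = 0
def get_user(cache: TTLCache, user_id: int):
    """Read-through: check the cache first, fall back to the 'database'."""
    global db_calls
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    db_calls += 1                        # simulated expensive DB read
    value = {"id": user_id, "name": "Ada"}
    cache.set(user_id, value)
    return value

cache = TTLCache(ttl_seconds=60)
get_user(cache, 1)
get_user(cache, 1)                       # second read served from cache
```

The TTL is the staleness budget: a deliberate answer to "how out-of-date may this data be?", not a tuning afterthought.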

4.5 Messaging & Async Decisions

4.5.1 Queue (RabbitMQ, SQS)

  • Long-running tasks, decouple producer/consumer.
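
The decoupling is visible even in a toy in-process version: the producer returns as soon as the job is enqueued, and the consumer drains at its own pace. A sketch with the stdlib `queue` module standing in for RabbitMQ/SQS:

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
results = []

def worker():
    """Consumer: drains the queue independently of the producer's pace."""
    while True:
        job = tasks.get()
        if job is None:          # sentinel shuts the worker down
            break
        results.append(job * 2)  # stand-in for a long-running task
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for job in [1, 2, 3]:            # producer returns immediately after enqueueing
    tasks.put(job)
tasks.put(None)
t.join()
```

A real broker adds what this toy lacks: persistence, acknowledgements, retries, and dead-letter handling.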

4.5.2 Event Stream (Kafka)

  • Many consumers, replay capability, append-only data.
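
What distinguishes a stream from a queue is that consuming does not delete: each consumer tracks its own offset into an append-only log. A toy sketch of that model:

```python
class EventLog:
    """Toy append-only log: consumers keep their own offsets, so a new
    consumer can replay history from offset 0 (the Kafka model)."""
    def __init__(self):
        self.events: list = []

    def append(self, event):
        self.events.append(event)

    def read_from(self, offset: int) -> list:
        return self.events[offset:]

log = EventLog()
for e in ["signup", "purchase", "refund"]:
    log.append(e)

late_consumer = log.read_from(0)   # joined late, replays everything
incremental = log.read_from(2)     # resumes after the first two events
```

Replay is why streams suit audit trails and adding new consumers to old data; a queue's messages are gone once acknowledged.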

4.5.3 Sagas

  • Distributed transactions across services.
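
The saga pattern replaces a distributed transaction with a sequence of local steps, each paired with a compensating action that undoes it on failure. A minimal sketch with hypothetical order-processing steps:

```python
def run_saga(steps):
    """Run each (action, compensation) pair in order; on failure,
    run compensations for the completed steps in reverse."""
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):
                comp()
            return False
    return True

def fail_shipping():
    raise RuntimeError("shipping failed")

trace = []
steps = [
    (lambda: trace.append("reserve_inventory"), lambda: trace.append("release_inventory")),
    (lambda: trace.append("charge_card"),       lambda: trace.append("refund_card")),
    (fail_shipping,                             lambda: trace.append("cancel_shipment")),
]
ok = run_saga(steps)
```

Note the reverse order: the card is refunded before inventory is released, mirroring how the steps were applied.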

4.6 Infra, DevOps & Deployment Decisions

4.6.1 PaaS vs Containers vs K8s vs Serverless

  • PaaS (Render/Heroku): Early stage, focus on product.
  • Containers (ECS/Cloud Run): More control, Dockerized.
  • Kubernetes: Complex topology, flexible scaling, ops capability.
  • Serverless: Event-driven, small functions.

4.6.2 Observability

  • Logs (ELK/Loki), Metrics (Prometheus), Traces (OpenTelemetry).
  • LLM Specifics: Log prompts, model versions, tool calls, user feedback.
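
The LLM-specific fields can be captured with a thin wrapper around every model call. A sketch with illustrative field names (a real setup would emit these to LangFuse/LangSmith or structured logs):

```python
import time

llm_logs = []

def logged_llm_call(call_fn, prompt: str, model: str):
    """Wrap an LLM call so every request records prompt, model version,
    latency, and response size; field names are illustrative."""
    start = time.monotonic()
    response = call_fn(prompt)
    llm_logs.append({
        "model": model,
        "prompt": prompt,
        "latency_s": round(time.monotonic() - start, 3),
        "response_chars": len(response),
    })
    return response

fake_model = lambda p: f"echo: {p}"   # stand-in for a real client call
out = logged_llm_call(fake_model, "Summarize the ADR.", model="gpt-4.1")
```

Logging the model version alongside the prompt is what makes regressions diagnosable after a provider-side model update.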

4.7 ML & LLM Integration Decisions

4.7.1 Levels of Sophistication

  • Prompt-only: Generic tasks, small POC.
  • RAG: Grounding in docs/DB, frequent updates.
  • Fine-tuned: Labeled examples, style/domain specialization, cost/latency optimization.
  • Custom: Large scale, special architecture.
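
The RAG level is just retrieval plus prompt grounding. A toy sketch using word overlap as the retriever; a real system would use embeddings and a vector index, and the docs here are invented:

```python
docs = {
    "returns.md": "Customers may return items within 30 days of delivery.",
    "shipping.md": "Standard shipping takes 3-5 business days.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank docs by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs.values(),
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Ground the model in retrieved context rather than its weights."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How many days do customers have to return items")
```

Because the context is fetched at query time, updating the docs updates the answers with no retraining, which is the whole case for RAG over fine-tuning when data changes frequently.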

4.7.2 Hosted vs Self-Hosted

  • Hosted (OpenAI/Anthropic): Best quality, simple ops.
  • Self-Hosted (Llama/Qwen): Control, privacy, cost at scale.

4.8 Agentic System Decisions

4.8.1 Single vs Multi-Agent

  • Single: Linear/simple tasks.
  • Multi-agent: Complex tasks, specialized roles (Planner, Executor, Critic).
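
The Planner/Executor/Critic split can be sketched as three functions in a loop. Here each role is a stub standing in for an LLM call; the role boundaries, not the stubs, are the point:

```python
def planner(task: str) -> list[str]:
    """Decompose the task into steps; a real planner would be an LLM call."""
    return [f"research: {task}", f"draft: {task}"]

def executor(step: str) -> str:
    """Carry out one step; a real executor would call tools or an LLM."""
    return f"done({step})"

def critic(outputs: list[str]) -> bool:
    """Accept only if every step produced output; a real critic would
    be an LLM judging quality against the original task."""
    return all(o.startswith("done(") for o in outputs)

def run(task: str):
    outputs = [executor(step) for step in planner(task)]
    return outputs if critic(outputs) else None

result = run("quarterly report")
```

The practical test for going multi-agent: would a single prompt covering all three roles degrade each of them? If not, one agent is simpler and cheaper.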

4.8.2 Autonomy Level

  • Low (assistive): a human approves each action; required for high-risk operations.
  • High (autonomous): end-to-end automation; reserve for low-risk internal tools.

4.8.3 Tool Selection

  • Few, powerful, safe tools > many tiny tools.

4.9 Cross-cutting Decisions

4.9.1 Security & Compliance

  • Data classification, Encryption, Auth, Access Control, Residency.

4.9.2 Cost vs Time-to-Market vs Quality

  • Time-to-Market: PaaS, Hosted LLMs, Monolith.
  • Cost: Smaller models, Caching, Self-hosting.
  • Quality: SOTA LLMs, Agents, RAG.

4.10 React Hooks & Frontend Concept Decisions

4.10.1 useState vs useReducer

  • useState: Simple, local state.
  • useReducer: Complex state, transitions.

4.10.2 useEffect

  • Sync with external systems. Avoid for derived state.

4.10.3 useMemo & useCallback

  • Expensive computations, stable callbacks for memoized children.

4.10.4 useRef

  • Mutable values without re-renders (DOM refs, timers).

4.10.5 Component Patterns

  • Container/Presenter, Custom Hooks, Compound Components.

4.11 Deployment Decisions (Cloud, Local, Services)

4.11.1 Cloud Provider Selection

  • AWS: Industry standard, vast ecosystem. Best for enterprise scale.
  • GCP: Strong AI/ML offerings (Vertex AI), Kubernetes (GKE) leadership.
  • Azure: Enterprise integration, OpenAI partnership.
  • Vercel/Netlify: Frontend-first, serverless, best DX for Next.js.

4.11.2 Local Deployment & Testing

  • Docker Compose: Orchestrate multi-container apps locally.
  • Minikube/Kind: Local Kubernetes testing.
  • LocalStack: Mock AWS services locally.
  • Ollama/LocalAI: Run LLMs locally for dev/test without API costs.

4.11.3 Service Mesh & Networking

  • Istio/Linkerd: Traffic management, mTLS, observability in K8s.
  • Nginx/Traefik: Ingress controllers, reverse proxies.

4.11.4 Content Delivery

  • Cloudflare: CDN, DDoS protection, Edge workers.
  • AWS CloudFront: Deep integration with S3/AWS services.

4E (Extended) – LLMs, Inference, Agents, Cloud & GPU Decisions

4E.1 LLM Choices

Hosted:

  • GPT-4.1 / GPT-5: General assistants.
  • o3-mini / reasoning models: Complex reasoning & planning.

Open-source:

  • DeepSeek-R1 (and distilled variants): Reasoning.
  • Qwen 2.5/3: Strong coding & reasoning.
  • LLaMA, Mistral, Codestral: Good alternatives.

4E.2 Inference & Serving

Local dev:

  • Ollama.
  • llama.cpp.
  • vLLM on Colab Pro for more serious tests.

Production:

  • vLLM: Primary serving engine (OpenAI-compatible).
  • TGI / SGLang: Alternatives.

4E.3 Agent & Orchestration Frameworks

  • LangChain: Building blocks for LLM apps.
  • LangGraph: Multi-step/multi-agent workflows as graphs.
  • LlamaIndex: Strong for RAG/data indexing.
  • AutoGen/CrewAI: Alternative multi-agent frameworks.
  • OpenAI Assistants/Agents: API-native orchestration when using OpenAI stack.

4E.4 Vector Databases

  • Local: Chroma, FAISS (pgvector is a Postgres extension, not SQLite).
  • Small prod: Postgres + pgvector.
  • Larger: Qdrant, Weaviate, Milvus.
  • SaaS: Pinecone, Turbopuffer, etc.
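
At prototype scale, "vector database" is just brute-force similarity search, which is why a local option suffices. A sketch with hand-picked 3-d embeddings (real ones have hundreds of dimensions); pgvector/Qdrant add the indexes and persistence this toy lacks:

```python
import math

class InMemoryVectorStore:
    """Brute-force cosine-similarity search over stored embeddings."""
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, embedding: list[float]):
        self.items.append((doc_id, embedding))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, embedding: list[float], k: int = 1) -> list[str]:
        ranked = sorted(self.items,
                        key=lambda it: self._cosine(embedding, it[1]),
                        reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

store = InMemoryVectorStore()
store.add("refund-policy", [1.0, 0.0, 0.2])
store.add("shipping-faq", [0.0, 1.0, 0.1])
top = store.query([0.9, 0.1, 0.2])   # query vector closest to refund-policy
```

Graduate to pgvector when the data must survive restarts, and to Qdrant/Weaviate/Milvus when brute force over every vector becomes the bottleneck.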

4E.5 Observability for LLMs

  • LangFuse / LangSmith: Tracing, evaluation.
  • Prometheus + Grafana: Metrics.
  • Loki/ELK: Logs.
  • OpenTelemetry: End-to-end tracing.

4E.6 Cloud & GPU Hosting

GPU as a service:

  • OpenAI/Anthropic: Fully managed.
  • Modal/Replicate/Banana: Managed GPU containers.
  • Runpod/Lambda/Vast.ai: Bare-metal GPU rental.
  • AWS/GCP GPUs: Enterprise-level infra.