Forward Deployment Engineering Decisions and Tradeoffs
For a Forward-Deployed LLM / AI Engineer.
4.0 Global Decision Framework (Use This For Every Layer)
4.0.1 Clarify the problem
- User problem
- Functional requirements
- Non-functional requirements (Latency, Scale, Availability, Data sensitivity, Consistency, Regulatory)
- Constraints (Team, Time, Infra)
4.0.2 Enumerate options
- At least 2 realistic options.
- Describe architecture shape and tech stack for each.
4.0.3 Compare on axes
- Complexity, Performance, Reliability, Developer velocity, Cost, Risk.
4.0.4 Decide, document, and define guardrails
- Capture decision in ADR.
- Define validation metrics and revisit conditions.
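The enumerate/compare/decide steps above can be sketched as a weighted scoring matrix. Axes mirror 4.0.3; the weights, option names, and scores below are illustrative placeholders, not a prescription:

```python
# Decision-matrix sketch: score each option 1-5 on the comparison axes
# (higher is better on every axis, so "complexity" reads as simplicity
# and "risk" as safety), weight the axes, and rank.

AXES = {"complexity": 0.2, "performance": 0.2, "reliability": 0.2,
        "velocity": 0.2, "cost": 0.1, "risk": 0.1}

def rank(options: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (option, weighted score) pairs, best first."""
    totals = {
        name: sum(AXES[axis] * score for axis, score in scores.items())
        for name, scores in options.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

options = {
    "modular_monolith": {"complexity": 5, "performance": 3, "reliability": 4,
                         "velocity": 5, "cost": 5, "risk": 4},
    "microservices":    {"complexity": 2, "performance": 4, "reliability": 4,
                         "velocity": 2, "cost": 2, "risk": 3},
}
```

The ranked output (with the winning score) goes straight into the ADR alongside the revisit conditions.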
4.1 Frontend & Web Stack Decisions
4.1.1 Language: JS vs TypeScript
- TypeScript (Default): Long-lived product, multiple engineers, safety.
- JavaScript: Throwaway prototype, single dev.
4.1.2 React vs other UI frameworks
- React (Default): Largest ecosystem, hiring pool, and library support; standardize unless a team has strong reasons otherwise.
4.1.3 Next.js Rendering Mode
- CSR: Internal tools, heavy interactivity, SEO irrelevant.
- SSR: Personalized content, SEO important, dynamic data.
- SSG: Static content (docs, marketing), max speed.
- ISR: SSG speed + periodic updates (blogs, listings).
- Edge: Geo-aware, low latency personalization.
4.1.4 State Management & Data Fetching
- React Query / SWR: Server state (caching, retries).
- Context: Simple global state (theme, auth).
- Redux / Zustand: Complex global state, time-travel debugging.
4.1.5 Micro-Frontend vs Single Frontend
- Single Next.js (Default): Modular monolith.
- Micro-frontends: Only if multiple independent teams need independent deployment.
4.2 Mobile Decisions
4.2.1 Web app vs PWA vs Native
- Responsive Web: Occasional use, no native features.
- PWA: Installable, basic offline.
- Native/Hybrid: Push notifications, sensors, frequent use.
4.2.2 React Native vs Flutter vs Native
- React Native (Default): Share logic with web, JS/TS skills.
- Flutter: Pixel-perfect custom UI, willing to learn Dart.
- Native: Deep OS integration, max performance.
4.3 Backend & API Decisions
4.3.1 Language & Framework
- Python (FastAPI/Flask): ML/AI integrations, rapid iteration.
- Node.js: Strong JS team, real-time apps.
- Go: High performance, simple concurrency.
- Java/Kotlin: Enterprise ecosystem.
4.3.2 Monolith vs Microservices
- Monolith/Modular Monolith (Start here): Single deployment, clear boundaries.
- Microservices: Multiple teams, independent scaling needed.
4.3.3 API Style
- REST (Default): Resource-centric, broad compatibility.
- GraphQL: Complex client data needs, avoid over-fetching.
- gRPC: Internal service-to-service, performance.
- WebSockets: Real-time updates.
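REST's resource-centric shape can be shown with a toy stdlib-only router: (method, URL pattern) pairs map to handlers over nouns, not verbs. A real service would use FastAPI or Flask from 4.3.1; routes, data, and names here are illustrative:

```python
# Toy REST-style dispatcher: resources are URLs, operations are HTTP
# methods. Stdlib-only, for illustration.
import re

ROUTES = []

def route(method: str, pattern: str):
    def deco(fn):
        ROUTES.append((method, re.compile(f"^{pattern}$"), fn))
        return fn
    return deco

DB = {1: {"id": 1, "name": "Ada"}}  # in-memory stand-in for a real DB

@route("GET", r"/users/(?P<id>\d+)")
def get_user(id):
    user = DB.get(int(id))
    return (200, user) if user else (404, {"error": "not found"})

def dispatch(method: str, path: str):
    for m, rx, fn in ROUTES:
        match = rx.match(path)
        if m == method and match:
            return fn(**match.groupdict())
    return (404, {"error": "no route"})
```

GraphQL and gRPC replace this URL-per-resource shape with a typed schema; the tradeoff in 4.3.3 is mostly about who controls the response shape.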
4.3.4 API Gateway & Intermediaries
- API Gateway: Central auth, rate limiting, routing.
- BFF: Tailored endpoints for specific clients.
4.4 Data & Storage Layer Decisions
4.4.1 DB Type
- Relational (Postgres): Strong consistency, complex relations.
- Document (Mongo): Flexible schema.
- KV Store (Redis): Caching, sessions.
- Search (Elasticsearch): Full-text search, analytics.
- Time-series: Metrics, logs.
4.4.2 Caching Strategy
- Browser -> CDN -> Reverse Proxy -> App Cache -> DB Cache.
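At the app-cache layer in that chain, the usual pattern is cache-aside: check the cache, fall through to the DB on a miss, write back with a TTL. A dict stands in for Redis below; swap in redis-py's get/setex in production. The TTL and query function are placeholders:

```python
# Cache-aside sketch with TTL. CACHE maps key -> (expiry, value).
import time

CACHE: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 60

def slow_db_query(key: str) -> str:
    return f"row-for-{key}"  # placeholder for the real DB call

def get_cached(key: str):
    entry = CACHE.get(key)
    if entry is not None and entry[0] > time.monotonic():
        return entry[1]                                   # cache hit
    value = slow_db_query(key)                            # miss: read through
    CACHE[key] = (time.monotonic() + TTL_SECONDS, value)  # write back
    return value
```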
4.5 Messaging & Async Decisions
4.5.1 Queue (RabbitMQ, SQS)
- Long-running tasks, decouple producer/consumer.
4.5.2 Event Stream (Kafka)
- Many consumers, replay capability, append-only data.
4.5.3 Sagas
- Distributed transactions across services.
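The saga pattern pairs every step with a compensating action and, on failure, unwinds completed steps in reverse. A minimal sketch (step names are illustrative):

```python
# Saga runner: each step is (action, compensation). If any action
# raises, run compensations for the already-completed steps in
# reverse order, then report failure.

def run_saga(steps) -> bool:
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):
            compensation()
        return False
    return True
```

In a real system each action/compensation is a call to a different service (reserve stock / release stock, charge card / refund card), coordinated via the queue or event stream above.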
4.6 Infra, DevOps & Deployment Decisions
4.6.1 PaaS vs Containers vs K8s vs Serverless
- PaaS (Render/Heroku): Early stage, focus on product.
- Containers (ECS/Cloud Run): More control, Dockerized.
- Kubernetes: Complex topology, flexible scaling, ops capability.
- Serverless: Event-driven, small functions.
4.6.2 Observability
- Logs (ELK/Loki), Metrics (Prometheus), Traces (OpenTelemetry).
- LLM Specifics: Log prompts, model versions, tool calls, user feedback.
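The LLM-specific fields above can be captured as one structured JSON record per call, emitted to whatever log backend is in use. Field names below are illustrative, not a standard schema:

```python
# One structured log record per LLM call: prompt, model version,
# tool calls, and a slot for user feedback to be filled in later.
import json
import time
import uuid

def llm_log_record(prompt, model, response, tool_calls=(), feedback=None) -> str:
    return json.dumps({
        "trace_id": str(uuid.uuid4()),   # join key for traces/spans
        "ts": time.time(),
        "model": model,                  # exact model/version string
        "prompt": prompt,                # or a hash, if prompts are sensitive
        "response": response,
        "tool_calls": list(tool_calls),
        "user_feedback": feedback,       # e.g. thumbs up/down
    })
```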
4.7 ML & LLM Integration Decisions
4.7.1 Levels of Sophistication
- Prompt-only: Generic tasks, small POC.
- RAG: Grounding in docs/DB, frequent updates.
- Fine-tuned: Labeled examples, style/domain specialization, cost/latency optimization.
- Custom: Large scale, special architecture.
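The RAG level's core move is a retrieval step before generation. A toy sketch, with bag-of-words cosine similarity standing in for a real embedding model and vector DB (see 4E.4):

```python
# Toy retrieval: rank docs by cosine similarity to the query, return
# top-k. Real systems use embeddings + a vector index instead.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    qv = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, vectorize(d)), reverse=True)
    return ranked[:k]
```

The top-k documents are then injected into the prompt as grounding context; that is the whole difference between the prompt-only and RAG levels.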
4.7.2 Hosted vs Self-Hosted
- Hosted (OpenAI/Anthropic): Best quality, simple ops.
- Self-Hosted (Llama/Qwen): Control, privacy, cost at scale.
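Because self-hosted engines like vLLM expose an OpenAI-compatible API, one client can target either option by switching the base URL and model name. A request-building sketch (URLs and model names are illustrative; no network call is made):

```python
# Build an OpenAI-format chat request for either a hosted provider or
# a self-hosted OpenAI-compatible server (e.g. vLLM on localhost).

def chat_config(provider: str) -> dict:
    if provider == "hosted":
        return {"base_url": "https://api.openai.com/v1",
                "model": "gpt-4.1"}
    if provider == "self_hosted":
        return {"base_url": "http://localhost:8000/v1",
                "model": "Qwen/Qwen2.5-7B-Instruct"}
    raise ValueError(f"unknown provider: {provider}")

def chat_request(provider: str, messages: list[dict]) -> dict:
    cfg = chat_config(provider)
    return {"url": cfg["base_url"] + "/chat/completions",
            "json": {"model": cfg["model"], "messages": messages}}
```

Keeping this switch in one place makes the hosted-vs-self-hosted decision reversible, which matters when cost at scale tips the tradeoff later.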
4.8 Agentic System Decisions
4.8.1 Single vs Multi-Agent
- Single: Linear/simple tasks.
- Multi-agent: Complex tasks, specialized roles (Planner, Executor, Critic).
4.8.2 Autonomy Level
- Low (Assistive): Human approval required before actions execute; use for high-risk or user-facing actions.
- High: End-to-end automation; acceptable for low-risk internal tools.
4.8.3 Tool Selection
- Few, powerful, safe tools > many tiny tools.
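"Few, powerful, safe" in practice: a small registry where each tool declares whether it needs human approval, tying tool selection to the autonomy levels in 4.8.2. Tool names and behaviors below are illustrative:

```python
# Minimal tool registry: risky tools are gated behind explicit
# approval; safe tools run directly.

TOOLS: dict[str, dict] = {}

def tool(name: str, requires_approval: bool = False):
    def deco(fn):
        TOOLS[name] = {"fn": fn, "requires_approval": requires_approval}
        return fn
    return deco

@tool("search_docs")
def search_docs(query: str) -> str:
    return f"results for {query!r}"  # placeholder for real search

@tool("delete_record", requires_approval=True)
def delete_record(record_id: int) -> str:
    return f"deleted {record_id}"    # placeholder for real mutation

def call_tool(name: str, approved: bool = False, **kwargs):
    entry = TOOLS[name]
    if entry["requires_approval"] and not approved:
        raise PermissionError(f"{name} needs human approval")
    return entry["fn"](**kwargs)
```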
4.9 Cross-cutting Decisions
4.9.1 Security & Compliance
- Data classification, Encryption, Auth, Access Control, Residency.
4.9.2 Cost vs Time-to-Market vs Quality
- Time-to-Market: PaaS, Hosted LLMs, Monolith.
- Cost: Smaller models, Caching, Self-hosting.
- Quality: SOTA LLMs, Agents, RAG.
4.10 React Hooks & Frontend Concept Decisions
4.10.1 useState vs useReducer
- useState: Simple, local state.
- useReducer: Complex state, transitions.
4.10.2 useEffect
- Sync with external systems. Avoid for derived state.
4.10.3 useMemo & useCallback
- Expensive computations, stable callbacks for memoized children.
4.10.4 useRef
- Mutable values without re-renders (DOM refs, timers).
4.10.5 Component Patterns
- Container/Presenter, Custom Hooks, Compound Components.
4.11 Deployment Decisions (Cloud, Local, Services)
4.11.1 Cloud Provider Selection
- AWS: Industry standard, vast ecosystem. Best for enterprise scale.
- GCP: Strong AI/ML offerings (Vertex AI), Kubernetes (GKE) leadership.
- Azure: Enterprise integration, OpenAI partnership.
- Vercel/Netlify: Frontend-first, serverless, best DX for Next.js.
4.11.2 Local Deployment & Testing
- Docker Compose: Orchestrate multi-container apps locally.
- Minikube/Kind: Local Kubernetes testing.
- LocalStack: Mock AWS services locally.
- Ollama/LocalAI: Run LLMs locally for dev/test without API costs.
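A local dev stack matching this doc's defaults (app + Postgres + Redis + Ollama) can be wired up in one Compose file. A sketch; image tags, ports, and service names are assumptions to adjust per project:

```yaml
# docker-compose.yml sketch for local development.
services:
  app:
    build: .
    ports: ["8000:8000"]
    depends_on: [db, cache]
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev   # dev-only credential
  cache:
    image: redis:7
  llm:
    image: ollama/ollama       # local LLM, no API costs
    ports: ["11434:11434"]
```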
4.11.3 Service Mesh & Networking
- Istio/Linkerd: Traffic management, mTLS, observability in K8s.
- Nginx/Traefik: Ingress controllers, reverse proxies.
4.11.4 Content Delivery
- Cloudflare: CDN, DDoS protection, Edge workers.
- AWS CloudFront: Deep integration with S3/AWS services.
4E (Extended) – LLMs, Inference, Agents, Cloud & GPU Decisions
4E.1 LLM Choices
Hosted:
- GPT-4.1 / GPT-5: General assistants.
- o3-mini / reasoning models: Complex reasoning & planning.
Open-source:
- DeepSeek-R1 (and distilled variants): Reasoning.
- Qwen 2.5/3: Strong coding & reasoning.
- LLaMA, Mistral, Codestral: Good alternatives.
4E.2 Inference & Serving
Local dev:
- Ollama.
- llama.cpp.
- vLLM on Colab Pro for more serious tests.
Production:
- vLLM: Primary serving engine (OpenAI-compatible).
- TGI / SGLang: Alternatives.
4E.3 Agent & Orchestration Frameworks
- LangChain: Building blocks for LLM apps.
- LangGraph: Multi-step/multi-agent workflows as graphs.
- LlamaIndex: Strong for RAG/data indexing.
- AutoGen/CrewAI: Alternative multi-agent frameworks.
- OpenAI Assistants/Agents: API-native orchestration when using OpenAI stack.
4E.4 Vector Databases
- Local: Chroma, FAISS, or a local Postgres with pgvector (pgvector is a Postgres extension, not SQLite).
- Small prod: Postgres + pgvector.
- Larger: Qdrant, Weaviate, Milvus.
- SaaS: Pinecone, Turbopuffer, etc.
4E.5 Observability for LLMs
- LangFuse / LangSmith: Tracing, evaluation.
- Prometheus + Grafana: Metrics.
- Loki/ELK: Logs.
- OpenTelemetry: End-to-end tracing.
4E.6 Cloud & GPU Hosting
GPU as a service:
- OpenAI/Anthropic: Fully managed.
- Modal/Replicate/Banana: Managed GPU containers.
- Runpod/Lambda/Vast.ai: Bare-metal GPU rental.
- AWS/GCP GPUs: Enterprise-level infra.