Data & AI 90 days 2-3 hours/day updated 2026-06-01
LLMOps 90-Day Learning Path
Master LLMOps in 90 days: LLM evals and benchmarks, RAG pipeline architecture, guardrails, prompt versioning, token cost optimization, and fine-tuning ops for production AI applications.
What LLMOps means
LLMOps is the operational discipline for deploying and maintaining large language model applications in production. It extends MLOps with LLM-specific concerns: prompt engineering and versioning, retrieval-augmented generation (RAG) pipeline design, evaluation frameworks, output guardrails, token cost management, and fine-tuning pipelines. LLMOps practitioners ensure that LLM applications are reliable, cost-efficient, safe, and continuously improving.
Who should follow this path
- ML engineers deploying LLM-powered applications
- AI engineers building RAG pipelines and agent systems
- Platform engineers supporting LLM inference infrastructure
- Data scientists moving from model training to LLM fine-tuning
- Software engineers integrating LLM APIs into products
Prerequisites
- Python proficiency and API integration experience
- Basic understanding of transformer architecture and attention
- Familiarity with vector databases (Pinecone, Weaviate, or pgvector)
- Docker and Kubernetes fundamentals
- Experience with at least one LLM API (OpenAI, Anthropic, or Google)
The 90-day plan
Daily study recommendation: 2-3 hours/day, six days a week. Consistency beats intensity — block the time in your calendar like a meeting.
Days 1–15: Foundation
- LLM application architecture patterns
- Prompt engineering: zero-shot, few-shot, chain-of-thought
- Prompt versioning with PromptLayer or LangChain Hub
- Token budgeting and cost modeling
- LLM provider comparison: latency, cost, capability trade-offs
Outcome: Design a production LLM application with versioned prompts and a token cost budget.
Days 16–30: Core concepts
- RAG architecture: chunking, embedding, retrieval, reranking
- Vector databases: Pinecone, Weaviate, Chroma, pgvector
- Embedding models: OpenAI, Cohere, sentence-transformers
- Hybrid search: dense + sparse (BM25) retrieval
- RAG pipeline orchestration with LangChain or LlamaIndex
Outcome: Build a production-quality RAG pipeline with hybrid retrieval and reranking.
Days 31–45: Tools and workflows
- LLM evaluation frameworks: Ragas, LangSmith, DeepEval
- Evaluation metrics: faithfulness, answer relevancy, context recall
- Human-in-the-loop evaluation workflows
- Regression testing for LLM outputs
- Benchmark datasets: MMLU, HumanEval, TruthfulQA
Outcome: Implement an automated LLM evaluation pipeline with regression testing on every prompt change.
Days 46–60: Hands-on projects
- Guardrails: output validation with Guardrails AI and NVIDIA NeMo Guardrails
- PII detection and redaction in LLM pipelines
- Toxicity and hallucination detection
- Rate limiting and abuse prevention for LLM APIs
- Content safety policies and refusal handling
Outcome: Deploy safety guardrails that detect hallucinations, PII leakage, and policy violations in real-time.
Days 61–75: Advanced practices
- Fine-tuning pipelines: LoRA/QLoRA on open-source models
- Fine-tuning infrastructure: Axolotl, Hugging Face TRL
- LLM serving: vLLM, TGI (Text Generation Inference), Ollama
- Observability: LangSmith, Langfuse, Arize Phoenix tracing
- Token cost optimization: caching, batching, prompt compression
Outcome: Fine-tune an open-source LLM with LoRA and serve it at production scale with full observability.
Days 76–90: Portfolio, interview & certification prep
- LLMOps portfolio project: end-to-end RAG application
- Preparing for Hugging Face NLP course certification
- LLMOps interview questions and system design scenarios
- Metrics: eval scores, p50/p95 latency, cost-per-query, error rates
- Emerging: agentic AI ops, multi-agent orchestration, model routing
Outcome: Ship a portfolio RAG application with evals, guardrails, and observability and ace LLMOps interviews.
Weekly outcomes at a glance
| Phase | Outcome |
|---|---|
| Days 1–15 | Design a production LLM application with versioned prompts and a token cost budget. |
| Days 16–30 | Build a production-quality RAG pipeline with hybrid retrieval and reranking. |
| Days 31–45 | Implement an automated LLM evaluation pipeline with regression testing on every prompt change. |
| Days 46–60 | Deploy safety guardrails that detect hallucinations, PII leakage, and policy violations in real-time. |
| Days 61–75 | Fine-tune an open-source LLM with LoRA and serve it at production scale with full observability. |
| Days 76–90 | Ship a portfolio RAG application with evals, guardrails, and observability and ace LLMOps interviews. |
Tools to learn
- LangChain
- LlamaIndex
- Pinecone
- Weaviate
- LangSmith
- Ragas
- Guardrails AI
- vLLM
- Hugging Face TRL
- Langfuse
- PromptLayer
- Arize Phoenix
Labs to practice
Mini projects
- Build a production RAG pipeline with Pinecone + LangChain, Ragas evaluation, and Guardrails AI safety checks
- Fine-tune a Mistral 7B model with QLoRA using Axolotl, serve it with vLLM, and benchmark against GPT-4o
- Create an LLMOps observability stack with Langfuse tracing, cost dashboards, and automated eval regression tests
Interview questions to prepare
- What is RAG and when would you use it instead of fine-tuning?
- How do you evaluate the quality of a RAG pipeline?
- Explain the difference between faithfulness and relevancy in LLM evaluation.
- What is LoRA and why is it preferred over full fine-tuning in most cases?
- How do you prevent hallucination in production LLM applications?
- What strategies do you use to reduce token costs in a high-volume LLM API application?
- How would you design a prompt versioning system for a team of 10 engineers?
- What observability metrics would you track for a production LLM service?
Certification suggestions
- Hugging Face NLP Course Certificate — Hugging Face
- Databricks Generative AI Fundamentals — Databricks
- AWS Certified Machine Learning Specialty — AWS
- DeepLearning.AI LLMOps Specialization — DeepLearning.AI / Coursera
Browse the full certification registry for exam details and official links.
Free resources
- Hugging Face NLP Course (free)
- LangChain Documentation
- Ragas Documentation
- vLLM Documentation
- Langfuse Documentation
Related roadmaps
Related tool categories
// instructor-led option
Prefer live, guided training with mentors and certification support? DevOpsSchool.com runs paid instructor-led programs that pair well with this free path.
Explore paid training on DevOpsSchool.com ↗