Data & AI | 90 days | 2-3 hours/day | Updated 2026-06-01

LLMOps 90-Day Learning Path

Master LLMOps in 90 days: LLM evals and benchmarks, RAG pipeline architecture, guardrails, prompt versioning, token cost optimization, and fine-tuning ops for production AI applications.

What LLMOps means

LLMOps is the operational discipline for deploying and maintaining large language model applications in production. It extends MLOps with LLM-specific concerns: prompt engineering and versioning, retrieval-augmented generation (RAG) pipeline design, evaluation frameworks, output guardrails, token cost management, and fine-tuning pipelines. LLMOps practitioners ensure that LLM applications are reliable, cost-efficient, safe, and continuously improving.

Who should follow this path

ML engineers deploying LLM-powered applications
AI engineers building RAG pipelines and agent systems
Platform engineers supporting LLM inference infrastructure
Data scientists moving from model training to LLM fine-tuning
Software engineers integrating LLM APIs into products

Prerequisites

Python proficiency and API integration experience
Basic understanding of transformer architecture and attention
Familiarity with vector databases (Pinecone, Weaviate, or pgvector)
Docker and Kubernetes fundamentals
Experience with at least one LLM API (OpenAI, Anthropic, or Google)

The 90-day plan

Daily study recommendation: 2-3 hours/day, six days a week. Consistency beats intensity — block the time in your calendar like a meeting.

Days 1–15: Foundation

LLM application architecture patterns
Prompt engineering: zero-shot, few-shot, chain-of-thought
Prompt versioning with PromptLayer or LangChain Hub
Token budgeting and cost modeling
LLM provider comparison: latency, cost, capability trade-offs

Outcome: Design a production LLM application with versioned prompts and a token cost budget.
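A token cost budget for this phase can start as simple arithmetic. The sketch below is a minimal illustration: the `ModelPrice` class, the function names, and the prices (2.00 and 8.00 USD per million tokens) are hypothetical placeholders, not any provider's real rates, so substitute the numbers from your provider's current price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_million: float
    output_per_million: float


def cost_per_request(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: input and output tokens are usually priced separately."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


def monthly_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Project a monthly budget from average tokens per request and traffic."""
    return cost_per_request(price, input_tokens, output_tokens) * requests_per_day * days


# Assumed workload: 1,500 prompt tokens and 500 completion tokens per request.
example_price = ModelPrice(input_per_million=2.0, output_per_million=8.0)
per_request = cost_per_request(example_price, input_tokens=1_500, output_tokens=500)
budget = monthly_cost(example_price, 1_500, 500, requests_per_day=10_000)

print(f"per request: ${per_request:.4f}")
print(f"per month:   ${budget:,.2f}")
```

Keeping input and output prices separate matters because long retrieved contexts inflate input tokens while verbose answers inflate output tokens, and the two are rarely priced the same.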

Days 16–30: Core concepts

RAG architecture: chunking, embedding, retrieval, reranking
Vector databases: Pinecone, Weaviate, Chroma, pgvector
Embedding models: OpenAI, Cohere, sentence-transformers
Hybrid search: dense + sparse (BM25) retrieval
RAG pipeline orchestration with LangChain or LlamaIndex

Outcome: Build a production-quality RAG pipeline with hybrid retrieval and reranking.
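Hybrid retrieval needs a way to merge the dense and sparse result lists. One common option is reciprocal rank fusion (RRF), shown here as a minimal sketch; the path does not prescribe a fusion method, so treat RRF, the constant `k=60`, and the document IDs as illustrative assumptions. The two input lists stand in for results you would get from a vector search and a BM25 search.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Placeholder results: best match first in each list.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from an embedding search
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. from a BM25 search

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF uses only ranks, not raw scores, which avoids having to normalize cosine similarities against BM25 scores. A reranker would then reorder the top of the fused list.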

Days 31–45: Tools and workflows

LLM evaluation frameworks: Ragas, LangSmith, DeepEval
Evaluation metrics: faithfulness, answer relevancy, context recall
Human-in-the-loop evaluation workflows
Regression testing for LLM outputs
Benchmark datasets: MMLU, HumanEval, TruthfulQA

Outcome: Implement an automated LLM evaluation pipeline with regression testing on every prompt change.
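Regression testing on every prompt change comes down to comparing new metric scores against a stored baseline. The sketch below assumes the scores have already been produced by an evaluation framework; the `find_regressions` function, the 0.02 tolerance, and the score values are hypothetical placeholders, not measured results.

```python
def find_regressions(baseline, candidate, tolerance=0.02):
    """Return metrics where the candidate dropped more than `tolerance`.

    A metric missing from the candidate run is also reported, since a
    silently skipped evaluation should fail the check.
    """
    regressions = {}
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is None or base_score - new_score > tolerance:
            regressions[metric] = (base_score, new_score)
    return regressions


# Placeholder scores for illustration only.
baseline_scores = {"faithfulness": 0.91, "answer_relevancy": 0.88, "context_recall": 0.80}
candidate_scores = {"faithfulness": 0.84, "answer_relevancy": 0.89, "context_recall": 0.79}

regressions = find_regressions(baseline_scores, candidate_scores)
for metric, (old, new) in regressions.items():
    print(f"REGRESSION {metric}: {old} -> {new}")
```

In a CI job, a non-empty result would fail the build. The tolerance exists because LLM-judged metrics are noisy, so small drops should not block a merge.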

Days 46–60: Hands-on projects

Guardrails: output validation with Guardrails AI and NVIDIA NeMo Guardrails
PII detection and redaction in LLM pipelines
Toxicity and hallucination detection
Rate limiting and abuse prevention for LLM APIs
Content safety policies and refusal handling

Outcome: Deploy safety guardrails that detect hallucinations, PII leakage, and policy violations in real time.
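As a first step toward PII redaction, a pattern-based filter can run on model output before it is returned. This is a minimal sketch using only the standard library: the two patterns (email addresses and US Social Security numbers) and the `redact_pii` name are illustrative assumptions, and it is not how Guardrails AI or NeMo Guardrails work internally. Regexes miss names, addresses, and many other PII types, so production systems usually add an NER-based detector.

```python
import re

# Illustrative patterns only; real deployments need broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text):
    """Replace each detected PII span with a [LABEL] placeholder.

    Returns the redacted text and the list of PII types that were found,
    which can be logged as a guardrail event.
    """
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[{label}]", text)
    return text, found


redacted, labels = redact_pii("Contact jane.doe@example.com, SSN 123-45-6789.")
print(redacted)
print(labels)
```

Returning the detected labels alongside the redacted text lets the caller decide whether to redact and continue, or to block the response entirely.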

Days 61–75: Advanced practices

Fine-tuning pipelines: LoRA/QLoRA on open-source models
Fine-tuning infrastructure: Axolotl, Hugging Face TRL
LLM serving: vLLM, TGI (Text Generation Inference), Ollama
Observability: LangSmith, Langfuse, Arize Phoenix tracing
Token cost optimization: caching, batching, prompt compression

Outcome: Fine-tune an open-source LLM with LoRA and serve it at production scale with full observability.
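Of the cost optimizations listed above, exact-match caching is the simplest to prototype. The sketch below is a hypothetical in-memory cache: `ResponseCache`, `fake_llm`, and the model name are placeholders, and `call_fn` stands in for whatever client function actually calls your model. It only helps when identical requests repeat and sampling is deterministic (for example, temperature 0).

```python
import hashlib
import json


class ResponseCache:
    """In-memory exact-match cache keyed by model, prompt, and parameters."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, prompt, params):
        # sort_keys makes the key independent of dict insertion order.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, params, call_fn):
        """Return a cached response, or call the model and cache the result."""
        key = self._key(model, prompt, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = call_fn(model, prompt, params)
        self._store[key] = response
        return response


def fake_llm(model, prompt, params):
    """Stand-in for a real (billable) model call."""
    return f"echo: {prompt}"


cache = ResponseCache()
first = cache.get_or_call("example-model", "What is LoRA?", {"temperature": 0}, fake_llm)
second = cache.get_or_call("example-model", "What is LoRA?", {"temperature": 0}, fake_llm)
print(cache.hits, cache.misses)
```

Including the model name and parameters in the key prevents a response generated under one configuration from being served for another. A shared deployment would replace the dict with an external store and add expiry.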

Days 76–90: Portfolio, interview & certification prep

LLMOps portfolio project: end-to-end RAG application
Preparing for the Hugging Face NLP Course certificate
LLMOps interview questions and system design scenarios
Metrics: eval scores, p50/p95 latency, cost-per-query, error rates
Emerging: agentic AI ops, multi-agent orchestration, model routing

Outcome: Ship a portfolio RAG application with evals, guardrails, and observability, and prepare for LLMOps interviews.
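The service metrics listed above (p50/p95 latency, cost-per-query, error rate) can be computed from raw request logs with a few lines of code. This sketch assumes one latency sample per request and uses the nearest-rank percentile definition; the function names and sample values are hypothetical, and monitoring tools may interpolate percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms, total_cost_usd, error_count):
    """Summarize one reporting window; assumes one latency sample per request."""
    requests = len(latencies_ms)
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "cost_per_query_usd": total_cost_usd / requests,
        "error_rate": error_count / requests,
    }


# Placeholder samples for illustration only.
latencies_ms = [120, 135, 150, 160, 180, 210, 250, 300, 450, 900]
summary = summarize(latencies_ms, total_cost_usd=0.07, error_count=1)
print(summary)
```

Reporting p95 next to p50 matters for LLM services because a few long generations can dominate the tail while the median stays flat.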

Phase outcomes at a glance

Phase	Outcome
Days 1–15	Design a production LLM application with versioned prompts and a token cost budget.
Days 16–30	Build a production-quality RAG pipeline with hybrid retrieval and reranking.
Days 31–45	Implement an automated LLM evaluation pipeline with regression testing on every prompt change.
Days 46–60	Deploy safety guardrails that detect hallucinations, PII leakage, and policy violations in real time.
Days 61–75	Fine-tune an open-source LLM with LoRA and serve it at production scale with full observability.
Days 76–90	Ship a portfolio RAG application with evals, guardrails, and observability, and prepare for LLMOps interviews.

Tools to learn

LangChain
LlamaIndex
Pinecone
Weaviate
LangSmith
Ragas
Guardrails AI
vLLM
Hugging Face TRL
Langfuse
PromptLayer
Arize Phoenix

Labs to practice

Mini projects

Build a production RAG pipeline with Pinecone + LangChain, Ragas evaluation, and Guardrails AI safety checks
Fine-tune a Mistral 7B model with QLoRA using Axolotl, serve it with vLLM, and benchmark against GPT-4o
Create an LLMOps observability stack with Langfuse tracing, cost dashboards, and automated eval regression tests

Interview questions to prepare

What is RAG and when would you use it instead of fine-tuning?
How do you evaluate the quality of a RAG pipeline?
Explain the difference between faithfulness and relevancy in LLM evaluation.
What is LoRA and why is it preferred over full fine-tuning in most cases?
How do you prevent hallucination in production LLM applications?
What strategies do you use to reduce token costs in a high-volume LLM API application?
How would you design a prompt versioning system for a team of 10 engineers?
What observability metrics would you track for a production LLM service?
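For the LoRA question above, it helps to have the parameter arithmetic ready. LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors, B (d_out x r) and A (r x d_in), so the trainable count per matrix is r * (d_in + d_out). The sketch below works through one example; the 4096 x 4096 dimensions and rank 8 are illustrative assumptions, not the configuration of any particular model.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_finetune_params(d_in, d_out):
    """Parameters updated when the whole weight matrix is trained (bias ignored)."""
    return d_in * d_out


# Illustrative sizes: one square projection matrix adapted at rank 8.
d_in = d_out = 4096
rank = 8

lora = lora_trainable_params(d_in, d_out, rank)
full = full_finetune_params(d_in, d_out)
fraction = lora / full

print(f"LoRA trains {lora:,} of {full:,} parameters ({fraction:.2%})")
```

The same ratio applies to gradients and optimizer state, which is the main reason LoRA fits on much smaller GPUs than full fine-tuning.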

Certification suggestions

Hugging Face NLP Course Certificate — Hugging Face
Databricks Generative AI Fundamentals — Databricks
AWS Certified Machine Learning Specialty — AWS
DeepLearning.AI LLMOps Specialization — DeepLearning.AI / Coursera
