Data & AI · 90 days · 2-3 hours/day · Updated 2026-06-01

LLMOps 90-Day Learning Path

Master LLMOps in 90 days: LLM evals and benchmarks, RAG pipeline architecture, guardrails, prompt versioning, token cost optimization, and fine-tuning ops for production AI applications.

What LLMOps means

LLMOps is the operational discipline for deploying and maintaining large language model applications in production. It extends MLOps with LLM-specific concerns: prompt engineering and versioning, retrieval-augmented generation (RAG) pipeline design, evaluation frameworks, output guardrails, token cost management, and fine-tuning pipelines. LLMOps practitioners ensure that LLM applications are reliable, cost-efficient, safe, and continuously improving.

Who should follow this path

  • ML engineers deploying LLM-powered applications
  • AI engineers building RAG pipelines and agent systems
  • Platform engineers supporting LLM inference infrastructure
  • Data scientists moving from model training to LLM fine-tuning
  • Software engineers integrating LLM APIs into products

Prerequisites

  • Python proficiency and API integration experience
  • Basic understanding of transformer architecture and attention
  • Familiarity with vector databases (Pinecone, Weaviate, or pgvector)
  • Docker and Kubernetes fundamentals
  • Experience with at least one LLM API (OpenAI, Anthropic, or Google)

The 90-day plan

Daily study recommendation: 2-3 hours/day, six days a week. Consistency beats intensity, so block the time in your calendar like a meeting.

Days 1–15: Foundation

  • LLM application architecture patterns
  • Prompt engineering: zero-shot, few-shot, chain-of-thought
  • Prompt versioning with PromptLayer or LangChain Hub
  • Token budgeting and cost modeling
  • LLM provider comparison: latency, cost, capability trade-offs

Outcome: Design a production LLM application with versioned prompts and a token cost budget.
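
To make the token budgeting item concrete, here is a minimal sketch of a cost model. The prices, token counts, and traffic figures are hypothetical placeholders chosen for illustration, not real provider rates; substitute the numbers from your provider's current price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per one million tokens (not real rates)."""
    input_per_million: float
    output_per_million: float


def cost_per_request(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: input and output tokens are priced separately."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 pricing: Pricing, days: int = 30) -> float:
    """Projected spend, assuming every request has the same token profile."""
    return requests_per_day * days * cost_per_request(input_tokens, output_tokens, pricing)


# Assumed workload: 1,500 prompt tokens and 300 completion tokens per request.
example_pricing = Pricing(input_per_million=2.0, output_per_million=8.0)
per_request = cost_per_request(1_500, 300, example_pricing)
monthly = monthly_cost(10_000, 1_500, 300, example_pricing)

print(f"per request: ${per_request:.4f}, per month: ${monthly:,.2f}")
```

A model like this is deliberately coarse. In practice token counts vary per request, so a budget is usually built from measured averages or percentiles rather than a single fixed profile.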

Days 16–30: Core concepts

  • RAG architecture: chunking, embedding, retrieval, reranking
  • Vector databases: Pinecone, Weaviate, Chroma, pgvector
  • Embedding models: OpenAI, Cohere, sentence-transformers
  • Hybrid search: dense + sparse (BM25) retrieval
  • RAG pipeline orchestration with LangChain or LlamaIndex

Outcome: Build a production-quality RAG pipeline with hybrid retrieval and reranking.
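
One common way to combine dense and sparse results in hybrid search is reciprocal rank fusion (RRF). This path does not prescribe a fusion method, so treat the sketch below as one option under that assumption. The document IDs and both rankings are made up, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (embedding) and a sparse (BM25) retriever.
dense_results = ["doc_b", "doc_a", "doc_c"]
sparse_results = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([dense_results, sparse_results])
print(fused)
```

RRF only needs ranks, not raw scores, which avoids having to normalize cosine similarities against BM25 scores. A reranker can then be applied to the top of the fused list.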

Days 31–45: Tools and workflows

  • LLM evaluation frameworks: Ragas, LangSmith, DeepEval
  • Evaluation metrics: faithfulness, answer relevancy, context recall
  • Human-in-the-loop evaluation workflows
  • Regression testing for LLM outputs
  • Benchmark datasets: MMLU, HumanEval, TruthfulQA

Outcome: Implement an automated LLM evaluation pipeline with regression testing on every prompt change.
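
Regression testing for LLM outputs can start much simpler than a full evaluation framework. The sketch below is a hypothetical harness, not the API of Ragas, LangSmith, or DeepEval: it runs a set of golden prompts through a model function and reports the pass rate. `stub_model` stands in for a real LLM call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """A prompt plus the terms its answer is expected to contain."""
    prompt: str
    must_contain: tuple[str, ...]


def run_regression(generate: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Return the fraction of cases whose output contains every required term."""
    if not cases:
        raise ValueError("cases must not be empty")
    passed = 0
    for case in cases:
        output = generate(case.prompt).lower()
        if all(term.lower() in output for term in case.must_contain):
            passed += 1
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: swap in your provider client)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(prompt, "I don't know.")


golden_cases = [
    GoldenCase("What is the capital of France?", ("Paris",)),
    GoldenCase("What is 2 + 2?", ("4",)),
    GoldenCase("Who wrote Hamlet?", ("Shakespeare",)),
]

pass_rate = run_regression(stub_model, golden_cases)
print(f"pass rate: {pass_rate:.2f}")
```

In a CI job, a check such as `pass_rate >= threshold` could gate each prompt change. Substring matching is a crude proxy, so metrics like faithfulness or answer relevancy would replace it as the pipeline matures.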

Days 46–60: Hands-on projects

  • Guardrails: output validation with Guardrails AI and NVIDIA NeMo Guardrails
  • PII detection and redaction in LLM pipelines
  • Toxicity and hallucination detection
  • Rate limiting and abuse prevention for LLM APIs
  • Content safety policies and refusal handling

Outcome: Deploy safety guardrails that detect hallucinations, PII leakage, and policy violations in real time.
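
As a rough illustration of PII detection and redaction, the sketch below uses two regular expressions. It is a toy under stated assumptions, not how Guardrails AI or NeMo Guardrails work: the patterns cover only simple email addresses and US-style phone numbers, and the labels are invented for this example.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_pii(text):
    """Replace each match with a [LABEL] placeholder.

    Returns the redacted text and the list of PII labels that were found.
    """
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[{label}]", text)
    return text, found


redacted, labels = redact_pii("Contact jane.doe@example.com or 555-123-4567.")
print(redacted)
print(labels)
```

The same function can run on both the user input before it reaches the model and the model output before it reaches the user. Regexes miss names, addresses, and free-form identifiers, which is where dedicated detection tooling comes in.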

Days 61–75: Advanced practices

  • Fine-tuning pipelines: LoRA/QLoRA on open-source models
  • Fine-tuning infrastructure: Axolotl, Hugging Face TRL
  • LLM serving: vLLM, TGI (Text Generation Inference), Ollama
  • Observability: LangSmith, Langfuse, Arize Phoenix tracing
  • Token cost optimization: caching, batching, prompt compression

Outcome: Fine-tune an open-source LLM with LoRA and serve it at production scale with full observability.
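
Of the token cost optimizations listed above, exact-match response caching is the easiest to sketch. The class below is a hypothetical in-memory cache, not a feature of any listed tool: `ResponseCache`, `stub_model`, and the model name are invented for illustration, and a production version would add expiry and a shared store.

```python
import hashlib


class ResponseCache:
    """Exact-match cache keyed on a hash of the model name and prompt."""

    def __init__(self, generate):
        self._generate = generate  # callable(model, prompt) -> str
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, prompt):
        # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def complete(self, model, prompt):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._generate(model, prompt)
        self._store[key] = result
        return result


calls = []


def stub_model(model, prompt):
    """Stand-in for a paid LLM call; records each invocation."""
    calls.append(prompt)
    return f"echo: {prompt}"


cache = ResponseCache(stub_model)
first = cache.complete("example-model", "Summarize the release notes.")
second = cache.complete("example-model", "Summarize the release notes.")
print(cache.hits, cache.misses, len(calls))
```

Exact-match caching only helps when identical prompts repeat and deterministic answers are acceptable. Semantic caching, which matches on embedding similarity, trades some correctness risk for a higher hit rate.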

Days 76–90: Portfolio, interview & certification prep

  • LLMOps portfolio project: end-to-end RAG application
  • Preparing for Hugging Face NLP course certification
  • LLMOps interview questions and system design scenarios
  • Metrics: eval scores, p50/p95 latency, cost-per-query, error rates
  • Emerging: agentic AI ops, multi-agent orchestration, model routing

Outcome: Ship a portfolio RAG application with evals, guardrails, and observability, and ace LLMOps interviews.
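
The operational metrics in the list above (p50/p95 latency, cost-per-query, error rate) can be computed from a plain request log. This is a minimal sketch using the nearest-rank percentile definition, which is one of several conventions. The log entries are fabricated sample data for illustration only.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request log: (latency in ms, cost in USD, succeeded?)
request_log = [
    (120, 0.004, True), (95, 0.004, True), (300, 0.004, True),
    (110, 0.004, True), (105, 0.004, True), (980, 0.010, False),
    (130, 0.004, True), (115, 0.004, True), (125, 0.004, True),
    (140, 0.004, True),
]

latencies = [latency for latency, _, _ in request_log]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
cost_per_query = sum(cost for _, cost, _ in request_log) / len(request_log)
error_rate = sum(1 for _, _, ok in request_log if not ok) / len(request_log)

print(f"p50={p50}ms p95={p95}ms cost/query=${cost_per_query:.4f} errors={error_rate:.0%}")
```

With only ten samples the p95 is simply the slowest request, which shows why tail percentiles need a reasonably large window before they are worth alerting on.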

Phase outcomes at a glance

Phase | Outcome
Days 1–15 | Design a production LLM application with versioned prompts and a token cost budget.
Days 16–30 | Build a production-quality RAG pipeline with hybrid retrieval and reranking.
Days 31–45 | Implement an automated LLM evaluation pipeline with regression testing on every prompt change.
Days 46–60 | Deploy safety guardrails that detect hallucinations, PII leakage, and policy violations in real time.
Days 61–75 | Fine-tune an open-source LLM with LoRA and serve it at production scale with full observability.
Days 76–90 | Ship a portfolio RAG application with evals, guardrails, and observability, and ace LLMOps interviews.

Tools to learn

  • LangChain
  • LlamaIndex
  • Pinecone
  • Weaviate
  • LangSmith
  • Ragas
  • Guardrails AI
  • vLLM
  • Hugging Face TRL
  • Langfuse
  • PromptLayer
  • Arize Phoenix

Labs to practice

Mini projects

  • Build a production RAG pipeline with Pinecone + LangChain, Ragas evaluation, and Guardrails AI safety checks
  • Fine-tune a Mistral 7B model with QLoRA using Axolotl, serve it with vLLM, and benchmark against GPT-4o
  • Create an LLMOps observability stack with Langfuse tracing, cost dashboards, and automated eval regression tests

Interview questions to prepare

  1. What is RAG and when would you use it instead of fine-tuning?
  2. How do you evaluate the quality of a RAG pipeline?
  3. Explain the difference between faithfulness and relevancy in LLM evaluation.
  4. What is LoRA and why is it preferred over full fine-tuning in most cases?
  5. How do you prevent hallucination in production LLM applications?
  6. What strategies do you use to reduce token costs in a high-volume LLM API application?
  7. How would you design a prompt versioning system for a team of 10 engineers?
  8. What observability metrics would you track for a production LLM service?

Certification suggestions

  • Hugging Face NLP Course Certificate — Hugging Face
  • Databricks Generative AI Fundamentals — Databricks
  • AWS Certified Machine Learning - Specialty — AWS
  • DeepLearning.AI LLMOps Specialization — DeepLearning.AI / Coursera

Browse the full certification registry for exam details and official links.

Free resources

Prefer live, guided training with mentors and certification support? DevOpsSchool.com runs paid instructor-led programs that pair well with this free path.
