// glossary

Concepts, decoded

Every term explained in plain English — what it is, why it matters, a real-world example, the tools around it, and the interview questions it shows up in.

CI/CD

Continuous Integration and Continuous Delivery/Deployment: the practice of automatically building, testing, and releasing every code change through a pipeline, so software moves from commit to production quickly, safely, and repeatably.

CloudOps

The discipline of operating workloads in public cloud environments: provisioning, monitoring, securing, scaling, and cost-managing cloud infrastructure, usually through automation and infrastructure as code.

DevOps

A culture and set of practices that unites software development and IT operations, using automation, CI/CD, and shared ownership to ship reliable software faster and shorten feedback loops between code and production.

DevSecOps

The practice of integrating security into every stage of the DevOps lifecycle: shifting security left through automated scanning, policy as code, and shared responsibility, rather than relying on a single security gate just before release.

Distributed Tracing

A technique that follows a single request as it travels through multiple services, recording each hop as a timed span, so engineers can see exactly where latency and errors occur in a distributed system.
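
As a rough illustration of "each hop as a timed span", the sketch below models a trace as a list of spans linked by parent IDs and finds the hop with the most self time. The `Span` fields, the helper names, and the sample trace are assumptions made for this example, not the data model of any particular tracing tool, and the arithmetic assumes child spans run one after another.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str              # shared by every span in one request
    span_id: str               # unique per hop
    parent_id: Optional[str]   # None for the root span
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: List[Span]) -> float:
    """Time spent in the span itself, excluding its direct children.

    Assumes children do not overlap in time (sequential calls).
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def slowest_hop(spans: List[Span]) -> Span:
    """Return the span that contributes the most self time to the trace."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


# Hypothetical trace: gateway -> orders -> database
TRACE = [
    Span("t1", "a", None, "gateway", 0, 120),
    Span("t1", "b", "a", "orders", 10, 110),
    Span("t1", "c", "b", "database", 20, 100),
]

if __name__ == "__main__":
    hop = slowest_hop(TRACE)
    print(hop.service, self_time_ms(hop, TRACE))
```

Ranking by self time rather than total duration keeps the root span from always looking like the culprit, since its duration includes every downstream call.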

Error Budget

The amount of unreliability a service is allowed within its SLO period. If the SLO is 99.9% availability, the error budget is the remaining 0.1%; teams can spend it on risky changes and must slow down when it runs out.
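
The arithmetic behind the 99.9% example can be sketched as follows. The function names, the 30-day window, and the request counts are assumptions chosen for illustration; real teams pick their own window and their own way of counting failures.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Fraction of requests (or time) allowed to fail, e.g. 0.999 -> about 0.001."""
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Error budget expressed as minutes of full downtime in the window."""
    return error_budget_fraction(slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Share of the error budget still unspent (negative once it is overspent)."""
    allowed_failures = error_budget_fraction(slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43 minutes of downtime.
    print(round(allowed_downtime_minutes(0.999), 1))
    # 250 failures out of 1,000,000 requests uses about a quarter of the budget.
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```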

FinOps

The practice of bringing financial accountability to cloud spending: giving engineering teams visibility into costs, optimizing usage through rightsizing and commitments, and aligning cloud investment with business value.

GitOps

An operating model where the desired state of infrastructure and applications lives in Git, and automated controllers continuously reconcile the live system to match it. Deployments and rollbacks become pull requests.
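
The reconcile step can be pictured as a diff between two dictionaries: the desired state read from Git and the state observed in the live system. This is a minimal sketch under that assumption; the `plan` and `reconcile` names and the sample specs are hypothetical, and real controllers work against a cluster API rather than in-memory dictionaries.

```python
from typing import Dict, List, Tuple

State = Dict[str, dict]  # resource name -> spec


def plan(desired: State, live: State) -> List[Tuple[str, str]]:
    """List the actions needed to make the live state match the desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))
    for name in live:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: State, live: State) -> State:
    """Apply the planned actions to a copy of the live state and return it."""
    new_live = dict(live)
    for action, name in plan(desired, live):
        if action == "delete":
            del new_live[name]
        else:  # create or update: take the spec from Git
            new_live[name] = desired[name]
    return new_live


# Hypothetical states for illustration
DESIRED = {"web": {"image": "web:2", "replicas": 3}, "worker": {"image": "worker:1", "replicas": 1}}
LIVE = {"web": {"image": "web:1", "replicas": 3}, "legacy": {"image": "legacy:9", "replicas": 1}}

if __name__ == "__main__":
    print(plan(DESIRED, LIVE))
    print(plan(DESIRED, reconcile(DESIRED, LIVE)))  # nothing left to do
```

In this model a rollback is a revert of the commit that changed the desired state: the next reconcile pass sees the old spec and converges back to it.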

Incident Management

The structured process for detecting, responding to, resolving, and learning from service disruptions — covering on-call, alerting, severity levels, coordinated response roles, and blameless postmortems.

Infrastructure as Code (IaC)

The practice of defining servers, networks, and cloud resources in machine-readable code instead of manual clicks, so infrastructure can be versioned, reviewed, tested, and recreated identically on demand.

Internal Developer Platform (IDP)

The self-service product a platform team builds for its developers: a portal, APIs, and golden-path templates that abstract infrastructure so developers can create services, environments, and deployments on demand.

LLMOps

The operational discipline for building and running applications powered by large language models: prompt management, evaluation, guardrails, cost and latency control, and monitoring of non-deterministic outputs.

MLOps

The practice of applying DevOps principles to machine learning: versioning data and models, automating training and deployment pipelines, and monitoring models in production for drift and degradation.

Observability

The ability to understand a system's internal state from its external outputs — metrics, logs, and traces — so you can debug novel problems and ask new questions without shipping new code first.

Platform Engineering

The discipline of building internal platforms — golden paths, self-service tooling, and paved infrastructure — that let product teams ship software without each team reinventing CI/CD, Kubernetes, and observability themselves.

PromptOps

The discipline of treating prompts as production artifacts: versioning them in source control, testing changes against evaluation suites, deploying with rollout controls, and monitoring prompt performance over time.

RAGOps

The operational practice of running Retrieval-Augmented Generation systems in production: managing ingestion and embedding pipelines, vector stores, retrieval quality, and end-to-end evaluation of grounded LLM answers.

ReleaseOps

The discipline of managing how software reaches users: release planning and trains, progressive delivery with canary and blue-green strategies, feature flags, rollback readiness, and coordination across teams and environments.
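
One common way to combine feature flags with progressive delivery is a percentage rollout, where each user is hashed into a stable bucket. The sketch below shows that idea only; the `bucket` and `is_enabled` helpers and the flag name are hypothetical and do not reflect the API of any feature-flag product.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """True if the user falls inside the current rollout percentage."""
    return bucket(flag, user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (1, 10, 50, 100):
        enabled = sum(is_enabled("new-checkout", u, percent) for u in users)
        print(percent, enabled)
```

Because the bucket depends only on the flag name and the user ID, a user who sees the feature at 10% keeps seeing it at 50%, and setting the percentage back to 0 acts as an immediate rollback.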

SecOps

The practice of integrating security operations with IT operations: continuous threat monitoring, detection, and incident response, typically centered on a SOC, SIEM tooling, vulnerability management, and threat intelligence.

SLA (Service Level Agreement)

A formal contract between a provider and its customers that defines promised service levels, such as 99.9% uptime, along with consequences like service credits or penalties if the promise is broken.

SLI (Service Level Indicator)

A quantitative measurement of some aspect of a service's behavior that users care about, such as request success rate, latency, or freshness. SLIs are the raw metrics on which SLOs and error budgets are built.
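
Many SLIs reduce to a ratio of good events to total events. The sketch below computes two such ratios, one for request success and one for latency. The function names, the 300 ms threshold, and the sample numbers are assumptions for illustration.

```python
from typing import Optional, Sequence


def availability_sli(good_requests: int, total_requests: int) -> Optional[float]:
    """Fraction of requests that succeeded; None when there was no traffic."""
    if total_requests == 0:
        return None
    return good_requests / total_requests


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> Optional[float]:
    """Fraction of requests served at or under the latency threshold."""
    if not latencies_ms:
        return None
    fast = sum(1 for value in latencies_ms if value <= threshold_ms)
    return fast / len(latencies_ms)


if __name__ == "__main__":
    print(availability_sli(9990, 10000))
    print(latency_sli([120, 80, 450, 95], threshold_ms=300))
```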

SLO (Service Level Objective)

A target value for a service level indicator over a period, such as '99.9% of requests succeed over 30 days.' SLOs define how reliable a service should be and drive error budgets, alerting, and engineering priorities.
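
Checking an objective such as "99.9% of requests succeed over 30 days" amounts to summing good and total requests across the window and comparing the ratio with the target. This is a minimal sketch; `DailyCount`, `slo_met`, and the sample counts are hypothetical, and treating an empty window as compliant is a choice made here, not a rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailyCount:
    good: int    # requests that succeeded that day
    total: int   # all requests that day


def slo_met(days: List[DailyCount], target: float, window_days: int = 30) -> bool:
    """True if the success ratio over the most recent window meets the target."""
    window = days[-window_days:]
    good = sum(day.good for day in window)
    total = sum(day.total for day in window)
    if total == 0:
        return True  # assumption: no traffic counts as meeting the SLO
    return good / total >= target


if __name__ == "__main__":
    history = [DailyCount(1000, 1000)] * 29 + [DailyCount(980, 1000)]
    print(slo_met(history, target=0.999))
```

Summing counts across the window, rather than averaging daily percentages, keeps a low-traffic day from carrying the same weight as a busy one.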

SRE (Site Reliability Engineering)

An engineering discipline, pioneered at Google, that applies software engineering to operations problems, using SLOs, error budgets, and automation to keep services reliable while still enabling fast change.

TestOps

The discipline of operating testing at scale: managing test infrastructure, integrating automated suites into CI/CD, controlling flaky tests, provisioning test data and environments, and using quality analytics to guide releases.
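
One simple signal for flaky tests is a test that both passed and failed on the same commit, since the code did not change between runs. The sketch below applies that heuristic to a list of run records. The record shape and the `find_flaky` name are assumptions for this example, and real quality analytics typically combine several signals.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Run = Tuple[str, str, bool]  # (test name, commit SHA, passed)


def find_flaky(runs: Iterable[Run]) -> List[str]:
    """Return tests that both passed and failed on at least one commit."""
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    flaky = {test for (test, _), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


# Hypothetical CI history for illustration
RUNS = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),   # same commit, different result: flaky
    ("test_checkout", "abc123", False),
    ("test_checkout", "def456", True), # different commits: a real fix, not flaky
]

if __name__ == "__main__":
    print(find_flaky(RUNS))
```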

