217 hands-on Kubernetes labs are live — from your first Pod to MLOps, GitOps & SRE.

Hey — It's Sandip Das 👋

The Kubernetes Labs catalog just grew to 217 hands-on labs — the most complete path from your first Pod to operating production-grade clusters, all inside LearnXops.

Just create a cluster with the LearnXops Desktop App and get started 👍

Most learning platforms give you a limited set of videos and quizzes, plus a handful of labs whose terminal disconnects most of the time. We give you a terminal and a problem to solve on your local system.

Every lab on LearnXops:

Runs inside your system's Docker environment
Has step-by-step tasks with hints, verification and solutions
Covers beginner to expert difficulty

🧩 Core Kubernetes (39 labs)

Workloads, objects, scheduling and the kubectl skills everything else builds on.

Deploy Your First Pod and Inspect It with kubectl — Beginner · 100 pts
Create a Deployment and Scale It Up and Down — Beginner · 100 pts
Expose a Deployment with a ClusterIP Service — Beginner · 100 pts
Expose an App Externally with a NodePort Service — Beginner · 100 pts
Organize Workloads with Namespaces and Contexts — Beginner · 100 pts
Roll Out and Roll Back a Deployment Update — Beginner · 100 pts
Inject Configuration with ConfigMaps — Beginner · 100 pts
Manage Sensitive Data with Kubernetes Secrets — Beginner · 100 pts
Label, Select, and Annotate Kubernetes Objects — Beginner · 100 pts
Run a One-Off Job and a Scheduled CronJob — Beginner · 100 pts
Share Data Between Containers with emptyDir Volumes — Beginner · 100 pts
Field Selectors and Advanced kubectl Output (jsonpath, custom-columns) — Beginner · 100 pts
Run Workloads with ReplicaSets (and Why Deployments Win) — Beginner · 100 pts
Init Containers: Order Pod Startup with Dependency Checks — Beginner · 100 pts
Multi-Container Pods with the Sidecar Pattern — Beginner · 100 pts
Self-Healing Pods with Liveness and Readiness Probes — Intermediate · 100 pts
Set Resource Requests and Limits (and Trigger an OOMKill) — Intermediate · 100 pts
Persist Data with PersistentVolumeClaims and StorageClasses — Intermediate · 100 pts
Run a Stateful Database with a StatefulSet and Headless Service — Intermediate · 100 pts
Schedule Pods with NodeSelectors, Affinity, and Taints/Tolerations — Intermediate · 100 pts
Lock Down a Pod with RBAC (ServiceAccount, Role, RoleBinding) — Intermediate · 100 pts
Restrict Pod-to-Pod Traffic with NetworkPolicies — Intermediate · 100 pts
Pod Lifecycle Hooks: postStart and preStop — Intermediate · 100 pts
Native Sidecar Containers with restartPolicy: Always — Intermediate · 100 pts
Graceful Shutdown with terminationGracePeriodSeconds — Intermediate · 100 pts
Downward API: Expose Pod Metadata to Containers — Intermediate · 100 pts
Ephemeral Debug Containers with kubectl debug — Intermediate · 100 pts
Manage Resources Declaratively with kubectl apply and Kustomize — Intermediate · 100 pts
Patch Resources with Strategic Merge and JSON Patch — Intermediate · 100 pts
TTL Controller: Auto-Clean Finished Jobs — Intermediate · 100 pts
Run a DaemonSet Across All Nodes — Advanced · 100 pts
Zero-Downtime Rolling Updates with maxSurge / maxUnavailable — Advanced · 100 pts
Static Pods Managed Directly by the Kubelet — Advanced · 100 pts
IndexedJob and Parallel Job Completions in Kubernetes — Advanced · 100 pts
Pod Priority and Preemption with PriorityClasses — Advanced · 100 pts
Server-Side Apply and Field Management in Kubernetes — Advanced · 100 pts
Mutate, Validate, and Generate Resources with Kyverno — Advanced · 100 pts
Spread Workloads with Pod Topology Spread Constraints — Advanced · 100 pts
Multi-Node Scheduling, Cordon, Drain, and Node Maintenance — Expert · 100 pts

⎈ Helm — Packaging & Releases (10 labs)

Charts, dependencies, hooks, and multi-environment release management.

Install and Manage an App with Helm Charts — Intermediate · 100 pts
Helm Chart Dependencies and Subcharts — Intermediate · 100 pts
Helm Release Lifecycle: Upgrade, Rollback, and History — Intermediate · 100 pts
Manage Multi-Environment Releases with Helm Values Files — Intermediate · 100 pts
Test Your Helm Chart with helm test Hooks — Intermediate · 100 pts
Author Your Own Helm Chart with Values and Templates — Advanced · 100 pts
Helm Hooks for Pre/Post Install Jobs — Advanced · 100 pts
Template Logic with Named Templates and _helpers.tpl — Advanced · 100 pts
Validate Chart Inputs with values.schema.json — Advanced · 100 pts
Package and Host a Helm Chart Repository — Advanced · 100 pts

🌐 Networking (17 labs)

Services, DNS, Ingress, the Gateway API, NetworkPolicies and service mesh.

Route HTTP Traffic with an Ingress Controller — Intermediate · 100 pts
Expose a Service of type LoadBalancer (cloud-provider-kind) — Intermediate · 100 pts
Headless Services and DNS-Based Service Discovery — Intermediate · 100 pts
Cluster DNS Deep Dive: CoreDNS, FQDN Resolution, and DNS Debugging — Intermediate · 100 pts
ExternalName Services: Map In-Cluster Names to External Endpoints — Intermediate · 100 pts
Path- and Host-Based Routing with a Single Ingress — Intermediate · 100 pts
Expose Multiple Ingress Hosts with the NGINX Ingress Controller — Intermediate · 100 pts
Bare-Metal LoadBalancer with MetalLB (L2 Mode) — Intermediate · 100 pts
TLS Termination at the Ingress with Kubernetes Secrets — Advanced · 100 pts
EndpointSlices: How Services Track Pod Endpoints — Advanced · 100 pts
The Gateway API: Routes, Gateways, and GatewayClasses — Advanced · 100 pts
Default-Deny NetworkPolicies and Egress Control — Advanced · 100 pts
Zero-Trust mTLS Between Services with Linkerd — Advanced · 100 pts
Dual-Stack Networking (IPv4/IPv6) on kind — Expert · 100 pts
Traffic Management with Istio (Sidecar, VirtualService, DestinationRule) — Expert · 100 pts
Progressive Traffic Shifting and Mirroring with Istio — Expert · 100 pts
eBPF Networking and Flow Observability with Cilium + Hubble — Expert · 100 pts

💾 Storage & Data (11 labs)

Volumes, PVCs, StatefulSet storage, snapshots, and backup/restore.

Mount ConfigMaps and Secrets as Volumes (with subPath) — Beginner · 100 pts
Dynamic Provisioning vs Static PersistentVolumes — Intermediate · 100 pts
Reclaim Policies: Retain, Delete, and Their Trade-offs — Intermediate · 100 pts
Projected Volumes: Combine Secrets, ConfigMaps, and Tokens — Intermediate · 100 pts
Resize a PersistentVolumeClaim Online — Intermediate · 100 pts
VolumeSnapshots: Back Up and Restore Stateful Data — Advanced · 100 pts
StatefulSet volumeClaimTemplates and Per-Pod Storage — Advanced · 100 pts
Local PersistentVolumes and Node-Bound Storage — Advanced · 100 pts
Back Up and Restore Cluster State with Velero and MinIO — Advanced · 100 pts
Scheduled Backups and a Disaster-Recovery Drill with Velero — Advanced · 100 pts
Back Up and Restore etcd with etcdctl Snapshots — Advanced · 100 pts

🔐 Security (24 labs)

RBAC, admission control, Pod Security, secrets, supply-chain and runtime defense.

ServiceAccount Tokens and Projected Token Volumes — Intermediate · 100 pts
RBAC ClusterRoles and ClusterRoleBindings Across Namespaces — Intermediate · 100 pts
Audit Permissions with kubectl auth can-i and Impersonation — Intermediate · 100 pts
Drop Linux Capabilities and Run as Non-Root — Intermediate · 100 pts
Image Pull Secrets for Private Registries — Intermediate · 100 pts
Benchmark Cluster Hardening with kube-bench (CIS) — Intermediate · 100 pts
Scan Images and Live Workloads for CVEs with Trivy — Intermediate · 100 pts
Harden Workloads with Pod Security Standards and SecurityContext — Advanced · 100 pts
Enforce Pod Security Admission at the Namespace Level — Advanced · 100 pts
Read-Only Root Filesystems and seccomp Profiles — Advanced · 100 pts
Aggregated ClusterRoles for Extensible Permissions — Advanced · 100 pts
Bound ServiceAccount Tokens for Workload Identity — Advanced · 100 pts
Issue and Auto-Renew TLS Certificates with cert-manager — Advanced · 100 pts
Sync External Secrets with the External Secrets Operator — Advanced · 100 pts
Inject Secrets at Runtime with HashiCorp Vault Agent — Advanced · 100 pts
Enforce Policy with OPA Gatekeeper Constraints — Advanced · 100 pts
Issue a Client Certificate and Build a kubeconfig for a New User — Advanced · 100 pts
Enable API Server Audit Logging and Trace an Action — Advanced · 100 pts
Sign Images with cosign and Enforce Signatures at Admission — Advanced · 100 pts
Detect Runtime Threats with Falco — Advanced · 100 pts
ValidatingAdmissionPolicy with CEL Expressions — Expert · 100 pts
Block Workloads with a Validating Admission Webhook — Expert · 100 pts
Encrypt Secrets at Rest with an EncryptionConfiguration — Expert · 100 pts
Sandbox Untrusted Workloads with RuntimeClass (gVisor) — Expert · 100 pts

📊 Observability (13 labs)

Logs, metrics, tracing, dashboards, alerting and SLOs.

Stream and Aggregate Pod Logs with kubectl — Beginner · 100 pts
Inspect Cluster State with Events and kubectl describe — Beginner · 100 pts
Install metrics-server and Read Pod/Node Metrics — Intermediate · 100 pts
Profile Resource Usage with the Metrics API — Intermediate · 100 pts
Startup Probes for Slow-Starting Applications — Intermediate · 100 pts
Deploy Prometheus and Scrape Application Metrics — Advanced · 100 pts
Visualize Metrics with Grafana Dashboards — Advanced · 100 pts
Instrument an App with Custom Prometheus Metrics — Advanced · 100 pts
Alerting Rules with Prometheus Alertmanager — Advanced · 100 pts
Structured Logging with Fluent Bit to a Sink — Advanced · 100 pts
Aggregate Logs with Loki and Grafana — Advanced · 100 pts
Define SLOs with Prometheus and Multi-Window Burn-Rate Alerts — Advanced · 100 pts
Distributed Tracing with OpenTelemetry and Jaeger — Expert · 100 pts

🤖 Operators & CRDs (15 labs)

Custom resources, controllers, the Operator SDK and real-world operators.

Build a Custom Resource and Reconcile It with an Operator (kubebuilder/CRD basics) — Advanced · 100 pts
Define a CustomResourceDefinition with Validation Schemas — Advanced · 100 pts
Add Printer Columns and Subresources to a CRD — Advanced · 100 pts
Finalizers: Clean Up External Resources on Delete — Advanced · 100 pts
Install an Operator from OperatorHub with OLM — Advanced · 100 pts
Run a Private Image Registry with Harbor — Advanced · 100 pts
Isolate Tenants with Virtual Clusters (vcluster) — Advanced · 100 pts
Attribute Kubernetes Spend with OpenCost — Advanced · 100 pts
Run Production PostgreSQL with the CloudNativePG Operator — Advanced · 100 pts
Scaffold an Operator with the Operator SDK — Expert · 100 pts
Build a Controller with the client-go Informer Pattern — Expert · 100 pts
CRD Versioning and Conversion Webhooks — Expert · 100 pts
Implement a Reconcile Loop with Owner References — Expert · 100 pts
Run Virtual Machines on Kubernetes with KubeVirt — Expert · 100 pts
Platform Engineering with Crossplane — Expert · 100 pts

🔄 GitOps & Delivery (12 labs)

Argo CD, Flux, progressive delivery, in-cluster CI and the developer inner loop.

Drift Detection and Reconciliation Fundamentals — Intermediate · 100 pts
Fast Inner-Loop Development with Skaffold — Intermediate · 100 pts
Catch Deprecated and Invalid Manifests Before Apply (kubeconform + Pluto) — Intermediate · 100 pts
GitOps Deploy with Argo CD on kind — Advanced · 100 pts
Argo CD Auto-Sync and Self-Heal — Advanced · 100 pts
Argo CD App-of-Apps Pattern — Advanced · 100 pts
Manage Clusters Declaratively with Flux CD — Advanced · 100 pts
Flux Helm Releases from Git — Advanced · 100 pts
Multi-Environment Promotion with Kustomize Overlays in GitOps — Advanced · 100 pts
Build Container Images In-Cluster with Kaniko — Advanced · 100 pts
Run a CI Pipeline on Kubernetes with Tekton — Advanced · 100 pts
Progressive Delivery with Argo Rollouts (Canary) — Expert · 100 pts

📈 Scaling & Capacity (13 labs)

HPA, VPA, KEDA, quotas, autoscaling and right-sizing.

Resource Quotas and LimitRanges per Namespace — Intermediate · 100 pts
Protect Availability with PodDisruptionBudgets during a Drain — Intermediate · 100 pts
Autoscale Workloads with the Horizontal Pod Autoscaler — Advanced · 100 pts
Blue-Green and Canary Deployments by Hand — Advanced · 100 pts
HPA with Memory and Multiple Metrics — Advanced · 100 pts
Tune HPA Behavior with scaleUp/scaleDown Policies — Advanced · 100 pts
Vertical Pod Autoscaler: Right-Size Your Workloads — Advanced · 100 pts
Cluster Autoscaling Concepts on kind — Advanced · 100 pts
Custom Metrics Autoscaling with the Prometheus Adapter — Advanced · 100 pts
Resize Pod CPU/Memory In-Place Without a Restart — Advanced · 100 pts
KEDA: Event-Driven Autoscaling from a Queue — Expert · 100 pts
Rebalance Workloads with the Descheduler — Expert · 100 pts
Dynamic Resource Allocation (DRA): Claim Devices with ResourceClaims — Expert · 100 pts

🔧 Troubleshooting & SRE (15 labs)

Diagnose and recover real failures — crashes, scheduling, DNS, nodes, chaos.

Resolve ImagePullBackOff and ErrImagePull — Beginner · 100 pts
Diagnose Pending Pods: Scheduling and Insufficient Resources — Intermediate · 100 pts
Fix CrashLoopBackOff: Logs, Exit Codes, and Probes — Intermediate · 100 pts
Debug Service Connectivity: Endpoints, Selectors, and DNS — Intermediate · 100 pts
Debug RBAC Forbidden Errors — Intermediate · 100 pts
Investigate OOMKilled and Evicted Pods — Intermediate · 100 pts
Network Debugging with a netshoot Toolbox Pod — Intermediate · 100 pts
QoS Classes and the Eviction Order Under Node Pressure — Intermediate · 100 pts
Debug a Broken Cluster: CrashLoopBackOff, ImagePullBackOff, and Pending Pods — Advanced · 100 pts
Troubleshoot Failed PVC Binding and Volume Mounts — Advanced · 100 pts
Recover a Stuck Namespace Termination (Finalizers) — Advanced · 100 pts
Inject Faults and Test Resilience with Chaos Mesh — Advanced · 100 pts
Debug a Failing Admission Webhook Blocking All Deployments — Advanced · 100 pts
Trace Node NotReady and Kubelet Issues — Expert · 100 pts
Game Day: Diagnose a Multi-Failure Cluster — Expert · 100 pts

🧠 AI / ML / MLOps on Kubernetes (48 labs)

Serving, training, GPUs, pipelines, feature stores, drift, notebooks and cost.

Serve a scikit-learn Model as a REST API on Kubernetes — Beginner · 100 pts
Run a Training Job to Completion and Persist the Model — Beginner · 100 pts
Deploy a FastAPI ML Inference Service and Load-Test It — Intermediate · 100 pts
Install KServe and Serve a Model with an InferenceService — Intermediate · 100 pts
Batch (Offline) Inference as a Parallel Kubernetes Job — Intermediate · 100 pts
Simulate GPU Scheduling on kind with a Fake GPU Device Plugin — Intermediate · 100 pts
Schedule Periodic Retraining with an Argo CronWorkflow — Intermediate · 100 pts
Use MinIO as S3-Compatible Storage for Datasets and Models — Intermediate · 100 pts
Run a Vector Database (Qdrant) on Kubernetes — Intermediate · 100 pts
Log and Audit Model Predictions in Kubernetes — Intermediate · 100 pts
Load-Test and Benchmark an Inference Service — Intermediate · 100 pts
Version Datasets for Reproducible Training — Intermediate · 100 pts
Provision Per-User Notebook Servers with Resource Limits — Intermediate · 100 pts
Cache LLM/Embedding Responses to Cut Inference Cost — Intermediate · 100 pts
KServe Canary Rollout Between Two Model Versions — Advanced · 100 pts
Serve Multiple Models with Seldon Core — Advanced · 100 pts
Scale Inference to Zero with KServe + Knative — Advanced · 100 pts
Triton Inference Server: Multi-Framework Model Serving on Kubernetes — Advanced · 100 pts
Serve a Small LLM on CPU with llama.cpp (GGUF) — Advanced · 100 pts
Ray Serve for Online Model Inference on Kubernetes — Advanced · 100 pts
Model Inference Gateway and Request Routing — Advanced · 100 pts
Gang-Schedule ML Jobs with the Volcano Scheduler — Advanced · 100 pts
Checkpoint and Resume Training Across Pod Restarts — Advanced · 100 pts
Queue and Fair-Share ML Batch Jobs with Kueue — Advanced · 100 pts
Distributed Data Processing with a Ray Cluster (KubeRay) — Advanced · 100 pts
Schedule GPU Workloads with the NVIDIA Device Plugin — Advanced · 100 pts
Monitor GPU Utilization with DCGM Exporter and Grafana — Advanced · 100 pts
Enforce GPU Quotas and Fair Use per Namespace — Advanced · 100 pts
Build and Run an ML Pipeline with Argo Workflows — Advanced · 100 pts
Event-Driven ML Workflows with Argo Events — Advanced · 100 pts
Serve Online Features with a Feature Store (Feast) on Kubernetes — Advanced · 100 pts
Track Experiments and Models with MLflow on Kubernetes — Advanced · 100 pts
Expose Inference Metrics (Latency, Throughput, Tokens) to Prometheus — Advanced · 100 pts
Run JupyterHub on Kubernetes for Team Notebooks — Advanced · 100 pts
Multi-Tenant Model Serving with Namespaces and Quotas — Advanced · 100 pts
Blue-Green Model Deployment with Zero-Downtime Cutover — Advanced · 100 pts
Rate-Limit and Add Guardrails to an LLM Endpoint — Advanced · 100 pts
Warm-Pool Inference Pods to Reduce Cold Starts — Advanced · 100 pts
Autoscale LLM Inference from a Queue with KEDA — Expert · 100 pts
Distributed PyTorch Training with the Training Operator (PyTorchJob) — Expert · 100 pts
Distributed TensorFlow with TFJob on Kubernetes — Expert · 100 pts
Hyperparameter Tuning with Katib on Kubernetes — Expert · 100 pts
All-Reduce Training with MPIJob and Horovod — Expert · 100 pts
Fine-Tune a Small Language Model as a Job (LoRA, CPU) — Expert · 100 pts
Share GPUs with Time-Slicing via the NVIDIA GPU Operator — Expert · 100 pts
Kubeflow Pipelines: Compile and Run a Pipeline — Expert · 100 pts
Detect Data and Model Drift on Kubernetes — Expert · 100 pts
Cost-Aware Bin-Packing for Inference Workloads — Expert · 100 pts

Get Started

Browse all 217 labs →

Pick a category you're working on, start a lab, and complete it. Each lab takes anywhere from 15 minutes to an hour, depending on difficulty.

🏆 Check out the leaderboard here