Abstract


Benefits

  • Self-healing
  • Automatic rollbacks
  • Horizontal scaling

Attention

  • Can be complex to maintain
  • Costs associated with running nodes

Managed control planes (e.g. EKS, GKE) can help mitigate this complexity.

Sandbox to play with k8s

Play with Kubernetes provides you with Linux machines that have k8s preinstalled.

Control Plane


  • Runs on multiple nodes across data center zones for high availability

Key Components

Controller Manager

  • Replication Controller: Maintains the desired number of pod replicas, replacing any that fail
  • Deployment Controller: Handles rollouts, updates, and rollbacks (see the sketch after this list)
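
A minimal Deployment sketch (names and image are illustrative): the Deployment controller creates a ReplicaSet, which keeps three pod replicas running and replaces any that die.

# Illustrative Deployment; the controller manager reconciles toward 3 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # desired pod count, maintained by the ReplicaSet controller
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate    # the Deployment controller rolls out updates incrementally
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27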

Scheduler

  • Assigns pods to worker nodes, weighing resource requests, node capacity, and placement constraints (see the sketch below)
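
A sketch of the inputs the scheduler considers; the pod name, label, and image are illustrative.

# The scheduler filters nodes by free resources and nodeSelector, then picks one
apiVersion: v1
kind: Pod
metadata:
  name: sched-demo
spec:
  nodeSelector:
    disktype: ssd          # only nodes labeled disktype=ssd are candidates
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: "500m"      # the pod lands only where this much CPU is unreserved
          memory: 256Mi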

etcd

  • Distributed key-value store
  • Stores cluster state, available resources, and health information
  • Used by other control plane components

API Server

  • The front end of the control plane; exposes the Kubernetes REST API
  • Validates and processes requests from kubectl, controllers, and the kubelet
  • The only component that reads from and writes to etcd

Worker Nodes


  • Run containers, which are encapsulated within pods
  • Pods are the smallest deployable units in Kubernetes
  • Pods provide shared storage and networking for their containers (see the sketch below)
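
A minimal sketch of pod-level sharing (names are illustrative): both containers mount the same emptyDir volume and share the pod's network namespace, so they could also talk over localhost.

# Two containers in one pod sharing a volume and the pod's network namespace
apiVersion: v1
kind: Pod
metadata:
  name: shared-demo
spec:
  volumes:
    - name: scratch
      emptyDir: {}          # pod-scoped storage, visible to both containers
  containers:
    - name: writer
      image: busybox:1.36
      command: ["sh", "-c", "while true; do date > /data/now; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /data
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "while true; do cat /data/now 2>/dev/null; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /data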

Key Components

Kubelet

  • Communicates with the control plane
  • Ensures the desired state of pods is maintained, restarting containers that fail their probes (example below)
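
One concrete mechanism, sketched with illustrative names: the kubelet runs this liveness probe locally and restarts the container when it keeps failing.

# The kubelet (not the control plane) executes this probe on the node
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
    - name: app
      image: nginx:1.27
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 10    # probed every 10s; repeated failures trigger a restart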

Container Runtime

  • Can be Docker or another compatible runtime
  • Runs containers on worker nodes
  • Pulls images and starts/stops containers (example below)
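
A small sketch (illustrative names) of the spec-level knob for image pulls, which the kubelet hands to the CRI runtime:

# imagePullPolicy tells the runtime when to pull the image from the registry
apiVersion: v1
kind: Pod
metadata:
  name: runtime-demo
spec:
  containers:
    - name: app
      image: nginx:1.27
      imagePullPolicy: IfNotPresent   # pull only if the image is not already cached on the node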

Kube-proxy

  • Routes traffic to the correct pods
  • Handles load balancing
  • Relies on cluster networking, which lets pods on different nodes reach each other directly (Service sketch below)
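
An illustrative Service: kube-proxy programs each node so that traffic sent to the Service is load-balanced across the pods matching its selector.

# kube-proxy load-balances traffic for this Service across matching pods
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web               # traffic is routed to pods carrying this label
  ports:
    - port: 80             # port exposed by the Service
      targetPort: 80       # container port on the backing pods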

Containerization Workflow


  1. Kubelet (the node agent) receives a Pod spec
  2. It talks to the CRI runtime (containerd, CRI-O)
  3. Kubelet asks the runtime to create the containers and the Pod-level cgroup
  4. The containers inside the Pod share the Pod cgroup and namespaces (some shared, such as the network namespace; some isolated)
  5. Kubernetes writes the Pod's CPU and memory limits into cgroup controllers
  6. Kernel enforces those resource limits dynamically (see the sketch after this list)
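
Tying the workflow to a spec, with illustrative names: the limits below are what the kubelet writes into the pod's cgroup controllers (cpu.max and memory.max under cgroup v2) for the kernel to enforce.

# These limits end up in the pod-level cgroup controllers
apiVersion: v1
kind: Pod
metadata:
  name: cgroup-demo
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: "250m"
          memory: 128Mi
        limits:
          cpu: "500m"        # CPU usage beyond this is throttled by the kernel
          memory: 256Mi      # exceeding this triggers the OOM killer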

GPU Scheduling


  • By default, Kubernetes does not know GPUs exist on a node
  • The NVIDIA Device Plugin is a DaemonSet that runs on every GPU node and registers nvidia.com/gpu as a schedulable resource
# Pod spec requesting a GPU
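# GPUs go under limits only; requests, if set, must equal limits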
resources:
  limits:
    nvidia.com/gpu: 1
  • On EKS, GPU-enabled node groups use instances with NVIDIA GPUs (p4d, p5, g5, g6) and the EKS-optimized GPU AMI, which ships with NVIDIA drivers pre-installed

How it works

The flow is: GPU instance (hardware) → NVIDIA drivers (in AMI) → NVIDIA Device Plugin (DaemonSet, exposes GPUs to k8s scheduler) → Pod requests nvidia.com/gpu in resource limits.

Without the Device Plugin, the GPU hardware is physically present on the node but invisible to the Kubernetes scheduler. No pod can request or use it.
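
Putting the flow together, a complete pod sketch; the pod name and image tag are assumptions, and any CUDA-capable image works.

# A full pod requesting one GPU; it is schedulable only on nodes where the
# device plugin has registered nvidia.com/gpu
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed tag
      command: ["nvidia-smi"]                      # prints the GPU the pod was granted
      resources:
        limits:
          nvidia.com/gpu: 1    # one whole GPU; GPUs are not fractional by default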

Cost optimization

Use Karpenter to auto-provision GPU nodes only when pods need them and scale to zero when idle. GPU instances are expensive, so avoiding idle nodes is critical.
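
A hedged sketch of what that looks like with Karpenter's v1 NodePool API; the instance types, EC2NodeClass name, and timing here are assumptions, not recommendations.

# GPU NodePool that provisions nodes on demand and removes them when empty
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g5.xlarge", "g6.xlarge"]   # assumed instance choices
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                          # assumed EC2NodeClass name
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule                   # keeps non-GPU pods off expensive nodes
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 60s                      # scale back to zero ~1 min after the node empties

GPU pods then need a toleration for the nvidia.com/gpu taint in addition to the resource limit.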
