Kubernetes Guide

Overview

Kubernetes is being adopted into HPC clusters to orchestrate deployments (e.g. software, infrastructure) and run certain workloads (e.g. AI/ML inference). There is ongoing interest in integrating Kubernetes and Slurm to achieve a unified cluster, optimized resource utilization, and workflows that leverage each system.

How Slurm and Kubernetes are designed to handle particular types of workloads may change over time. How the two systems interact may also change, opening up new possibilities. This is still an evolving area.

SUNK — Slurm on Kubernetes

SUNK ("Slurm on Kubernetes") is an effort under development with CoreWeave that provides a converged Slurm and Kubernetes environment.

CoreWeave's recent blog post, Introducing SUNK: A Slurm on Kubernetes Implementation for HPC and Large Scale AI, gives an overview of SUNK's architecture and use cases.

SUNK is expected to be open-sourced in early 2024.

Presentations

Note that older presentations may contain outdated information.

Presentations from 2023

Last modified 11 December 2023