MLOps-on-HPC

Training material for a session on MLOps on HPC infrastructure.

Machine learning is becoming increasingly important in many fields, and reproducibility, data management, and data visualization are essential to any machine learning project. This training will teach you how to use MLOps frameworks to manage your machine learning projects, and how to handle data and workflows.

Learning outcomes

When you complete this training you will

Schedule

Total duration: 4 hours.

| Subject                               | Duration |
|---------------------------------------|----------|
| introduction and motivation           | 5 min.   |
| setting up a git repository for ML    | 15 min.  |
| versioning data using DVC             | 30 min.  |
| defining a workflow with DVC          | 60 min.  |
| comparing experiments using git & DVC | 60 min.  |
| tracking experiments using DVC Live   | 60 min.  |
| wrap up                               | 10 min.  |

Training materials

Slides are available in the GitHub repository, as well as example code and hands-on material.

Target audience

This training is for you if you need to manage machine learning workflows on HPC systems.

Prerequisites

You will need experience running machine learning workloads in Python or R. You will also need to be comfortable on the command line, and have some experience using the git version control system.

Level

Trainer(s)