Machine learning is becoming increasingly important in many fields, and topics such as reproducibility, data management, and data visualization are essential for any machine learning project. This training will teach you how to use MLOps frameworks to manage your machine learning projects, including how to handle data and workflows.
Learning outcomes
When you complete this training, you will
- be able to use DVC to version your data;
- be able to define pipelines with DVC;
- be able to use DVC to manage your machine learning projects;
- be able to reproduce your machine learning experiments.
Schedule
Total duration: 4 hours.
Subject | Duration |
---|---|
introduction and motivation | 5 min. |
setting up a git repository for ML | 15 min. |
versioning data using DVC | 30 min. |
defining a workflow with DVC | 60 min. |
comparing experiments using git & DVC | 60 min. |
tracking experiments using DVCLive | 60 min. |
wrap up | 10 min. |
Training materials
Slides, example code, and hands-on material are available in the GitHub repository.
Target audience
This training is for you if you need to manage machine learning workflows on HPC systems.
Prerequisites
You will need experience running machine learning workloads in Python or R. You will also need to be comfortable on the command line, and have some experience using the git version control system.
Level
- Introductory: 30 %
- Intermediate: 50 %
- Advanced: 20 %
Trainer(s)
- Geert Jan Bex (geertjan.bex@uhasselt.be)