There are many good reasons to run data science workloads on a High Performance Computing (HPC) system. However, the transition from a laptop to an HPC system can be daunting. This training will help you make that transition.
You will also learn about potential pitfalls and how to avoid them: this training covers not just the good parts of working on an HPC system, but the bad parts as well.
Learning outcomes
When you complete this training, you will
- be able to judge when to switch to an HPC environment;
- be able to prepare your environment for R and Python;
- be able to run a job on an HPC cluster that uses that environment;
- be able to determine how long your computation will take;
- be able to determine how much memory your computation will need;
- be able to estimate the efficiency of your computation;
- know the basics of how to run your computations efficiently;
- know when it makes sense to use parallelization;
- understand the basics and pitfalls of I/O on HPC systems;
- be aware of potential pitfalls and how to avoid them.
Schedule
Total duration: 4 hours
Subject | Duration |
---|---|
introduction and motivation | 5 min. |
setting up environments on an HPC system | 25 min. |
walltime & memory requirements | 30 min. |
efficiency | 30 min. |
to parallelize or not to parallelize? | 30 min. |
I/O on HPC systems | 60 min. |
pitfalls and how to avoid them | 30 min. |
wrap up | 5 min. |
Training materials
All training materials are available in a GitHub repository.
Target audience
This training is for you if you need to use R or Python on HPC systems.
Prerequisites
This is not a training that starts from scratch. You have followed an HPC introduction training session and you have a basic understanding of how to work on the Bash command line.
You have experience with R or Python.
Trainer(s)
- Geert Jan Bex (geertjan.bex@uhasselt.be)